Sample records for partially supervised classification

  1. Design of partially supervised classifiers for multispectral image data

    NASA Technical Reports Server (NTRS)

    Jeon, Byeungwoo; Landgrebe, David

    1993-01-01

    A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.

  2. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms.

    PubMed

    Zwartjes, Ardjan; Havinga, Paul J M; Smit, Gerard J M; Hurink, Johann L

    2016-10-01

    In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  3. Partially supervised P300 speller adaptation for eventual stimulus timing optimization: target confidence is superior to error-related potential score as an uncertain label

    NASA Astrophysics Data System (ADS)

    Zeyl, Timothy; Yin, Erwei; Keightley, Michelle; Chau, Tom

    2016-04-01

    Objective. Error-related potentials (ErrPs) have the potential to guide classifier adaptation in BCI spellers, for addressing non-stationary performance as well as for online optimization of system parameters, by providing imperfect or partial labels. However, the usefulness of ErrP-based labels for BCI adaptation has not been established in comparison to other partially supervised methods. Our objective is to make this comparison by retraining a two-step P300 speller on a subset of confident online trials using naïve labels taken from speller output, where confidence is determined either by (i) ErrP scores, (ii) posterior target scores derived from the P300 potential, or (iii) a hybrid of these scores. We further wish to evaluate the ability of partially supervised adaptation and retraining methods to adjust to a new stimulus-onset asynchrony (SOA), a necessary step towards online SOA optimization. Approach. Eleven consenting able-bodied adults attended three online spelling sessions on separate days with feedback in which SOAs were set at 160 ms (sessions 1 and 2) and 80 ms (session 3). A post hoc offline analysis and a simulated online analysis were performed on sessions two and three to compare multiple adaptation methods. Area under the curve (AUC) and symbols spelled per minute (SPM) were the primary outcome measures. Main results. Retraining using supervised labels confirmed improvements of 0.9 percentage points (session 2, p < 0.01) and 1.9 percentage points (session 3, p < 0.05) in AUC using same-day training data over using data from a previous day, which supports classifier adaptation in general. Significance. Using posterior target score alone as a confidence measure resulted in the highest SPM of the partially supervised methods, indicating that ErrPs are not necessary to boost the performance of partially supervised adaptive classification. Partial supervision significantly improved SPM at a novel SOA, showing promise for eventual online SOA optimization.

  4. Filtering big data from social media--Building an early warning system for adverse drug reactions.

    PubMed

    Yang, Ming; Kiang, Melody; Shang, Wei

    2015-04-01

    Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems are aimed at early detection of ADRs. With the popularity of social media, Web forums and discussion boards become important sources of data for consumers to share their drug use experience, as a result may provide useful information on drugs and their adverse reactions. In this study, we propose an automated ADR related posts filtering mechanism using text classification methods. In real-life settings, ADR related messages are highly distributed in social media, while non-ADR related messages are unspecific and topically diverse. It is expensive to manually label a large amount of ADR related messages (positive examples) and non-ADR related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process. We propose a novel pharmacovigilance system leveraging a Latent Dirichlet Allocation modeling module and a partially supervised classification approach. We select drugs with more than 500 threads of discussion, and collect all the original posts and comments of these drugs using an automatic Web spidering program as the text corpus. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers. Compare to the alternative approaches using supervised learning methods and three general purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp. Our design provides satisfactory performance in identifying ADR related posts for post-marketing drug surveillance. The overall design of our system also points out a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Fees; classification, Micronaire, and supervision. 27... Classification and Micronaire § 27.80 Fees; classification, Micronaire, and supervision. For services rendered by... classification and Micronaire determination results certified on cotton class certificates.) (e) Supervision, by...

  6. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Fees; classification, Micronaire, and supervision. 27... Classification and Micronaire § 27.80 Fees; classification, Micronaire, and supervision. For services rendered by... classification and Micronaire determination results certified on cotton class certificates.) (e) Supervision, by...

  7. Quasi-Supervised Scoring of Human Sleep in Polysomnograms Using Augmented Input Variables

    PubMed Central

    Yaghouby, Farid; Sunderam, Sridhar

    2015-01-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18 to 79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models—specifically Gaussian mixtures and hidden Markov models—are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's K statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. PMID:25679475

  8. Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

    PubMed

    Yaghouby, Farid; Sunderam, Sridhar

    2015-04-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models-specifically Gaussian mixtures and hidden Markov models--are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's Κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Automatic age and gender classification using supervised appearance model

    NASA Astrophysics Data System (ADS)

    Bukar, Ali Maina; Ugail, Hassan; Connah, David

    2016-11-01

    Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.

  10. Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding.

    PubMed

    Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston

    2016-10-28

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.

  11. Self-supervised ARTMAP.

    PubMed

    Amis, Gregory P; Carpenter, Gail A

    2010-03-01

    Computational models of learning typically train on labeled input patterns (supervised learning), unlabeled input patterns (unsupervised learning), or a combination of the two (semi-supervised learning). In each case input patterns have a fixed number of features throughout training and testing. Human and machine learning contexts present additional opportunities for expanding incomplete knowledge from formal training, via self-directed learning that incorporates features not previously experienced. This article defines a new self-supervised learning paradigm to address these richer learning contexts, introducing a neural network called self-supervised ARTMAP. Self-supervised learning integrates knowledge from a teacher (labeled patterns with some features), knowledge from the environment (unlabeled patterns with more features), and knowledge from internal model activation (self-labeled patterns). Self-supervised ARTMAP learns about novel features from unlabeled patterns without destroying partial knowledge previously acquired from labeled patterns. A category selection function bases system predictions on known features, and distributed network activation scales unlabeled learning to prediction confidence. Slow distributed learning on unlabeled patterns focuses on novel features and confident predictions, defining classification boundaries that were ambiguous in the labeled patterns. Self-supervised ARTMAP improves test accuracy on illustrative low-dimensional problems and on high-dimensional benchmarks. Model code and benchmark data are available from: http://techlab.eu.edu/SSART/. Copyright 2009 Elsevier Ltd. All rights reserved.

  12. Semi-Supervised Marginal Fisher Analysis for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Huang, H.; Liu, J.; Pan, Y.

    2012-07-01

    The problem of learning with both labeled and unlabeled examples arises frequently in Hyperspectral image (HSI) classification. While marginal Fisher analysis is a supervised method, which cannot be directly applied for Semi-supervised classification. In this paper, we proposed a novel method, called semi-supervised marginal Fisher analysis (SSMFA), to process HSI of natural scenes, which uses a combination of semi-supervised learning and manifold learning. In SSMFA, a new difference-based optimization objective function with unlabeled samples has been designed. SSMFA preserves the manifold structure of labeled and unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, and it can be computed based on eigen decomposition. Classification experiments with a challenging HSI task demonstrate that this method outperforms current state-of-the-art HSI-classification methods.

  13. Analysis of spreadable cheese by Raman spectroscopy and chemometric tools.

    PubMed

    Oliveira, Kamila de Sá; Callegaro, Layce de Souza; Stephani, Rodrigo; Almeida, Mariana Ramos; de Oliveira, Luiz Fernando Cappa

    2016-03-01

    In this work, FT-Raman spectroscopy was explored to evaluate spreadable cheese samples. A partial least squares discriminant analysis was employed to identify the spreadable cheese samples containing starch. To build the models, two types of samples were used: commercial samples and samples manufactured in local industries. The method of supervised classification PLS-DA was employed to classify the samples as adulterated or without starch. Multivariate regression was performed using the partial least squares method to quantify the starch in the spreadable cheese. The limit of detection obtained for the model was 0.34% (w/w) and the limit of quantification was 1.14% (w/w). The reliability of the models was evaluated by determining the confidence interval, which was calculated using the bootstrap re-sampling technique. The results show that the classification models can be used to complement classical analysis and as screening methods. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. A semi-supervised classification algorithm using the TAD-derived background as training data

    NASA Astrophysics Data System (ADS)

    Fan, Lei; Ambeau, Brittany; Messinger, David W.

    2013-05-01

    In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.

  15. Weakly supervised classification in high energy physics

    DOE PAGES

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; ...

    2017-05-01

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics $-$ quark versus gluon tagging $-$ we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervisedmore » classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.« less

  16. Weakly supervised classification in high energy physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. Here, this paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics $-$ quark versus gluon tagging $-$ we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervisedmore » classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.« less

  17. Oscillatory neural network for pattern recognition: trajectory based classification and supervised learning.

    PubMed

    Miller, Vonda H; Jansen, Ben H

    2008-12-01

    Computer algorithms that match human performance in recognizing written text or spoken conversation remain elusive. The reasons why the human brain far exceeds any existing recognition scheme to date in the ability to generalize and to extract invariant characteristics relevant to category matching are not clear. However, it has been postulated that the dynamic distribution of brain activity (spatiotemporal activation patterns) is the mechanism by which stimuli are encoded and matched to categories. This research focuses on supervised learning using a trajectory based distance metric for category discrimination in an oscillatory neural network model. Classification is accomplished using a trajectory based distance metric. Since the distance metric is differentiable, a supervised learning algorithm based on gradient descent is demonstrated. Classification of spatiotemporal frequency transitions and their relation to a priori assessed categories is shown along with the improved classification results after supervised training. The results indicate that this spatiotemporal representation of stimuli and the associated distance metric is useful for simple pattern recognition tasks and that supervised learning improves classification results.

  18. Semi-supervised classification tool for DubaiSat-2 multispectral imagery

    NASA Astrophysics Data System (ADS)

    Al-Mansoori, Saeed

    2015-10-01

    This paper addresses a semi-supervised classification tool based on a pixel-based approach of the multi-spectral satellite imagery. There are not many studies demonstrating such algorithm for the multispectral images, especially when the image consists of 4 bands (Red, Green, Blue and Near Infrared) as in DubaiSat-2 satellite images. The proposed approach utilizes both unsupervised and supervised classification schemes sequentially to identify four classes in the image, namely, water bodies, vegetation, land (developed and undeveloped areas) and paved areas (i.e. roads). The unsupervised classification concept is applied to identify two classes; water bodies and vegetation, based on a well-known index that uses the distinct wavelengths of visible and near-infrared sunlight that is absorbed and reflected by the plants to identify the classes; this index parameter is called "Normalized Difference Vegetation Index (NDVI)". Afterward, the supervised classification is performed by selecting training homogenous samples for roads and land areas. Here, a precise selection of training samples plays a vital role in the classification accuracy. Post classification is finally performed to enhance the classification accuracy, where the classified image is sieved, clumped and filtered before producing final output. Overall, the supervised classification approach produced higher accuracy than the unsupervised method. This paper shows some current preliminary research results which point out the effectiveness of the proposed technique in a virtual perspective.

  19. An evaluation of supervised classifiers for indirectly detecting salt-affected areas at irrigation scheme level

    NASA Astrophysics Data System (ADS)

    Muller, Sybrand Jacobus; van Niekerk, Adriaan

    2016-07-01

    Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electro conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (<0.4 R squared). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crops types at different growing stages, coupled with their individual tolerances to saline conditions.

  20. 7 CFR 27.73 - Supervision of transfers of cotton.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Supervision of transfers of cotton. 27.73 Section 27... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Postponed Classification § 27.73 Supervision of transfers of cotton. Whenever the owner of any cotton inspected and sampled for...

  1. 7 CFR 27.73 - Supervision of transfers of cotton.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Supervision of transfers of cotton. 27.73 Section 27... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Transfers of Cotton § 27.73 Supervision of transfers of cotton. Whenever the owner of any cotton inspected and sampled for classification...

  2. 7 CFR 27.73 - Supervision of transfers of cotton.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Supervision of transfers of cotton. 27.73 Section 27... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Transfers of Cotton § 27.73 Supervision of transfers of cotton. Whenever the owner of any cotton inspected and sampled for classification...

  3. 7 CFR 27.73 - Supervision of transfers of cotton.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Supervision of transfers of cotton. 27.73 Section 27... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Transfers of Cotton § 27.73 Supervision of transfers of cotton. Whenever the owner of any cotton inspected and sampled for classification...

  4. 7 CFR 27.73 - Supervision of transfers of cotton.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Supervision of transfers of cotton. 27.73 Section 27... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Postponed Classification § 27.73 Supervision of transfers of cotton. Whenever the owner of any cotton inspected and sampled for...

  5. Graph-Based Semi-Supervised Hyperspectral Image Classification Using Spatial Information

    NASA Astrophysics Data System (ADS)

    Jamshidpour, N.; Homayouni, S.; Safari, A.

    2017-09-01

    Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attentions. For example, the lack of enough labeled samples and the high dimensionality problem are two most important issues which degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning is to overcome these issues by the contribution of unlabeled samples, which are available in an enormous amount. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationship among pixels in spectral and spatial spaces respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than the well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set is too scarce.When there were only five labeled samples for each class, the performance improved 5.92% and 10.76% compared to spatial graph-based SSL, for AVIRIS Indian Pine and Pavia University data sets respectively.

  6. Classification of earth terrain using polarimetric synthetic aperture radar images

    NASA Technical Reports Server (NTRS)

    Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.

    1989-01-01

    Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminates such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polariation states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.

  7. Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.

    ERIC Educational Resources Information Center

    Mostafa, J.; Lam, W.

    2000-01-01

    Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…

  8. A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.

    PubMed

    Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L

    2018-05-08

    Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.

  9. A review of supervised object-based land-cover image classification

    NASA Astrophysics Data System (ADS)

    Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue

    2017-08-01

    Object-based image classification for land-cover mapping purposes using remote-sensing imagery has attracted significant attention in recent years. Numerous studies conducted over the past decade have investigated a broad array of sensors, feature selection, classifiers, and other factors of interest. However, these research results have not yet been synthesized to provide coherent guidance on the effect of different supervised object-based land-cover classification processes. In this study, we first construct a database with 28 fields using qualitative and quantitative information extracted from 254 experimental cases described in 173 scientific papers. Second, the results of the meta-analysis are reported, including general characteristics of the studies (e.g., the geographic range of relevant institutes, preferred journals) and the relationships between factors of interest (e.g., spatial resolution and study area or optimal segmentation scale, accuracy and number of targeted classes), especially with respect to the classification accuracy of different sensors, segmentation scale, training set size, supervised classifiers, and land-cover types. Third, useful data on supervised object-based image classification are determined from the meta-analysis. For example, we find that supervised object-based classification is currently experiencing rapid advances, while development of the fuzzy technique is limited in the object-based framework. Furthermore, spatial resolution correlates with the optimal segmentation scale and study area, and Random Forest (RF) shows the best performance in object-based classification. The area-based accuracy assessment method can obtain stable classification performance, and indicates a strong correlation between accuracy and training set size, while the accuracy of the point-based method is likely to be unstable due to mixed objects. In addition, the overall accuracy benefits from higher spatial resolution images (e.g., unmanned aerial vehicle) or agricultural sites where it also correlates with the number of targeted classes. More than 95.6% of studies involve an area less than 300 ha, and the spatial resolution of images is predominantly between 0 and 2 m. Furthermore, we identify some methods that may advance supervised object-based image classification. For example, deep learning and type-2 fuzzy techniques may further improve classification accuracy. Lastly, scientists are strongly encouraged to report results of uncertainty studies to further explore the effects of varied factors on supervised object-based image classification.

  10. The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

    ERIC Educational Resources Information Center

    Hoffman, Aaron B.; Rehder, Bob

    2010-01-01

    Research has shown that learning a concept via standard supervised classification leads to a focus on diagnostic features, whereas learning by inferring missing features promotes the acquisition of within-category information. Accordingly, we predicted that classification learning would produce a deficit in people's ability to draw "novel…

  11. Intra-regional classification of grape seeds produced in Mendoza province (Argentina) by multi-elemental analysis and chemometrics tools.

    PubMed

    Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G

    2018-03-01

    The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Supervised versus unsupervised categorization: two sides of the same coin?

    PubMed

    Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

    2011-09-01

    Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. ( 2011 ; Pothos et al., 2008 ) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.

  13. Global Optimization Ensemble Model for Classification Methods

    PubMed Central

    Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

    2014-01-01

    Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382

  14. Comparing supervised and unsupervised multiresolution segmentation approaches for extracting buildings from very high resolution imagery.

    PubMed

    Belgiu, Mariana; Dr Guţ, Lucian

    2014-10-01

    Although multiresolution segmentation (MRS) is a powerful technique for dealing with very high resolution imagery, some of the image objects that it generates do not match the geometries of the target objects, which reduces the classification accuracy. MRS can, however, be guided to produce results that approach the desired object geometry using either supervised or unsupervised approaches. Although some studies have suggested that a supervised approach is preferable, there has been no comparative evaluation of these two approaches. Therefore, in this study, we have compared supervised and unsupervised approaches to MRS. One supervised and two unsupervised segmentation methods were tested on three areas using QuickBird and WorldView-2 satellite imagery. The results were assessed using both segmentation evaluation methods and an accuracy assessment of the resulting building classifications. Thus, differences in the geometries of the image objects and in the potential to achieve satisfactory thematic accuracies were evaluated. The two approaches yielded remarkably similar classification results, with overall accuracies ranging from 82% to 86%. The performance of one of the unsupervised methods was unexpectedly similar to that of the supervised method; they identified almost identical scale parameters as being optimal for segmenting buildings, resulting in very similar geometries for the resulting image objects. The second unsupervised method produced very different image objects from the supervised method, but their classification accuracies were still very similar. The latter result was unexpected because, contrary to previously published findings, it suggests a high degree of independence between the segmentation results and classification accuracy. The results of this study have two important implications. The first is that object-based image analysis can be automated without sacrificing classification accuracy, and the second is that the previously accepted idea that classification is dependent on segmentation is challenged by our unexpected results, casting doubt on the value of pursuing 'optimal segmentation'. Our results rather suggest that as long as under-segmentation remains at acceptable levels, imperfections in segmentation can be ruled out, so that a high level of classification accuracy can still be achieved.

  15. Artificial neural network classification using a minimal training set - Comparison to conventional supervised classification

    NASA Technical Reports Server (NTRS)

    Hepner, George F.; Logan, Thomas; Ritter, Niles; Bryant, Nevin

    1990-01-01

    Recent research has shown an artificial neural network (ANN) to be capable of pattern recognition and the classification of image data. This paper examines the potential for the application of neural network computing to satellite image processing. A second objective is to provide a preliminary comparison and ANN classification. An artificial neural network can be trained to do land-cover classification of satellite imagery using selected sites representative of each class in a manner similar to conventional supervised classification. One of the major problems associated with recognition and classifications of pattern from remotely sensed data is the time and cost of developing a set of training sites. This reseach compares the use of an ANN back propagation classification procedure with a conventional supervised maximum likelihood classification procedure using a minimal training set. When using a minimal training set, the neural network is able to provide a land-cover classification superior to the classification derived from the conventional classification procedure. This research is the foundation for developing application parameters for further prototyping of software and hardware implementations for artificial neural networks in satellite image and geographic information processing.

  16. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  17. Semi-supervised SVM for individual tree crown species classification

    NASA Astrophysics Data System (ADS)

    Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik

    2015-12-01

    In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.

  18. Improved semi-supervised online boosting for object tracking

    NASA Astrophysics Data System (ADS)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    The advantage of an online semi-supervised boosting method which takes object tracking problem as a classification problem, is training a binary classifier from labeled and unlabeled examples. Appropriate object features are selected based on real time changes in the object. However, the online semi-supervised boosting method faces one key problem: The traditional self-training using the classification results to update the classifier itself, often leads to drifting or tracking failure, due to the accumulated error during each update of the tracker. To overcome the disadvantages of semi-supervised online boosting based on object tracking methods, the contribution of this paper is an improved online semi-supervised boosting method, in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classification by an online semi-supervised boosting. Then, this classification is used to process the next frame. Finally, the classification is analyzed by the P-N constraints, which are used to verify if the labels of unlabeled data assigned by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking of our tracker on several challenging test sequences where our tracker outperforms other related on-line tracking methods and achieves promising tracking performance.

  19. A supervised learning rule for classification of spatiotemporal spike patterns.

    PubMed

    Lilin Guo; Zhenzhong Wang; Adjouadi, Malek

    2016-08-01

    This study introduces a novel supervised algorithm for spiking neurons that take into consideration synapse delays and axonal delays associated with weights. It can be utilized for both classification and association and uses several biologically influenced properties, such as axonal and synaptic delays. This algorithm also takes into consideration spike-timing-dependent plasticity as in Remote Supervised Method (ReSuMe). This paper focuses on the classification aspect alone. Spiked neurons trained according to this proposed learning rule are capable of classifying different categories by the associated sequences of precisely timed spikes. Simulation results have shown that the proposed learning method greatly improves classification accuracy when compared to the Spike Pattern Association Neuron (SPAN) and the Tempotron learning rule.

  20. Comparison Between Supervised and Unsupervised Classifications of Neuronal Cell Types: A Case Study

    PubMed Central

    Guerra, Luis; McGarry, Laura M; Robles, Víctor; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael

    2011-01-01

    In the study of neural circuits, it becomes essential to discern the different neuronal cell types that build the circuit. Traditionally, neuronal cell types have been classified using qualitative descriptors. More recently, several attempts have been made to classify neurons quantitatively, using unsupervised clustering methods. While useful, these algorithms do not take advantage of previous information known to the investigator, which could improve the classification task. For neocortical GABAergic interneurons, the problem to discern among different cell types is particularly difficult and better methods are needed to perform objective classifications. Here we explore the use of supervised classification algorithms to classify neurons based on their morphological features, using a database of 128 pyramidal cells and 199 interneurons from mouse neocortex. To evaluate the performance of different algorithms we used, as a “benchmark,” the test to automatically distinguish between pyramidal cells and interneurons, defining “ground truth” by the presence or absence of an apical dendrite. We compared hierarchical clustering with a battery of different supervised classification algorithms, finding that supervised classifications outperformed hierarchical clustering. In addition, the selection of subsets of distinguishing features enhanced the classification accuracy for both sets of algorithms. The analysis of selected variables indicates that dendritic features were most useful to distinguish pyramidal cells from interneurons when compared with somatic and axonal morphological variables. We conclude that supervised classification algorithms are better matched to the general problem of distinguishing neuronal cell types when some information on these cell groups, in our case being pyramidal or interneuron, is known a priori. As a spin-off of this methodological study, we provide several methods to automatically distinguish neocortical pyramidal cells from interneurons, based on their morphologies. © 2010 Wiley Periodicals, Inc. Develop Neurobiol 71: 71–82, 2011 PMID:21154911

  1. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning

    PubMed Central

    Gönen, Mehmet

    2014-01-01

    Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F1, and micro F1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks. PMID:24532862

  2. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning.

    PubMed

    Gönen, Mehmet

    2014-03-01

    Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F 1 , and micro F 1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.

  3. Predicting protein complexes using a supervised learning method combined with local structural information.

    PubMed

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  4. Fatigue Level Estimation of Bill Based on Acoustic Signal Feature by Supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued bills have harmful influence on daily operation of Automated Teller Machine(ATM). To make the fatigued bills classification more efficient, development of an automatic fatigued bill classification method is desired. We propose a new method to estimate bending rigidity of bill from acoustic signal feature of banking machines. The estimated bending rigidities are used as continuous fatigue level for classification of fatigued bill. By using the supervised Self-Organizing Map(supervised SOM), we estimate the bending rigidity from only the acoustic energy pattern effectively. The experimental result with real bill samples shows the effectiveness of the proposed method.

  5. Semi-supervised learning for ordinal Kernel Discriminant Analysis.

    PubMed

    Pérez-Ortiz, M; Gutiérrez, P A; Carbonero-Ruz, M; Hervás-Martínez, C

    2016-12-01

    Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problems because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme which is referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended for this setting considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification in a battery of 30 datasets, showing (1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and (2) the advantage of computing distances in the feature space induced by the kernel function. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. On the Implementation of a Land Cover Classification System for SAR Images Using Khoros

    NASA Technical Reports Server (NTRS)

    Medina Revera, Edwin J.; Espinosa, Ramon Vasquez

    1997-01-01

    The Synthetic Aperture Radar (SAR) sensor is widely used to record data about the ground under all atmospheric conditions. The SAR acquired images have very good resolution which necessitates the development of a classification system that process the SAR images to extract useful information for different applications. In this work, a complete system for the land cover classification was designed and programmed using the Khoros, a data flow visual language environment, taking full advantages of the polymorphic data services that it provides. Image analysis was applied to SAR images to improve and automate the processes of recognition and classification of the different regions like mountains and lakes. Both unsupervised and supervised classification utilities were used. The unsupervised classification routines included the use of several Classification/Clustering algorithms like the K-means, ISO2, Weighted Minimum Distance, and the Localized Receptive Field (LRF) training/classifier. Different texture analysis approaches such as Invariant Moments, Fractal Dimension and Second Order statistics were implemented for supervised classification of the images. The results and conclusions for SAR image classification using the various unsupervised and supervised procedures are presented based on their accuracy and performance.

  7. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.

  8. Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model

    NASA Astrophysics Data System (ADS)

    Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.

    2018-04-01

    It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.

  9. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.

    PubMed

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S

    2014-03-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.

  10. Experiments on Supervised Learning Algorithms for Text Categorization

    NASA Technical Reports Server (NTRS)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volume of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: Reuter's- 21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.

  11. New Features for Neuron Classification.

    PubMed

    Hernández-Pérez, Leonardo A; Delgado-Castillo, Duniel; Martín-Pérez, Rainer; Orozco-Morales, Rubén; Lorenzo-Ginori, Juan V

    2018-04-28

    This paper addresses the problem of obtaining new neuron features capable of improving results of neuron classification. Most studies on neuron classification using morphological features have been based on Euclidean geometry. Here three one-dimensional (1D) time series are derived from the three-dimensional (3D) structure of neuron instead, and afterwards a spatial time series is finally constructed from which the features are calculated. Digitally reconstructed neurons were separated into control and pathological sets, which are related to three categories of alterations caused by epilepsy, Alzheimer's disease (long and local projections), and ischemia. These neuron sets were then subjected to supervised classification and the results were compared considering three sets of features: morphological, features obtained from the time series and a combination of both. The best results were obtained using features from the time series, which outperformed the classification using only morphological features, showing higher correct classification rates with differences of 5.15, 3.75, 5.33% for epilepsy and Alzheimer's disease (long and local projections) respectively. The morphological features were better for the ischemia set with a difference of 3.05%. Features like variance, Spearman auto-correlation, partial auto-correlation, mutual information, local minima and maxima, all related to the time series, exhibited the best performance. Also we compared different evaluators, among which ReliefF was the best ranked.

  12. Evaluation of SLAR and thematic mapper MSS data for forest cover mapping using computer-aided analysis techniques

    NASA Technical Reports Server (NTRS)

    Hoffer, R. M. (Principal Investigator); Knowlton, D. J.; Dean, M. E.

    1981-01-01

    A set of training statistics for the 30 meter resolution simulated thematic mapper MSS data was generated based on land use/land cover classes. In addition to this supervised data set, a nonsupervised multicluster block of training statistics is being defined in order to compare the classification results and evaluate the effect of the different training selection methods on classification performance. Two test data sets, defined using a stratified sampling procedure incorporating a grid system with dimensions of 50 lines by 50 columns, and another set based on an analyst supervised set of test fields were used to evaluate the classifications of the TMS data. The supervised training data set generated training statistics, and a per point Gaussian maximum likelihood classification of the 1979 TMS data was obtained. The August 1980 MSS data was radiometrically adjusted. The SAR data was redigitized and the SAR imagery was qualitatively analyzed.

  13. Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments

    ERIC Educational Resources Information Center

    Amershi, Saleema; Conati, Cristina

    2009-01-01

    In this paper, we present a data-based user modeling framework that uses both unsupervised and supervised classification to build student models for exploratory learning environments. We apply the framework to build student models for two different learning environments and using two different data sources (logged interface and eye-tracking data).…

  14. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification

    PubMed Central

    Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661

  15. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    PubMed

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  16. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery

    PubMed Central

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S.; Pusey, Marc L.; Aygün, Ramazan S.

    2015-01-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset. PMID:25914518

  17. Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

    NASA Astrophysics Data System (ADS)

    Salman, S. S.; Abbas, W. A.

    2018-05-01

    The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.

  18. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    PubMed

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.

  19. SemiBoost: boosting for semi-supervised learning.

    PubMed

    Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil K; Liu, Yi

    2009-11-01

    Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this as the Semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a metasemi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed as SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both manifold and cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.

  20. Transfer learning improves supervised image segmentation across imaging protocols.

    PubMed

    van Opbroek, Annegreet; Ikram, M Arfan; Vernooij, Meike W; de Bruijne, Marleen

    2015-05-01

    The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore may improve performance over supervised learning for segmentation across scanners and scan protocols. We present four transfer classifiers that can train a classification scheme with only a small amount of representative training data, in addition to a larger amount of other training data with slightly different characteristics. The performance of the four transfer classifiers was compared to that of standard supervised classification on two magnetic resonance imaging brain-segmentation tasks with multi-site data: white matter, gray matter, and cerebrospinal fluid segmentation; and white-matter-/MS-lesion segmentation. The experiments showed that when there is only a small amount of representative training data available, transfer learning can greatly outperform common supervised-learning approaches, minimizing classification errors by up to 60%.

  1. An iterated Laplacian based semi-supervised dimensionality reduction for classification of breast cancer on ultrasound images.

    PubMed

    Liu, Xiao; Shi, Jun; Zhou, Shichong; Lu, Minhua

    2014-01-01

    The dimensionality reduction is an important step in ultrasound image based computer-aided diagnosis (CAD) for breast cancer. A newly proposed l2,1 regularized correntropy algorithm for robust feature selection (CRFS) has achieved good performance for noise corrupted data. Therefore, it has the potential to reduce the dimensions of ultrasound image features. However, in clinical practice, the collection of labeled instances is usually expensive and time costing, while it is relatively easy to acquire the unlabeled or undetermined instances. Therefore, the semi-supervised learning is very suitable for clinical CAD. The iterated Laplacian regularization (Iter-LR) is a new regularization method, which has been proved to outperform the traditional graph Laplacian regularization in semi-supervised classification and ranking. In this study, to augment the classification accuracy of the breast ultrasound CAD based on texture feature, we propose an Iter-LR-based semi-supervised CRFS (Iter-LR-CRFS) algorithm, and then apply it to reduce the feature dimensions of ultrasound images for breast CAD. We compared the Iter-LR-CRFS with LR-CRFS, original supervised CRFS, and principal component analysis. The experimental results indicate that the proposed Iter-LR-CRFS significantly outperforms all other algorithms.

  2. Automatic segmentation of MR brain images of preterm infants using supervised classification.

    PubMed

    Moeskops, Pim; Benders, Manon J N L; Chiţ, Sabina M; Kersbergen, Karina J; Groenendaal, Floris; de Vries, Linda S; Viergever, Max A; Išgum, Ivana

    2015-09-01

    Preterm birth is often associated with impaired brain development. The state and expected progression of preterm brain development can be evaluated using quantitative assessment of MR images. Such measurements require accurate segmentation of different tissue types in those images. This paper presents an algorithm for the automatic segmentation of unmyelinated white matter (WM), cortical grey matter (GM), and cerebrospinal fluid in the extracerebral space (CSF). The algorithm uses supervised voxel classification in three subsequent stages. In the first stage, voxels that can easily be assigned to one of the three tissue types are labelled. In the second stage, dedicated analysis of the remaining voxels is performed. The first and the second stages both use two-class classification for each tissue type separately. Possible inconsistencies that could result from these tissue-specific segmentation stages are resolved in the third stage, which performs multi-class classification. A set of T1- and T2-weighted images was analysed, but the optimised system performs automatic segmentation using a T2-weighted image only. We have investigated the performance of the algorithm when using training data randomly selected from completely annotated images as well as when using training data from only partially annotated images. The method was evaluated on images of preterm infants acquired at 30 and 40weeks postmenstrual age (PMA). When the method was trained using random selection from the completely annotated images, the average Dice coefficients were 0.95 for WM, 0.81 for GM, and 0.89 for CSF on an independent set of images acquired at 30weeks PMA. When the method was trained using only the partially annotated images, the average Dice coefficients were 0.95 for WM, 0.78 for GM and 0.87 for CSF for the images acquired at 30weeks PMA, and 0.92 for WM, 0.80 for GM and 0.85 for CSF for the images acquired at 40weeks PMA. Even though the segmentations obtained using training data from the partially annotated images resulted in slightly lower Dice coefficients, the performance in all experiments was close to that of a second human expert (0.93 for WM, 0.79 for GM and 0.86 for CSF for the images acquired at 30weeks, and 0.94 for WM, 0.76 for GM and 0.87 for CSF for the images acquired at 40weeks). These results show that the presented method is robust to age and acquisition protocol and that it performs accurate segmentation of WM, GM, and CSF when the training data is extracted from complete annotations as well as when the training data is extracted from partial annotations only. This extends the applicability of the method by reducing the time and effort necessary to create training data in a population with different characteristics. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. [Quantitative classification in catering trade and countermeasures of supervision and management in Hunan Province].

    PubMed

    Liu, Xiulan; Chen, Lizhang; He, Xiang

    2012-02-01

    To analyze the status quo of quantitative classification in Hunan Province catering industry, and to discuss the countermeasures in-depth. According to relevant laws and regulations, and after referring to Daily supervision and quantitative scoring sheet and consulting experts, a checklist of key supervision indicators was made. The implementation of quantitative classification in 10 cities in Hunan Province was studied, and the status quo was analyzed. All the 390 catering units implemented quantitative classified management. The larger the catering enterprise, the higher level of quantitative classification. In addition to cafeterias, the smaller the catering units, the higher point of deduction, and snack bars and beverage stores were the highest. For those quantified and classified as C and D, the point of deduction was higher in the procurement and storage of raw materials, operation processing and other aspects. The quantitative classification of Hunan Province has relatively wide coverage. There are hidden risks in food security in small catering units, snack bars, and beverage stores. The food hygienic condition of Hunan Province needs to be improved.

  4. A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L

    2011-01-01

    Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less

  5. Observation versus classification in supervised category learning.

    PubMed

    Levering, Kimery R; Kurtz, Kenneth J

    2015-02-01

    The traditional supervised classification paradigm encourages learners to acquire only the knowledge needed to predict category membership (a discriminative approach). An alternative that aligns with important aspects of real-world concept formation is learning with a broader focus to acquire knowledge of the internal structure of each category (a generative approach). Our work addresses the impact of a particular component of the traditional classification task: the guess-and-correct cycle. We compare classification learning to a supervised observational learning task in which learners are shown labeled examples but make no classification response. The goals of this work sit at two levels: (1) testing for differences in the nature of the category representations that arise from two basic learning modes; and (2) evaluating the generative/discriminative continuum as a theoretical tool for understand learning modes and their outcomes. Specifically, we view the guess-and-correct cycle as consistent with a more discriminative approach and therefore expected it to lead to narrower category knowledge. Across two experiments, the observational mode led to greater sensitivity to distributional properties of features and correlations between features. We conclude that a relatively subtle procedural difference in supervised category learning substantially impacts what learners come to know about the categories. The results demonstrate the value of the generative/discriminative continuum as a tool for advancing the psychology of category learning and also provide a valuable constraint for formal models and associated theories.

  6. Assessment of various supervised learning algorithms using different performance metrics

    NASA Astrophysics Data System (ADS)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.

  7. Semi-supervised morphosyntactic classification of Old Icelandic.

    PubMed

    Urban, Kryztof; Tangherlini, Timothy R; Vijūnas, Aurelijus; Broadwell, Peter M

    2014-01-01

    We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to machine-read corpora and dictionaries, it applies a small set of declension prototypes to map corpus words to dictionary entries. A web-based GUI allows expert users to modify and augment data through an online process. A machine learning module incorporates prototype data, edit-distance metrics, and expert feedback to continuously update part-of-speech and morphosyntactic classification. An advantage of the analyzer is its ability to achieve competitive classification accuracy with minimum training data.

  8. A new supervised learning algorithm for spiking neurons.

    PubMed

    Xu, Yan; Zeng, Xiaoqin; Zhong, Shuiming

    2013-06-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by the precise firing times of spikes. If only running time is considered, the supervised learning for a spiking neuron is equivalent to distinguishing the times of desired output spikes and the other time during the running process of the neuron through adjusting synaptic weights, which can be regarded as a classification problem. Based on this idea, this letter proposes a new supervised learning method for spiking neurons with temporal encoding; it first transforms the supervised learning into a classification problem and then solves the problem by using the perceptron learning rule. The experiment results show that the proposed method has higher learning accuracy and efficiency over the existing learning methods, so it is more powerful for solving complex and real-time problems.

  9. Active semi-supervised learning method with hybrid deep belief networks.

    PubMed

    Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong

    2014-01-01

    In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the previous several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, active learning method is combined based on the proposed deep architecture. We did several experiments on five sentiment classification datasets, and show that AHD is competitive with previous semi-supervised learning algorithm. Experiments are also conducted to verify the effectiveness of our proposed method with different number of labeled reviews and unlabeled reviews respectively.

  10. 7 CFR 27.10 - Supervision of cotton inspection, weighing, sampling; and other duties.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Supervision of cotton inspection, weighing, sampling... COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Administration § 27.10 Supervision of cotton inspection, weighing, sampling; and other...

  11. 7 CFR 27.10 - Supervision of cotton inspection, weighing, sampling; and other duties.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Supervision of cotton inspection, weighing, sampling... COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Administration § 27.10 Supervision of cotton inspection, weighing, sampling; and other...

  12. 7 CFR 27.10 - Supervision of cotton inspection, weighing, sampling; and other duties.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Supervision of cotton inspection, weighing, sampling... COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Administration § 27.10 Supervision of cotton inspection, weighing, sampling; and other...

  13. 7 CFR 27.10 - Supervision of cotton inspection, weighing, sampling; and other duties.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Supervision of cotton inspection, weighing, sampling... COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Administration § 27.10 Supervision of cotton inspection, weighing, sampling; and other...

  14. 7 CFR 27.10 - Supervision of cotton inspection, weighing, sampling; and other duties.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Supervision of cotton inspection, weighing, sampling... COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Administration § 27.10 Supervision of cotton inspection, weighing, sampling; and other...

  15. Rapid classification of heavy metal-exposed freshwater bacteria by infrared spectroscopy coupled with chemometrics using supervised method

    NASA Astrophysics Data System (ADS)

    Gurbanov, Rafig; Gozen, Ayse Gul; Severcan, Feride

    2018-01-01

    Rapid, cost-effective, sensitive and accurate methodologies to classify bacteria are still in the process of development. The major drawbacks of standard microbiological, molecular and immunological techniques call for the possible usage of infrared (IR) spectroscopy based supervised chemometric techniques. Previous applications of IR based chemometric methods have demonstrated outstanding findings in the classification of bacteria. Therefore, we have exploited an IR spectroscopy based chemometrics using supervised method namely Soft Independent Modeling of Class Analogy (SIMCA) technique for the first time to classify heavy metal-exposed bacteria to be used in the selection of suitable bacteria to evaluate their potential for environmental cleanup applications. Herein, we present the powerful differentiation and classification of laboratory strains (Escherichia coli and Staphylococcus aureus) and environmental isolates (Gordonia sp. and Microbacterium oxydans) of bacteria exposed to growth inhibitory concentrations of silver (Ag), cadmium (Cd) and lead (Pb). Our results demonstrated that SIMCA was able to differentiate all heavy metal-exposed and control groups from each other with 95% confidence level. Correct identification of randomly chosen test samples in their corresponding groups and high model distances between the classes were also achieved. We report, for the first time, the success of IR spectroscopy coupled with supervised chemometric technique SIMCA in classification of different bacteria under a given treatment.

  16. Task-driven dictionary learning.

    PubMed

    Mairal, Julien; Bach, Francis; Ponce, Jean

    2012-04-01

    Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.

  17. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    PubMed Central

    Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. PMID:26605337

  18. Semi-supervised and unsupervised extreme learning machines.

    PubMed

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

  19. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.

    PubMed

    Galpert, Deborah; Del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

  20. [Object-oriented aquatic vegetation extracting approach based on visible vegetation indices.

    PubMed

    Jing, Ran; Deng, Lei; Zhao, Wen Ji; Gong, Zhao Ning

    2016-05-01

    Using the estimation of scale parameters (ESP) image segmentation tool to determine the ideal image segmentation scale, the optimal segmented image was created by the multi-scale segmentation method. Based on the visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of visible vegetation indices, and built up a decision tree rule. A membership function was used to automatically classify the study area and an aquatic vegetation map was generated. The results showed the overall accuracy of image classification using the supervised classification was 53.7%, and the overall accuracy of object-oriented image analysis (OBIA) was 91.7%. Compared with pixel-based supervised classification method, the OBIA method improved significantly the image classification result and further increased the accuracy of extracting the aquatic vegetation. The Kappa value of supervised classification was 0.4, and the Kappa value based OBIA was 0.9. The experimental results demonstrated that using visible vegetation indices derived from the mini-UAV data and OBIA method extracting the aquatic vegetation developed in this study was feasible and could be applied in other physically similar areas.

  1. Physical Human Activity Recognition Using Wearable Sensors.

    PubMed

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-12-11

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

  2. Physical Human Activity Recognition Using Wearable Sensors

    PubMed Central

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-01-01

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450

  3. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.

    PubMed

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  4. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    PubMed Central

    Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159

  5. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    USGS Publications Warehouse

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.

  6. Detection of sunn pest-damaged wheat samples using visible/near-infrared spectroscopy based on pattern recognition.

    PubMed

    Basati, Zahra; Jamshidi, Bahareh; Rasekh, Mansour; Abbaspour-Gilandeh, Yousef

    2018-05-30

    The presence of sunn pest-damaged grains in wheat mass reduces the quality of flour and bread produced from it. Therefore, it is essential to assess the quality of the samples in collecting and storage centers of wheat and flour mills. In this research, the capability of visible/near-infrared (Vis/NIR) spectroscopy combined with pattern recognition methods was investigated for discrimination of wheat samples with different percentages of sunn pest-damaged. To this end, various samples belonging to five classes (healthy and 5%, 10%, 15% and 20% unhealthy) were analyzed using Vis/NIR spectroscopy (wavelength range of 350-1000 nm) based on both supervised and unsupervised pattern recognition methods. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) as the unsupervised techniques and soft independent modeling of class analogies (SIMCA) and partial least squares-discriminant analysis (PLS-DA) as supervised methods were used. The results showed that Vis/NIR spectra of healthy samples were correctly clustered using both PCA and HCA. Due to the high overlapping between the four unhealthy classes (5%, 10%, 15% and 20%), it was not possible to discriminate all the unhealthy samples in individual classes. However, when considering only the two main categories of healthy and unhealthy, an acceptable degree of separation between the classes can be obtained after classification with supervised pattern recognition methods of SIMCA and PLS-DA. SIMCA based on PCA modeling correctly classified samples in two classes of healthy and unhealthy with classification accuracy of 100%. Moreover, the power of the wavelengths of 839 nm, 918 nm and 995 nm were more than other wavelengths to discriminate two classes of healthy and unhealthy. It was also concluded that PLS-DA provides excellent classification results of healthy and unhealthy samples (R 2  = 0.973 and RMSECV = 0.057). Therefore, Vis/NIR spectroscopy based on pattern recognition techniques can be useful for rapid distinguishing the healthy wheat samples from those damaged by sunn pest in the maintenance and processing centers. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Tensor-based classification of an auditory mobile BCI without a subject-specific calibration phase

    NASA Astrophysics Data System (ADS)

    Zink, Rob; Hunyadi, Borbála; Van Huffel, Sabine; De Vos, Maarten

    2016-04-01

    Objective. One of the major drawbacks in EEG brain-computer interfaces (BCI) is the need for subject-specific training of the classifier. By removing the need for a supervised calibration phase, new users could potentially explore a BCI faster. In this work we aim to remove this subject-specific calibration phase and allow direct classification. Approach. We explore canonical polyadic decompositions and block term decompositions of the EEG. These methods exploit structure in higher dimensional data arrays called tensors. The BCI tensors are constructed by concatenating ERP templates from other subjects to a target and non-target trial and the inherent structure guides a decomposition that allows accurate classification. We illustrate the new method on data from a three-class auditory oddball paradigm. Main results. The presented approach leads to a fast and intuitive classification with accuracies competitive with a supervised and cross-validated LDA approach. Significance. The described methods are a promising new way of classifying BCI data with a forthright link to the original P300 ERP signal over the conventional and widely used supervised approaches.

  8. Tensor-based classification of an auditory mobile BCI without a subject-specific calibration phase.

    PubMed

    Zink, Rob; Hunyadi, Borbála; Huffel, Sabine Van; Vos, Maarten De

    2016-04-01

    One of the major drawbacks in EEG brain-computer interfaces (BCI) is the need for subject-specific training of the classifier. By removing the need for a supervised calibration phase, new users could potentially explore a BCI faster. In this work we aim to remove this subject-specific calibration phase and allow direct classification. We explore canonical polyadic decompositions and block term decompositions of the EEG. These methods exploit structure in higher dimensional data arrays called tensors. The BCI tensors are constructed by concatenating ERP templates from other subjects to a target and non-target trial and the inherent structure guides a decomposition that allows accurate classification. We illustrate the new method on data from a three-class auditory oddball paradigm. The presented approach leads to a fast and intuitive classification with accuracies competitive with a supervised and cross-validated LDA approach. The described methods are a promising new way of classifying BCI data with a forthright link to the original P300 ERP signal over the conventional and widely used supervised approaches.

  9. Automatic Building Detection based on Supervised Classification using High Resolution Google Earth Images

    NASA Astrophysics Data System (ADS)

    Ghaffarian, S.; Ghaffarian, S.

    2014-08-01

    This paper presents a novel approach to detect the buildings by automization of the training area collecting stage for supervised classification. The method based on the fact that a 3d building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins with the detection and masking out the shadow areas using luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Further, the training areas for supervised classification are selected by automatically determining a buffer zone on each building whose shadow is detected by using the shadow shape and the sun illumination direction. Thereafter, by calculating the statistic values of each buffer zone which is collected from the building areas the Improved Parallelepiped Supervised Classification is executed to detect the buildings. Standard deviation thresholding applied to the Parallelepiped classification method to improve its accuracy. Finally, simple morphological operations conducted for releasing the noises and increasing the accuracy of the results. The experiments were performed on set of high resolution Google Earth images. The performance of the proposed approach was assessed by comparing the results of the proposed approach with the reference data by using well-known quality measurements (Precision, Recall and F1-score) to evaluate the pixel-based and object-based performances of the proposed approach. Evaluation of the results illustrates that buildings detected from dense and suburban districts with divers characteristics and color combinations using our proposed method have 88.4 % and 853 % overall pixel-based and object-based precision performances, respectively.

  10. Biological classification with RNA-Seq data: Can alternatively spliced transcript expression enhance machine learning classifier?

    PubMed

    Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry

    2018-06-25

    The extent to which the genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to gain better insights to a number of biological and biomedical questions, compared to the DNA microarrays. Most importantly, RNA-Seq allows to quantify expression at the gene and alternative splicing isoform levels. However, leveraging the RNA-Seq data requires development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis, and have recently gained attention for their applications to the RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that the isoform-level expression data is more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment is done through utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes and, the pathological tumor stage for the samples from the cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods. The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  11. Semi-supervised anomaly detection - towards model-independent searches of new physics

    NASA Astrophysics Data System (ADS)

    Kuusela, Mikael; Vatanen, Tommi; Malmi, Eric; Raiko, Tapani; Aaltonen, Timo; Nagai, Yoshikazu

    2012-06-01

    Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should this training data be systematically inaccurate for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is a lot more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.

  12. Differentiation of Crataegus spp. guided by nuclear magnetic resonance spectrometry with chemometric analyses.

    PubMed

    Lund, Jensen A; Brown, Paula N; Shipley, Paul R

    2017-09-01

    For compliance with US Current Good Manufacturing Practice regulations for dietary supplements, manufacturers must provide identity of source plant material. Despite the popularity of hawthorn as a dietary supplement, relatively little is known about the comparative phytochemistry of different hawthorn species, and in particular North American hawthorns. The combination of NMR spectrometry with chemometric analyses offers an innovative approach to differentiating hawthorn species and exploring the phytochemistry. Two European and two North American species, harvested from a farm trial in late summer 2008, were analyzed by standard 1D 1 H and J-resolved (JRES) experiments. The data were preprocessed and modelled by principal component analysis (PCA). A supervised model was then generated by partial least squares-discriminant analysis (PLS-DA) for classification and evaluated by cross validation. Supervised random forests models were constructed from the dataset to explore the potential of machine learning for identification of unique patterns across species. 1D 1 H NMR data yielded increased differentiation over the JRES data. The random forests results correlated with PLS-DA results and outperformed PLS-DA in classification accuracy. In all of these analyses differentiation of the Crataegus spp. was best achieved by focusing on the NMR spectral region that contains signals unique to plant phenolic compounds. Identification of potentially significant metabolites for differentiation between species was approached using univariate techniques including significance analysis of microarrays and Kruskall-Wallis tests. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers.

    PubMed

    Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin

    2018-05-03

    The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Despite several pairwise protein features being combined in a supervised big data approach; they all, to some extent were alignment-based features and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, and built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Just when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management, a higher success rate (98.71%) within the twilight zone could be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.

  14. Supervised Gamma Process Poisson Factorization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Dylan Zachary

    This thesis develops the supervised gamma process Poisson factorization (S- GPPF) framework, a novel supervised topic model for joint modeling of count matrices and document labels. S-GPPF is fully generative and nonparametric: document labels and count matrices are modeled under a uni ed probabilistic framework and the number of latent topics is controlled automatically via a gamma process prior. The framework provides for multi-class classification of documents using a generative max-margin classifier. Several recent data augmentation techniques are leveraged to provide for exact inference using a Gibbs sampling scheme. The first portion of this thesis reviews supervised topic modeling andmore » several key mathematical devices used in the formulation of S-GPPF. The thesis then introduces the S-GPPF generative model and derives the conditional posterior distributions of the latent variables for posterior inference via Gibbs sampling. The S-GPPF is shown to exhibit state-of-the-art performance for joint topic modeling and document classification on a dataset of conference abstracts, beating out competing supervised topic models. The unique properties of S-GPPF along with its competitive performance make it a novel contribution to supervised topic modeling.« less

  15. New developments in technology-assisted supervision and training: a practical overview.

    PubMed

    Rousmaniere, Tony; Abbass, Allan; Frederickson, Jon

    2014-11-01

    Clinical supervision and training are now widely available online. In this article, three of the most accessible and widely adopted new developments in clinical supervision and training technology are described: Videoconference supervision, cloud-based file sharing software, and clinical outcome tracking software. Partial transcripts from two online supervision sessions are provided as examples of videoconference-based supervision. The benefits and limitations of technology in supervision and training are discussed, with an emphasis on supervision process, ethics, privacy, and security. Recommendations for supervision practice are made, including methods to enhance experiential learning, the supervisory working alliance, and online security. © 2014 Wiley Periodicals, Inc.

  16. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    PubMed

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.

  17. Supervised multiblock sparse multivariable analysis with application to multimodal brain imaging genetics.

    PubMed

    Kawaguchi, Atsushi; Yamashita, Fumio

    2017-10-01

    This article proposes a procedure for describing the relationship between high-dimensional data sets, such as multimodal brain images and genetic data. We propose a supervised technique to incorporate the clinical outcome to determine a score, which is a linear combination of variables with hieratical structures to multimodalities. This approach is expected to obtain interpretable and predictive scores. The proposed method was applied to a study of Alzheimer's disease (AD). We propose a diagnostic method for AD that involves using whole-brain magnetic resonance imaging (MRI) and positron emission tomography (PET), and we select effective brain regions for the diagnostic probability and investigate the genome-wide association with the regions using single nucleotide polymorphisms (SNPs). The two-step dimension reduction method, which we previously introduced, was considered applicable to such a study and allows us to partially incorporate the proposed method. We show that the proposed method offers classification functions with feasibility and reasonable prediction accuracy based on the receiver operating characteristic (ROC) analysis and reasonable regions of the brain and genomes. Our simulation study based on the synthetic structured data set showed that the proposed method outperformed the original method and provided the characteristic for the supervised feature. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. 12 CFR 560.160 - Asset classification.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Asset classification. 560.160 Section 560.160 Banks and Banking OFFICE OF THRIFT SUPERVISION, DEPARTMENT OF THE TREASURY LENDING AND INVESTMENT Lending and Investment Provisions Applicable to all Savings Associations § 560.160 Asset classification...

  19. Discrimination and supervised classification of volcanic flows of the Puna-Altiplano, Central Andes Mountains using Landsat TM data

    NASA Technical Reports Server (NTRS)

    Mcbride, J. H.; Fielding, E. J.; Isacks, B. L.

    1987-01-01

    Landsat Thematic Mapper (TM) images of portions of the Central Andean Puna-Altiplano volcanic belt have been tested for the feasibility of discriminating individual volcanic flows using supervised classifications. This technique distinguishes volcanic rock classes as well as individual phases (i.e., relative age groups) within each class. The spectral signature of a volcanic rock class appears to depend on original texture and composition and on the degree of erosion, weathering, and chemical alteration. Basalts and basaltic andesite stand out as a clearly distinguishable class. The age dependent degree of weathering of these generally dark volcanic rocks can be correlated with reflectance: older rocks have a higher reflectance. On the basis of this relationship, basaltaic lava flows can be separated into several subclasses. These individual subclasses would correspond to mappable geologic units on the ground at a reconnaissance scale. The supervised classification maps are therefore useful for establishing a general stratigraphic framework for later detailed surface mapping of volcanic sequences.

  20. A Multiagent-based Intrusion Detection System with the Support of Multi-Class Supervised Classification

    NASA Astrophysics Data System (ADS)

    Shyu, Mei-Ling; Sainani, Varsha

    The increasing number of network security related incidents have made it necessary for the organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). IDSs are expected to analyze a large volume of data while not placing a significantly added load on the monitoring systems and networks. This requires good data mining strategies which take less time and give accurate results. In this study, a novel data mining assisted multiagent-based intrusion detection system (DMAS-IDS) is proposed, particularly with the support of multiclass supervised classification. These agents can detect and take predefined actions against malicious activities, and data mining techniques can help detect them. Our proposed DMAS-IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDS with mobile agents that activate too many sniffers causing bottlenecks in the network. This is one of the major motivations to use a distributed model based on multiagent platform along with a supervised classification technique.

  1. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling

    PubMed Central

    Zhou, Fuqun; Zhang, Aining

    2016-01-01

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests’ features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152

  2. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    PubMed

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.

  3. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    PubMed

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  4. The dynamics of human-induced land cover change in miombo ecosystems of southern Africa

    NASA Astrophysics Data System (ADS)

    Jaiteh, Malanding Sambou

    Understanding human-induced land cover change in the miombo require the consistent, geographically-referenced, data on temporal land cover characteristics as well as biophysical and socioeconomic drivers of land use, the major cause of land cover change. The overall goal of this research to examine the applications of high-resolution satellite remote sensing data in studying the dynamics of human-induced land cover change in the miombo. Specific objectives are to: (1) evaluate the applications of computer-assisted classification of Landsat Thematic Mapper (TM) data for land cover mapping in the miombo and (2) analyze spatial and temporal patterns of landscape change locations in the miombo. Stepwise Thematic Classification, STC (a hybrid supervised-unsupervised classification) procedure for classifying Landsat TM data was developed and tested using Landsat TM data. Classification accuracy results were compared to those from supervised and unsupervised classification. The STC provided the highest classification accuracy i.e., 83.9% correspondence between classified and referenced data compared to 44.2% and 34.5% for unsupervised and supervised classification respectively. Improvements in the classification process can be attributed to thematic stratification of the image data into spectrally homogenous (thematic) groups and step-by-step classification of the groups using supervised or unsupervised classification techniques. Supervised classification failed to classify 18% of the scene evidence that training data used did not adequately represent all of the variability in the data. Application of the procedure in drier miombo produced overall classification accuracy of 63%. This is much lower than that of wetter miombo. The results clearly demonstrate that digital classification of Landsat TM can be successfully implemented in the miombo without intensive fieldwork. Spatial characteristics of land cover change in agricultural and forested landscapes in central Malawi were analyzed for the period 1984 to 1995 spatial pattern analysis methods. Shifting cultivation areas, Agriculture in forested landscape, experienced highest rate of woodland cover fragmentation with mean patch size of closed woodland cover decreasing from 20ha to 7.5ha. Permanent bare (cropland and settlement) in intensive agricultural matrix landscapes increased 52% largely through the conversion of fallow areas. Protected National Park area remained fairly unchanged although closed woodland area increased by 4%, mainly from regeneration of open woodland. This study provided evidence that changes in spatial characteristics in the miombo differ with landscape. Land use change (i.e. conversion to cropland) is the primary driving force behind changes in landscape spatial patterns. Also, results revealed that exclusion of intense human use (i.e. cultivation and woodcutting) through regulations and/or fencing increased both closed woodland area (through regeneration of open woodland) and overall connectivity in the landscape. Spatial characteristics of land cover change were analyzed at locations in Malawi (wetter miombo) and Zimbabwe (drier miombo). Results indicate land cover dynamics differ both between and within case study sites. In communal areas in the Kasungu scene, land cover change is dominated by woodland fragmentation to open vegetation. Change in private commercial lands was dominantly expansion of bare (settlement and cropland) areas primarily at the expense of open vegetation (fallow land).

  5. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning.

    PubMed

    Plaza-Leiva, Victoria; Gomez-Ruiz, Jose Antonio; Mandow, Anthony; García-Cerezo, Alfonso

    2017-03-15

    Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN) method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM), Gaussian processes (GP), and Gaussian mixture models (GMM). A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl). Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood.

  6. Conduction Delay Learning Model for Unsupervised and Supervised Classification of Spatio-Temporal Spike Patterns

    PubMed Central

    Matsubara, Takashi

    2017-01-01

    Precise spike timing is considered to play a fundamental role in communications and signal processing in biological neural networks. Understanding the mechanism of spike timing adjustment would deepen our understanding of biological systems and enable advanced engineering applications such as efficient computational architectures. However, the biological mechanisms that adjust and maintain spike timing remain unclear. Existing algorithms adopt a supervised approach, which adjusts the axonal conduction delay and synaptic efficacy until the spike timings approximate the desired timings. This study proposes a spike timing-dependent learning model that adjusts the axonal conduction delay and synaptic efficacy in both unsupervised and supervised manners. The proposed learning algorithm approximates the Expectation-Maximization algorithm, and classifies the input data encoded into spatio-temporal spike patterns. Even in the supervised classification, the algorithm requires no external spikes indicating the desired spike timings unlike existing algorithms. Furthermore, because the algorithm is consistent with biological models and hypotheses found in existing biological studies, it could capture the mechanism underlying biological delay learning. PMID:29209191

  7. Conduction Delay Learning Model for Unsupervised and Supervised Classification of Spatio-Temporal Spike Patterns.

    PubMed

    Matsubara, Takashi

    2017-01-01

    Precise spike timing is considered to play a fundamental role in communications and signal processing in biological neural networks. Understanding the mechanism of spike timing adjustment would deepen our understanding of biological systems and enable advanced engineering applications such as efficient computational architectures. However, the biological mechanisms that adjust and maintain spike timing remain unclear. Existing algorithms adopt a supervised approach, which adjusts the axonal conduction delay and synaptic efficacy until the spike timings approximate the desired timings. This study proposes a spike timing-dependent learning model that adjusts the axonal conduction delay and synaptic efficacy in both unsupervised and supervised manners. The proposed learning algorithm approximates the Expectation-Maximization algorithm, and classifies the input data encoded into spatio-temporal spike patterns. Even in the supervised classification, the algorithm requires no external spikes indicating the desired spike timings unlike existing algorithms. Furthermore, because the algorithm is consistent with biological models and hypotheses found in existing biological studies, it could capture the mechanism underlying biological delay learning.

  8. 9 CFR 145.23 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ...” or have met equivalent requirements for pullorum-typhoid control under official supervision; (ii) All... equivalent requirements for pullorum-typhoid control under official supervision: Provided, That if other... the following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S...

  9. 9 CFR 145.23 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ...” or have met equivalent requirements for pullorum-typhoid control under official supervision; (ii) All... equivalent requirements for pullorum-typhoid control under official supervision: Provided, That if other... the following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S...

  10. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    PubMed Central

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  11. Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and Progressive Supranuclear Palsy.

    PubMed

    Salvatore, C; Cerasa, A; Castiglioni, I; Gallivanone, F; Augimeri, A; Lopez, M; Arabia, G; Morelli, M; Gilardi, M C; Quattrone, A

    2014-01-30

    Supervised machine learning has been proposed as a revolutionary approach for identifying sensitive medical image biomarkers (or combination of them) allowing for automatic diagnosis of individual subjects. The aim of this work was to assess the feasibility of a supervised machine learning algorithm for the assisted diagnosis of patients with clinically diagnosed Parkinson's disease (PD) and Progressive Supranuclear Palsy (PSP). Morphological T1-weighted Magnetic Resonance Images (MRIs) of PD patients (28), PSP patients (28) and healthy control subjects (28) were used by a supervised machine learning algorithm based on the combination of Principal Components Analysis as feature extraction technique and on Support Vector Machines as classification algorithm. The algorithm was able to obtain voxel-based morphological biomarkers of PD and PSP. The algorithm allowed individual diagnosis of PD versus controls, PSP versus controls and PSP versus PD with an Accuracy, Specificity and Sensitivity>90%. Voxels influencing classification between PD and PSP patients involved midbrain, pons, corpus callosum and thalamus, four critical regions known to be strongly involved in the pathophysiological mechanisms of PSP. Classification accuracy of individual PSP patients was consistent with previous manual morphological metrics and with other supervised machine learning application to MRI data, whereas accuracy in the detection of individual PD patients was significantly higher with our classification method. The algorithm provides excellent discrimination of PD patients from PSP patients at an individual level, thus encouraging the application of computer-based diagnosis in clinical practice. Copyright © 2013 Elsevier B.V. All rights reserved.

  12. A comparison of supervised machine learning algorithms and feature vectors for MS lesion segmentation using multimodal structural MRI.

    PubMed

    Sweeney, Elizabeth M; Vogelstein, Joshua T; Cuzzocreo, Jennifer L; Calabresi, Peter A; Reich, Daniel S; Crainiceanu, Ciprian M; Shinohara, Russell T

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.

  13. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ....80 Section 27.80 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Costs of...

  14. Impervious surface mapping with Quickbird imagery

    PubMed Central

    Lu, Dengsheng; Hetrick, Scott; Moran, Emilio

    2010-01-01

    This research selects two study areas with different urban developments, sizes, and spatial patterns to explore the suitable methods for mapping impervious surface distribution using Quickbird imagery. The selected methods include per-pixel based supervised classification, segmentation-based classification, and a hybrid method. A comparative analysis of the results indicates that per-pixel based supervised classification produces a large number of “salt-and-pepper” pixels, and segmentation based methods can significantly reduce this problem. However, neither method can effectively solve the spectral confusion of impervious surfaces with water/wetland and bare soils and the impacts of shadows. In order to accurately map impervious surface distribution from Quickbird images, manual editing is necessary and may be the only way to extract impervious surfaces from the confused land covers and the shadow problem. This research indicates that the hybrid method consisting of thresholding techniques, unsupervised classification and limited manual editing provides the best performance. PMID:21643434

  15. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing

    Treesearch

    John Hogland; Nedret Billor; Nathaniel Anderson

    2013-01-01

    Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...

  16. Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System.

    PubMed

    Jiang, Yizhang; Wu, Dongrui; Deng, Zhaohong; Qian, Pengjiang; Wang, Jun; Wang, Guanjin; Chung, Fu-Lai; Choi, Kup-Sze; Wang, Shitong

    2017-12-01

    Recognition of epileptic seizures from offline EEG signals is very important in clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.

  17. HClass: Automatic classification tool for health pathologies using artificial intelligence techniques.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya

    2015-01-01

    The classification of subjects' pathologies enables a rigorousness to be applied to the treatment of certain pathologies, as doctors on occasions play with so many variables that they can end up confusing some illnesses with others. Thanks to Machine Learning techniques applied to a health-record database, it is possible to make using our algorithm. hClass contains a non-linear classification of either a supervised, non-supervised or semi-supervised type. The machine is configured using other techniques such as validation of the set to be classified (cross-validation), reduction in features (PCA) and committees for assessing the various classifiers. The tool is easy to use, and the sample matrix and features that one wishes to classify, the number of iterations and the subjects who are going to be used to train the machine all need to be introduced as inputs. As a result, the success rate is shown either via a classifier or via a committee if one has been formed. A 90% success rate is obtained in the ADABoost classifier and 89.7% in the case of a committee (comprising three classifiers) when PCA is applied. This tool can be expanded to allow the user to totally characterise the classifiers by adjusting them to each classification use.

  18. Test of spectral/spatial classifier

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A. (Principal Investigator); Kast, J. L.; Davis, B. J.

    1977-01-01

    The author has identified the following significant results. The supervised ECHO processor (which utilizes class statistics for object identification) successfully exploits the redundancy of states characteristic of sampled imagery of ground scenes to achieve better classification accuracy, reduce the number of classifications required, and reduce the variability of classification results. The nonsupervised ECHO processor (which identifies objects without the benefit of class statistics) successfully reduces the number of classifications required and the variability of the classification results.

  19. When Family-Supportive Supervision Matters: Relations between Multiple Sources of Support and Work-Family Balance

    ERIC Educational Resources Information Center

    Greenhaus, Jeffrey H.; Ziegert, Jonathan C.; Allen, Tammy D.

    2012-01-01

    This study examines the mechanisms by which family-supportive supervision is related to employee work-family balance. Based on a sample of 170 business professionals, we found that the positive relation between family-supportive supervision and balance was fully mediated by work interference with family (WIF) and partially mediated by family…

  20. Organization and Supervision of Elementary Education in 100 Cities. Bulletin, 1949, No. 11

    ERIC Educational Resources Information Center

    Bathurst, Effie G.; Davis, Mary Dabney; Gabbard, Hazel; Mackintosh, Helen K.; Patterson, Don S.

    1949-01-01

    This bulletin is the full report of a study made by the Division of Elementary Education to help answer questions frequently asked about elementary school organization and supervision. These questions concern organization for instruction; supervisory personnel; in-service techniques; scheduling; classification; records; reports to parents;…

  1. Rapid classification of pharmaceutical ingredients with Raman spectroscopy using compressive detection strategy with PLS-DA multivariate filters.

    PubMed

    Cebeci Maltaş, Derya; Kwok, Kaho; Wang, Ping; Taylor, Lynne S; Ben-Amotz, Dor

    2013-06-01

    Identifying pharmaceutical ingredients is a routine procedure required during industrial manufacturing. Here we show that a recently developed Raman compressive detection strategy can be employed to classify various widely used pharmaceutical materials using a hybrid supervised/unsupervised strategy in which only two ingredients are used for training and yet six other ingredients can also be distinguished. More specifically, our liquid crystal spatial light modulator (LC-SLM) based compressive detection instrument is trained using only the active ingredient, tadalafil, and the excipient, lactose, but is tested using these and various other excipients; microcrystalline cellulose, magnesium stearate, titanium (IV) oxide, talc, sodium lauryl sulfate and hydroxypropyl cellulose. Partial least squares discriminant analysis (PLS-DA) is used to generate the compressive detection filters necessary for fast chemical classification. Although the filters used in this study are trained on only lactose and tadalafil, we show that all the pharmaceutical ingredients mentioned above can be differentiated and classified using PLS-DA compressive detection filters with an accumulation time of 10ms per filter. Copyright © 2013 Elsevier B.V. All rights reserved.

  2. Assessing the varietal origin of extra-virgin olive oil using liquid chromatography fingerprints of phenolic compound, data fusion and chemometrics.

    PubMed

    Bajoub, Aadil; Medina-Rodríguez, Santiago; Gómez-Romero, María; Ajal, El Amine; Bagur-González, María Gracia; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría

    2017-01-15

    High Performance Liquid Chromatography (HPLC) with diode array (DAD) and fluorescence (FLD) detection was used to acquire the fingerprints of the phenolic fraction of monovarietal extra-virgin olive oils (extra-VOOs) collected over three consecutive crop seasons (2011/2012-2013/2014). The chromatographic fingerprints of 140 extra-VOO samples processed from olive fruits of seven olive varieties, were recorded and statistically treated for varietal authentication purposes. First, DAD and FLD chromatographic-fingerprint datasets were separately processed and, subsequently, were joined using "Low-level" and "Mid-Level" data fusion methods. After the preliminary examination by principal component analysis (PCA), three supervised pattern recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogies (SIMCA) and K-Nearest Neighbors (k-NN) were applied to the four chromatographic-fingerprinting matrices. The classification models built were very sensitive and selective, showing considerably good recognition and prediction abilities. The combination "chromatographic dataset+chemometric technique" allowing the most accurate classification for each monovarietal extra-VOO was highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions.

    PubMed

    Chen, Ke; Wang, Shihai

    2011-01-01

    Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three semi-supervised assumptions, i.e., smoothness, cluster, and manifold assumptions, together into account during boosting learning. In this paper, we propose a novel cost functional consisting of the margin cost on labeled data and the regularization penalty on unlabeled data based on three fundamental semi-supervised assumptions. Thus, minimizing our proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorite results for benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to the previous work.

  4. Pharmaceutical Raw Material Identification Using Miniature Near-Infrared (MicroNIR) Spectroscopy and Supervised Pattern Recognition Using Support Vector Machine

    PubMed Central

    Hsiung, Chang; Pederson, Christopher G.; Zou, Peng; Smith, Valton; von Gunten, Marc; O’Brien, Nada A.

    2016-01-01

    Near-infrared spectroscopy as a rapid and non-destructive analytical technique offers great advantages for pharmaceutical raw material identification (RMID) to fulfill the quality and safety requirements in pharmaceutical industry. In this study, we demonstrated the use of portable miniature near-infrared (MicroNIR) spectrometers for NIR-based pharmaceutical RMID and solved two challenges in this area, model transferability and large-scale classification, with the aid of support vector machine (SVM) modeling. We used a set of 19 pharmaceutical compounds including various active pharmaceutical ingredients (APIs) and excipients and six MicroNIR spectrometers to test model transferability. For the test of large-scale classification, we used another set of 253 pharmaceutical compounds comprised of both chemically and physically different APIs and excipients. We compared SVM with conventional chemometric modeling techniques, including soft independent modeling of class analogy, partial least squares discriminant analysis, linear discriminant analysis, and quadratic discriminant analysis. Support vector machine modeling using a linear kernel, especially when combined with a hierarchical scheme, exhibited excellent performance in both model transferability and large-scale classification. Hence, ultra-compact, portable and robust MicroNIR spectrometers coupled with SVM modeling can make on-site and in situ pharmaceutical RMID for large-volume applications highly achievable. PMID:27029624

  5. Supervised pixel classification using a feature space derived from an artificial visual system

    NASA Technical Reports Server (NTRS)

    Baxter, Lisa C.; Coggins, James M.

    1991-01-01

    Image segmentation involves labelling pixels according to their membership in image regions. This requires the understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. Multiscale structure of regions are investigated and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not by image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.

  6. An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

    USGS Publications Warehouse

    Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2016-01-01

    Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients, lacking abrupt changes between adjacent classes; and as having a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Data set (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.

  7. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning

    PubMed Central

    Plaza-Leiva, Victoria; Gomez-Ruiz, Jose Antonio; Mandow, Anthony; García-Cerezo, Alfonso

    2017-01-01

    Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN) method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM), Gaussian processes (GP), and Gaussian mixture models (GMM). A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl). Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood. PMID:28294963

  8. Wire connector classification with machine vision and a novel hybrid SVM

    NASA Astrophysics Data System (ADS)

    Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.

    2018-04-01

    A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.

  9. Speech reconstruction using a deep partially supervised neural network.

    PubMed

    McLoughlin, Ian; Li, Jingjie; Song, Yan; Sharifzadeh, Hamid R

    2017-08-01

    Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state-of-the-art.

  10. Towards Automatic Classification of Exoplanet-Transit-Like Signals: A Case Study on Kepler Mission Data

    NASA Astrophysics Data System (ADS)

    Valizadegan, Hamed; Martin, Rodney; McCauliff, Sean D.; Jenkins, Jon Michael; Catanzarite, Joseph; Oza, Nikunj C.

    2015-08-01

    Building new catalogues of planetary candidates, astrophysical false alarms, and non-transiting phenomena is a challenging task that currently requires a reviewing team of astrophysicists and astronomers. These scientists need to examine more than 100 diagnostic metrics and associated graphics for each candidate exoplanet-transit-like signal to classify it into one of the three classes. Considering that the NASA Explorer Program's TESS mission and ESA's PLATO mission survey even a larger area of space, the classification of their transit-like signals is more time-consuming for human agents and a bottleneck to successfully construct the new catalogues in a timely manner. This encourages building automatic classification tools that can quickly and reliably classify the new signal data from these missions. The standard tool for building automatic classification systems is the supervised machine learning that requires a large set of highly accurate labeled examples in order to build an effective classifier. This requirement cannot be easily met for classifying transit-like signals because not only are existing labeled signals very limited, but also the current labels may not be reliable (because the labeling process is a subjective task). Our experiments with using different supervised classifiers to categorize transit-like signals verifies that the labeled signals are not rich enough to provide the classifier with enough power to generalize well beyond the observed cases (e.g. to unseen or test signals). That motivated us to utilize a new category of learning techniques, so-called semi-supervised learning, that combines the label information from the costly labeled signals, and distribution information from the cheaply available unlabeled signals in order to construct more effective classifiers. Our study on the Kepler Mission data shows that semi-supervised learning can significantly improve the result of multiple base classifiers (e.g. Support Vector Machines, AdaBoost, and Decision Tree) and is a good technique for automatic classification of exoplanet-transit-like signal.

  11. Arabic Supervised Learning Method Using N-Gram

    ERIC Educational Resources Information Center

    Sanan, Majed; Rammal, Mahmoud; Zreik, Khaldoun

    2008-01-01

    Purpose: Recently, classification of Arabic documents is a real problem for juridical centers. In this case, some of the Lebanese official journal documents are classified, and the center has to classify new documents based on these documents. This paper aims to study and explain the useful application of supervised learning method on Arabic texts…

  12. Supervised machine learning and active learning in classification of radiology reports.

    PubMed

    Nguyen, Dung H M; Patrick, Jon D

    2014-01-01

    This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry. In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize training production and further improve classification performance. The project involved two pilot sites in Victoria, Australia (Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne)) and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney). The reportability classifier performance achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. Up to 92% of training data needed for supervised machine learning can be saved by AL. AL is a promising method for optimizing the supervised training production used in classification of radiology reports. When an AL strategy is applied during the data selection process, the cost of manual classification can be reduced significantly. The most important practical application of the reportability classifier is that it can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  13. Predictive Models of target organ and Systemic toxicities (BOSC)

    EPA Science Inventory

    The objective of this work is to predict the hazard classification and point of departure (PoD) of untested chemicals in repeat-dose animal testing studies. We used supervised machine learning to objectively evaluate the predictive accuracy of different classification and regress...

  14. Feature Inference Learning and Eyetracking

    ERIC Educational Resources Information Center

    Rehder, Bob; Colner, Robert M.; Hoffman, Aaron B.

    2009-01-01

    Besides traditional supervised classification learning, people can learn categories by inferring the missing features of category members. It has been proposed that feature inference learning promotes learning a category's internal structure (e.g., its typical features and interfeature correlations) whereas classification promotes the learning of…

  15. Learning Supervised Topic Models for Classification and Regression from Crowds.

    PubMed

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C

    2017-12-01

    The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.

  16. [Exploring Flow and Supervision of Medical Instruments by Standing on Frontier of the Reform of Free Trade Zone].

    PubMed

    Shen, Jianhua; Han, Meixian; Lu, Fei

    2017-11-30

    Shanghai Waigaoqiao Free Trade Zone as one of the special customs supervision areas of China (Shanghai) free trade pilot area, gathered a large number of general agent enterprises related to medical apparatus and instruments. This article analyzes the characteristics of special environment and medical equipment business in Shanghai Waigaoqiao Free Trade Zone in order to further implement the national administrative examination and approval reform. According to the latest requirement in laws and regulations of medical instruments, and trend of development in the industry of medical instruments, as well as research on the basis of practices of market supervision in countries around the world, this article also proposes measures about precision supervision, coordination of supervision, classification supervision and dynamic supervision to establish a new order of fair and standardized competition in market, and create conditions for establishment of allocation and transport hub of international medicine.

  17. Protein classification using modified n-grams and skip-grams.

    PubMed

    Islam, S M Ashiqul; Heil, Benjamin J; Kearney, Christopher Michel; Baker, Erich J

    2018-05-01

    Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG). A meta-comparison of cross-validation accuracy with twelve training datasets from nine different published studies demonstrates a consistent increase in accuracy of m-NGSG when compared to contemporary classification and feature generation models. We expect this model to accelerate the classification of proteins from primary sequence data and increase the accessibility of protein characteristic prediction to a broader range of scientists. m-NGSG is freely available at Bitbucket: https://bitbucket.org/sm_islam/mngsg/src. A web server is available at watson.ecs.baylor.edu/ngsg. erich_baker@baylor.edu. Supplementary data are available at Bioinformatics online.

  18. High resolution mapping and classification of oyster habitats in nearshore Louisiana using sidescan sonar

    USGS Publications Warehouse

    Allen, Y.C.; Wilson, C.A.; Roberts, H.H.; Supan, J.

    2005-01-01

    Sidescan sonar holds great promise as a tool to quantitatively depict the distribution and extent of benthic habitats in Louisiana's turbid estuaries. In this study, we describe an effective protocol for acoustic sampling in this environment. We also compared three methods of classification in detail: mean-based thresholding, supervised, and unsupervised techniques to classify sidescan imagery into categories of mud and shell. Classification results were compared to ground truth results using quadrat and dredge sampling. Supervised classification gave the best overall result (kappa = 75%) when compared to quadrat results. Classification accuracy was less robust when compared to all dredge samples (kappa = 21-56%), but increased greatly (90-100%) when only dredge samples taken from acoustically homogeneous areas were considered. Sidescan sonar when combined with ground truth sampling at an appropriate scale can be effectively used to establish an accurate substrate base map for both research applications and shellfish management. The sidescan imagery presented here also provides, for the first time, a detailed presentation of oyster habitat patchiness and scale in a productive oyster growing area.

  19. A non-parametric, supervised classification of vegetation types on the Kaibab National Forest using decision trees

    Treesearch

    Suzanne M. Joy; R. M. Reich; Richard T. Reynolds

    2003-01-01

    Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...

  20. An Active Learning Framework for Hyperspectral Image Classification Using Hierarchical Segmentation

    NASA Technical Reports Server (NTRS)

    Zhang, Zhou; Pasolli, Edoardo; Crawford, Melba M.; Tilton, James C.

    2015-01-01

    Augmenting spectral data with spatial information for image classification has recently gained significant attention, as classification accuracy can often be improved by extracting spatial information from neighboring pixels. In this paper, we propose a new framework in which active learning (AL) and hierarchical segmentation (HSeg) are combined for spectral-spatial classification of hyperspectral images. The spatial information is extracted from a best segmentation obtained by pruning the HSeg tree using a new supervised strategy. The best segmentation is updated at each iteration of the AL process, thus taking advantage of informative labeled samples provided by the user. The proposed strategy incorporates spatial information in two ways: 1) concatenating the extracted spatial features and the original spectral features into a stacked vector and 2) extending the training set using a self-learning-based semi-supervised learning (SSL) approach. Finally, the two strategies are combined within an AL framework. The proposed framework is validated with two benchmark hyperspectral datasets. Higher classification accuracies are obtained by the proposed framework with respect to five other state-of-the-art spectral-spatial classification approaches. Moreover, the effectiveness of the proposed pruning strategy is also demonstrated relative to the approaches based on a fixed segmentation.

  1. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.

    PubMed

    Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie

    2016-07-01

    Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.

  2. Gender classification system in uncontrolled environments

    NASA Astrophysics Data System (ADS)

    Zeng, Pingping; Zhang, Yu-Jin; Duan, Fei

    2011-01-01

    Most face analysis systems available today perform mainly on restricted databases of images in terms of size, age, illumination. In addition, it is frequently assumed that all images are frontal and unconcealed. Actually, in a non-guided real-time supervision, the face pictures taken may often be partially covered and with head rotation less or more. In this paper, a special system supposed to be used in real-time surveillance with un-calibrated camera and non-guided photography is described. It mainly consists of five parts: face detection, non-face filtering, best-angle face selection, texture normalization, and gender classification. Emphases are focused on non-face filtering and best-angle face selection parts as well as texture normalization. Best-angle faces are figured out by PCA reconstruction, which equals to an implicit face alignment and results in a huge increase of the accuracy for gender classification. Dynamic skin model and a masked PCA reconstruction algorithm are applied to filter out faces detected in error. In order to fully include facial-texture and shape-outline features, a hybrid feature that is a combination of Gabor wavelet and PHoG (pyramid histogram of gradients) was proposed to equitable inner texture and outer contour. Comparative study on the effects of different non-face filtering and texture masking methods in the context of gender classification by SVM is reported through experiments on a set of UT (a company name) face images, a large number of internet images and CAS (Chinese Academy of Sciences) face database. Some encouraging results are obtained.

  3. 77 FR 58868 - Public Land Order No. 7798; Partial Modification of Power Site Classification No. 126; Washington

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-24

    ...; WAOR-19641] Public Land Order No. 7798; Partial Modification of Power Site Classification No. 126... partially modifies a withdrawal which established Power Site Classification No. 126, insofar as it affects... under Power Site Classification No. 126 for water power purposes will not be injured by U.S. Forest...

  4. Feasibility study of stain-free classification of cell apoptosis based on diffraction imaging flow cytometry and supervised machine learning techniques.

    PubMed

    Feng, Jingwen; Feng, Tong; Yang, Chengwen; Wang, Wei; Sa, Yu; Feng, Yuanming

    2018-06-01

    This study was to explore the feasibility of prediction and classification of cells in different stages of apoptosis with a stain-free method based on diffraction images and supervised machine learning. Apoptosis was induced in human chronic myelogenous leukemia K562 cells by cis-platinum (DDP). A newly developed technique of polarization diffraction imaging flow cytometry (p-DIFC) was performed to acquire diffraction images of the cells in three different statuses (viable, early apoptotic and late apoptotic/necrotic) after cell separation through fluorescence activated cell sorting with Annexin V-PE and SYTOX® Green double staining. The texture features of the diffraction images were extracted with in-house software based on the Gray-level co-occurrence matrix algorithm to generate datasets for cell classification with supervised machine learning method. Therefore, this new method has been verified in hydrogen peroxide induced apoptosis model of HL-60. Results show that accuracy of higher than 90% was achieved respectively in independent test datasets from each cell type based on logistic regression with ridge estimators, which indicated that p-DIFC system has a great potential in predicting and classifying cells in different stages of apoptosis.

  5. Multilabel user classification using the community structure of online networks

    PubMed Central

    Papadopoulos, Symeon; Kompatsiaris, Yiannis

    2017-01-01

    We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user’s graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score. PMID:28278242

  6. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.

  7. Multilabel user classification using the community structure of online networks.

    PubMed

    Rizos, Georgios; Papadopoulos, Symeon; Kompatsiaris, Yiannis

    2017-01-01

    We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score.

  8. Semi-supervised vibration-based classification and condition monitoring of compressors

    NASA Astrophysics Data System (ADS)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  9. Active relearning for robust supervised classification of pulmonary emphysema

    NASA Astrophysics Data System (ADS)

    Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.

    2012-03-01

    Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Towards optimizing Emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, the ensemble-expert label conflicts were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded 15% increase in classification accuracy and 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics. The resultant average accuracy improved to 21%. The co-operative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.

  10. Application of satellite data and LARS's data processing techniques to mapping vegetation of the Dismal Swamp. M.S. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Messmore, J. A.

    1976-01-01

    The feasibility of using digital satellite imagery and automatic data processing techniques as a means of mapping swamp forest vegetation was considered, using multispectral scanner data acquired by the LANDSAT-1 satellite. The site for this investigation was the Dismal Swamp, a 210,000 acre swamp forest located south of Suffolk, Va. on the Virginia-North Carolina border. Two basic classification strategies were employed. The initial classification utilized unsupervised techniques which produced a map of the swamp indicating the distribution of thirteen forest spectral classes. These classes were later combined into three informational categories: Atlantic white cedar (Chamaecyparis thyoides), Loblolly pine (Pinus taeda), and deciduous forest. The subsequent classification employed supervised techniques which mapped Atlantic white cedar, Loblolly pine, deciduous forest, water and agriculture within the study site. A classification accuracy of 82.5% was produced by unsupervised techniques compared with 89% accuracy using supervised techniques.

  11. Sub-pixel image classification for forest types in East Texas

    NASA Astrophysics Data System (ADS)

    Westbrook, Joey

    Sub-pixel classification is the extraction of information about the proportion of individual materials of interest within a pixel. Landcover classification at the sub-pixel scale provides more discrimination than traditional per-pixel multispectral classifiers for pixels where the material of interest is mixed with other materials. It allows for the un-mixing of pixels to show the proportion of each material of interest. The materials of interest for this study are pine, hardwood, mixed forest and non-forest. The goal of this project was to perform a sub-pixel classification, which allows a pixel to have multiple labels, and compare the result to a traditional supervised classification, which allows a pixel to have only one label. The satellite image used was a Landsat 5 Thematic Mapper (TM) scene of the Stephen F. Austin Experimental Forest in Nacogdoches County, Texas and the four cover type classes are pine, hardwood, mixed forest and non-forest. Once classified, a multi-layer raster datasets was created that comprised four raster layers where each layer showed the percentage of that cover type within the pixel area. Percentage cover type maps were then produced and the accuracy of each was assessed using a fuzzy error matrix for the sub-pixel classifications, and the results were compared to the supervised classification in which a traditional error matrix was used. The overall accuracy of the sub-pixel classification using the aerial photo for both training and reference data had the highest (65% overall) out of the three sub-pixel classifications. This was understandable because the analyst can visually observe the cover types actually on the ground for training data and reference data, whereas using the FIA (Forest Inventory and Analysis) plot data, the analyst must assume that an entire pixel contains the exact percentage of a cover type found in a plot. An increase in accuracy was found after reclassifying each sub-pixel classification from nine classes with 10 percent interval each to five classes with 20 percent interval each. When compared to the supervised classification which has a satisfactory overall accuracy of 90%, none of the sub-pixel classification achieved the same level. However, since traditional per-pixel classifiers assign only one label to pixels throughout the landscape while sub-pixel classifications assign multiple labels to each pixel, the traditional 85% accuracy of acceptance for pixel-based classifications should not apply to sub-pixel classifications. More research is needed in order to define the level of accuracy that is deemed acceptable for sub-pixel classifications.

  12. Extending bicluster analysis to annotate unclassified ORFs and predict novel functional modules using expression data

    PubMed Central

    Bryan, Kenneth; Cunningham, Pádraig

    2008-01-01

    Background Microarrays have the capacity to measure the expressions of thousands of genes in parallel over many experimental samples. The unsupervised classification technique of bicluster analysis has been employed previously to uncover gene expression correlations over subsets of samples with the aim of providing a more accurate model of the natural gene functional classes. This approach also has the potential to aid functional annotation of unclassified open reading frames (ORFs). Until now this aspect of biclustering has been under-explored. In this work we illustrate how bicluster analysis may be extended into a 'semi-supervised' ORF annotation approach referred to as BALBOA. Results The efficacy of the BALBOA ORF classification technique is first assessed via cross validation and compared to a multi-class k-Nearest Neighbour (kNN) benchmark across three independent gene expression datasets. BALBOA is then used to assign putative functional annotations to unclassified yeast ORFs. These predictions are evaluated using existing experimental and protein sequence information. Lastly, we employ a related semi-supervised method to predict the presence of novel functional modules within yeast. Conclusion In this paper we demonstrate how unsupervised classification methods, such as bicluster analysis, may be extended using of available annotations to form semi-supervised approaches within the gene expression analysis domain. We show that such methods have the potential to improve upon supervised approaches and shed new light on the functions of unclassified ORFs and their co-regulation. PMID:18831786

  13. Sea ice type maps from Alaska synthetic aperture radar facility imagery: An assessment

    NASA Technical Reports Server (NTRS)

    Fetterer, Florence M.; Gineris, Denise; Kwok, Ronald

    1994-01-01

    Synthetic aperture radar (SAR) imagery received at the Alaskan SAR Facility is routinely and automatically classified on the Geophysical Processor System (GPS) to create ice type maps. We evaluated the wintertime performance of the GPS classification algorithm by comparing ice type percentages from supervised classification with percentages from the algorithm. The root mean square (RMS) difference for multiyear ice is about 6%, while the inconsistency in supervised classification is about 3%. The algorithm separates first-year from multiyear ice well, although it sometimes fails to correctly classify new ice and open water owing to the wide distribution of backscatter for these classes. Our results imply a high degree of accuracy and consistency in the growing archive of multiyear and first-year ice distribution maps. These results have implications for heat and mass balance studies which are furthered by the ability to accurately characterize ice type distributions over a large part of the Arctic.

  14. Classification of wines according to their production regions with the contained trace elements using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Tian, Ye; Yan, Chunhua; Zhang, Tianlong; Tang, Hongsheng; Li, Hua; Yu, Jialu; Bernard, Jérôme; Chen, Li; Martin, Serge; Delepine-Gilon, Nicole; Bocková, Jana; Veis, Pavel; Chen, Yanping; Yu, Jin

    2017-09-01

    Laser-induced breakdown spectroscopy (LIBS) has been applied to classify French wines according to their production regions. The use of the surface-assisted (or surface-enhanced) sample preparation method enabled a sub-ppm limit of detection (LOD), which led to the detection and identification of at least 22 metal and nonmetal elements in a typical wine sample including majors, minors and traces. An ensemble of 29 bottles of French wines, either red or white wines, from five production regions, Alsace, Bourgogne, Beaujolais, Bordeaux and Languedoc, was analyzed together with a wine from California, considered as an outlier. A non-supervised classification model based on principal component analysis (PCA) was first developed for the classification. The results showed a limited separation power of the model, which however allowed, in a step by step approach, to understand the physical reasons behind each step of sample separation and especially to observe the influence of the matrix effect in the sample classification. A supervised classification model was then developed based on random forest (RF), which is in addition a nonlinear algorithm. The obtained classification results were satisfactory with, when the parameters of the model were optimized, a classification accuracy of 100% for the tested samples. We especially discuss in the paper, the effect of spectrum normalization with an internal reference, the choice of input variables for the classification models and the optimization of parameters for the developed classification models.

  15. Procedures for gathering ground truth information for a supervised approach to a computer-implemented land cover classification of LANDSAT-acquired multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Joyce, A. T.

    1978-01-01

    Procedures for gathering ground truth information for a supervised approach to a computer-implemented land cover classification of LANDSAT acquired multispectral scanner data are provided in a step by step manner. Criteria for determining size, number, uniformity, and predominant land cover of training sample sites are established. Suggestions are made for the organization and orientation of field team personnel, the procedures used in the field, and the format of the forms to be used. Estimates are made of the probable expenditures in time and costs. Examples of ground truth forms and definitions and criteria of major land cover categories are provided in appendixes.

  16. Supervised Semantic Classification for Nuclear Proliferation Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Cheriyadat, Anil M; Gleason, Shaun Scott

    2010-01-01

    Existing feature extraction and classification approaches are not suitable for monitoring proliferation activity using high-resolution multi-temporal remote sensing imagery. In this paper we present a supervised semantic labeling framework based on the Latent Dirichlet Allocation method. This framework is used to analyze over 120 images collected under different spatial and temporal settings over the globe representing three major semantic categories: airports, nuclear, and coal power plants. Initial experimental results show a reasonable discrimination of these three categories even though coal and nuclear images share highly common and overlapping objects. This research also identified several research challenges associated with nuclear proliferationmore » monitoring using high resolution remote sensing images.« less

  17. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates

    NASA Astrophysics Data System (ADS)

    Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-08-01

    Objective. The paper investigates the presence of autism using the functional brain connectivity measures derived from electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states or synchrostates temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimal and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored the discriminant analysis and support vector machine both with polynomial kernels for the classification task. Main results. The leave one out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance with corresponding sensitivity and specificity values as 85.7% and 100% respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.

  18. Automatic Classification of Time-variable X-Ray Sources

    NASA Astrophysics Data System (ADS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ~97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7-500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  19. Classification of damage in structural systems using time series analysis and supervised and unsupervised pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; de Lautour, Oliver R.

    2010-04-01

    Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3- storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.

  20. Information Forests

    DTIC Science & Technology

    2014-01-01

    Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning , semi-supervised learning, mixed generative/discriminative learning.

  1. Participation of surgical residents in operations: challenging a common classification.

    PubMed

    Bezemer, Jeff; Cope, Alexandra; Faiz, Omar; Kneebone, Roger

    2012-09-01

    One important form of surgical training for residents is their participation in actual operations, for instance as an assistant or supervised surgeon. The aim of this study was to explore what participation in operations entails and how it might be described and analyzed. A qualitative study was undertaken in a major teaching hospital in London. A total of 122 general surgical operations were observed. A subsample of 14 laparoscopic cholecystectomies involving one or more residents was analyzed in detail. Audio and video recordings of eight operations were transcribed and analyzed linguistically. The degree of participation of trainees frequently shifted as the operation progressed to the next stage. Participation also varied within each stage. When trainees operated under supervision, the supervisors constantly adjusted their degree of control over the resident's operative maneuvers. Classifications such as "assistant" and "supervised surgeon" describing a trainee's overall participation in an operation potentially misrepresent the varying involvement of resident and supervisor. Video recordings provide a useful alternative for documenting and analyzing actual participation in operations.

  2. Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations.

    PubMed

    Fabelo, Himar; Ortega, Samuel; Ravi, Daniele; Kiran, B Ravi; Sosa, Coralia; Bulters, Diederik; Callicó, Gustavo M; Bulstrode, Harry; Szolna, Adam; Piñeiro, Juan F; Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O'Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

    2018-01-01

    Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area.

  3. Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations

    PubMed Central

    Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O’Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto

    2018-01-01

    Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area. PMID:29554126

  4. Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms.

    PubMed

    Bromuri, Stefano; Zufferey, Damien; Hennebert, Jean; Schumacher, Michael

    2014-10-01

    This research is motivated by the issue of classifying illnesses of chronically ill patients for decision support in clinical settings. Our main objective is to propose multi-label classification of multivariate time series contained in medical records of chronically ill patients, by means of quantization methods, such as bag of words (BoW), and multi-label classification algorithms. Our second objective is to compare supervised dimensionality reduction techniques to state-of-the-art multi-label classification algorithms. The hypothesis is that kernel methods and locality preserving projections make such algorithms good candidates to study multi-label medical time series. We combine BoW and supervised dimensionality reduction algorithms to perform multi-label classification on health records of chronically ill patients. The considered algorithms are compared with state-of-the-art multi-label classifiers in two real world datasets. Portavita dataset contains 525 diabetes type 2 (DT2) patients, with co-morbidities of DT2 such as hypertension, dyslipidemia, and microvascular or macrovascular issues. MIMIC II dataset contains 2635 patients affected by thyroid disease, diabetes mellitus, lipoid metabolism disease, fluid electrolyte disease, hypertensive disease, thrombosis, hypotension, chronic obstructive pulmonary disease (COPD), liver disease and kidney disease. The algorithms are evaluated using multi-label evaluation metrics such as hamming loss, one error, coverage, ranking loss, and average precision. Non-linear dimensionality reduction approaches behave well on medical time series quantized using the BoW algorithm, with results comparable to state-of-the-art multi-label classification algorithms. Chaining the projected features has a positive impact on the performance of the algorithm with respect to pure binary relevance approaches. The evaluation highlights the feasibility of representing medical health records using the BoW for multi-label classification tasks. The study also highlights that dimensionality reduction algorithms based on kernel methods, locality preserving projections or both are good candidates to deal with multi-label classification tasks in medical time series with many missing values and high label density. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    NASA Astrophysics Data System (ADS)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performances even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using as a test set, data in the range between 1996 August and 2010 December and as training set, data in the range between 1988 December and 1996 June. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to the one of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.

  6. Training Level, Acculturation, Role Ambiguity, and Multicultural Discussions in Training and Supervising International Counseling Students in the United States

    ERIC Educational Resources Information Center

    Ng, Kok-Mun; Smith, Shannon D.

    2012-01-01

    This research partially replicated Nilsson and Anderson's "Professional Psychology: Research and Practice" (2004) study on training and supervising international students. It investigated the relationships among international counseling students' training level, acculturation, supervisory working alliance (SWA), counseling self-efficacy (COSE),…

  7. Investigating the LGBTQ Responsive Model for Supervision of Group Work

    ERIC Educational Resources Information Center

    Luke, Melissa; Goodrich, Kristopher M.

    2013-01-01

    This article reports an investigation of the LGBTQ Responsive Model for Supervision of Group Work, a trans-theoretical supervisory framework to address the needs of lesbian, gay, bisexual, transgender, and questioning (LGBTQ) persons (Goodrich & Luke, 2011). Findings partially supported applicability of the LGBTQ Responsive Model for Supervision…

  8. Weakly Supervised Dictionary Learning

    NASA Astrophysics Data System (ADS)

    You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub

    2018-05-01

    We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned with a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.

  9. Using Computational Text Classification for Qualitative Research and Evaluation in Extension

    ERIC Educational Resources Information Center

    Smith, Justin G.; Tissing, Reid

    2018-01-01

    This article introduces a process for computational text classification that can be used in a variety of qualitative research and evaluation settings. The process leverages supervised machine learning based on an implementation of a multinomial Bayesian classifier. Applied to a community of inquiry framework, the algorithm was used to identify…

  10. Detection and Evaluation of Cheating on College Exams Using Supervised Classification

    ERIC Educational Resources Information Center

    Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire

    2012-01-01

    Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…

  11. Ultra-Wideband EMI Sensing: Non-Metallic Target Detection and Automatic Classification of Unexploded Ordnance

    NASA Astrophysics Data System (ADS)

    Sigman, John Brevard

    Buried explosive hazards present a pressing problem worldwide. Millions of acres and thousands of sites are contaminated in the United States alone [1, 2]. There are three categories of explosive hazards: metallic, intermediate-electrical conducting (IEC), and non-conducting targets. Metallic target detection and classification by electromagnetic (EM) signature has been the subject of research for many years. Key to the success of this research is modern multi-static Electromagnetic Induction (EMI) sensors, which are able to measure the wideband EMI response from metallic buried targets. However, no hardware solutions exist which can characterize IEC and non-conducting targets. While high-conducting metallic targets exhibit a quadrature peak response for frequencies in a traditional EMI regime under 100 kHz, the response of intermediate-conducting objects manifests at higher frequencies, between 100 kHz and 15 MHz. In addition to high-quality electromagnetic sensor data and robust electromagnetic models, a classification procedure is required to discriminate Targets of Interest (TOI) from clutter. Currently, costly human experts are used for this task. This expense and effort can be spared by using statistical signal processing and machine learning. This thesis has two main parts. In the first part, we explore using the high frequency EMI (HFEMI) band (100 kHz-15 MHz) for detection of carbon fiber UXO, voids, and of materials with characteristics that may be associated with improvised explosive devices (IED). We constructed an HFEMI sensing instrument, and apply the techniques of metal detection to sensing in a band of frequencies which are the transition between the induction and radar bands. In this transition domain, physical considerations and technological issues arise that cannot be solved via the approaches used in either of the bracketing lower and higher frequency ranges. In the second half of this thesis, we present a procedure for automatic classification of UXO. For maximum generality, our algorithm is robust and can handle sparse training examples of multi-class data. This procedure uses an unsupervised starter, semi-supervised techniques to gather training data, and concludes with supervised learning until all TOI are found. Additionally, an inference method for estimating the number of remaining true positives from a partial Receiver Operating Characteristic (ROC) curve is presented and applied to live-site dig histories.

  12. Land cover mapping after the tsunami event over Nanggroe Aceh Darussalam (NAD) province, Indonesia

    NASA Astrophysics Data System (ADS)

    Lim, H. S.; MatJafri, M. Z.; Abdullah, K.; Alias, A. N.; Mohd. Saleh, N.; Wong, C. J.; Surbakti, M. S.

    2008-03-01

    Remote sensing offers an important means of detecting and analyzing temporal changes occurring in our landscape. This research used remote sensing to quantify land use/land cover changes at the Nanggroe Aceh Darussalam (Nad) province, Indonesia on a regional scale. The objective of this paper is to assess the changed produced from the analysis of Landsat TM data. A Landsat TM image was used to develop land cover classification map for the 27 March 2005. Four supervised classifications techniques (Maximum Likelihood, Minimum Distance-to- Mean, Parallelepiped and Parallelepiped with Maximum Likelihood Classifier Tiebreaker classifier) were performed to the satellite image. Training sites and accuracy assessment were needed for supervised classification techniques. The training sites were established using polygons based on the colour image. High detection accuracy (>80%) and overall Kappa (>0.80) were achieved by the Parallelepiped with Maximum Likelihood Classifier Tiebreaker classifier in this study. This preliminary study has produced a promising result. This indicates that land cover mapping can be carried out using remote sensing classification method of the satellite digital imagery.

  13. Modeling EEG Waveforms with Semi-Supervised Deep Belief Nets: Fast Classification and Anomaly Measurement

    PubMed Central

    Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.

    2011-01-01

    Clinical electroencephalography (EEG) records vast amounts of human complex data yet is still reviewed primarily by human readers. Deep Belief Nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data, but are rarely applied to times-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7 to 103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data—a rarity in automated physiological waveform analysis—to hand-chosen features and find that raw data produces comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques. PMID:21525569

  14. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    PubMed

    Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.

  15. Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

    PubMed Central

    Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNIP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581

  16. Organic Chemical Attribution Signatures for the Sourcing of a Mustard Agent and Its Starting Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fraga, Carlos G.; Bronk, Krys; Dockendorff, Brian P.

    Chemical attribution signatures (CAS) are being investigated for the sourcing of chemical warfare (CW) agents and their starting materials that may be implicated in chemical attacks or CW proliferation. The work reported here demonstrates for the first time trace impurities produced during the synthesis of tris(2-chloroethyl)amine (HN3) that point to specific reagent stocks used in the synthesis of this CW agent. Thirty batches of HN3 were synthesized using different combinations of commercial stocks of triethanolamine (TEA), thionyl chloride, chloroform, and acetone. The HN3 batches and reagent stocks were then analyzed for impurities by gas chromatography/mass spectrometry. Reaction-produced impurities indicative ofmore » specific TEA and chloroform stocks were exclusively discovered in HN3 batches made with those reagent stocks. In addition, some reagent impurities were found in the HN3 batches that were presumably not altered during synthesis and believed to be indicative of reagent type regardless of stock. Supervised classification using partial least squares discriminant analysis (PLSDA) on the impurity profiles of chloroform samples from seven stocks resulted in an average classification error by cross-validation of 2.4%. A classification error of zero was obtained using the seven-stock PLSDA model on a validation set of samples from an arbitrarily selected chloroform stock. In a separate analysis, all samples from two of seven chloroform stocks that were purposely not modeled had their samples matched to a chloroform stock rather than assigned a “no class” classification.« less

  17. A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

    DTIC Science & Technology

    2016-07-01

    financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document

  18. Evaluation of forest cover estimates for Haiti using supervised classification of Landsat data

    NASA Astrophysics Data System (ADS)

    Churches, Christopher E.; Wampler, Peter J.; Sun, Wanxiao; Smith, Andrew J.

    2014-08-01

    This study uses 2010-2011 Landsat Thematic Mapper (TM) imagery to estimate total forested area in Haiti. The thematic map was generated using radiometric normalization of digital numbers by a modified normalization method utilizing pseudo-invariant polygons (PIPs), followed by supervised classification of the mosaicked image using the Food and Agriculture Organization (FAO) of the United Nations Land Cover Classification System. Classification results were compared to other sources of land-cover data produced for similar years, with an emphasis on the statistics presented by the FAO. Three global land cover datasets (GLC2000, Globcover, 2009, and MODIS MCD12Q1), and a national-scale dataset (a land cover analysis by Haitian National Centre for Geospatial Information (CNIGS)) were reclassified and compared. According to our classification, approximately 32.3% of Haiti's total land area was tree covered in 2010-2011. This result was confirmed using an error-adjusted area estimator, which predicted a tree covered area of 32.4%. Standardization to the FAO's forest cover class definition reduces the amount of tree cover of our supervised classification to 29.4%. This result was greater than the reported FAO value of 4% and the value for the recoded GLC2000 dataset of 7.0%, but is comparable to values for three other recoded datasets: MCD12Q1 (21.1%), Globcover (2009) (26.9%), and CNIGS (19.5%). We propose that at coarse resolutions, the segmented and patchy nature of Haiti's forests resulted in a systematic underestimation of the extent of forest cover. It appears the best explanation for the significant difference between our results, FAO statistics, and compared datasets is the accuracy of the data sources and the resolution of the imagery used for land cover analyses. Analysis of recoded global datasets and results from this study suggest a strong linear relationship (R2 = 0.996 for tree cover) between spatial resolution and land cover estimates.

  19. Multisource Data Classification Using A Hybrid Semi-supervised Learning Scheme

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L; Shekhar, Shashi

    2009-01-01

    In many practical situations thematic classes can not be discriminated by spectral measurements alone. Often one needs additional features such as population density, road density, wetlands, elevation, soil types, etc. which are discrete attributes. On the other hand remote sensing image features are continuous attributes. Finding a suitable statistical model and estimation of parameters is a challenging task in multisource (e.g., discrete and continuous attributes) data classification. In this paper we present a semi-supervised learning method by assuming that the samples were generated by a mixture model, where each component could be either a continuous or discrete distribution. Overall classificationmore » accuracy of the proposed method is improved by 12% in our initial experiments.« less

  20. Weakly Supervised Segmentation-Aided Classification of Urban Scenes from 3d LIDAR Point Clouds

    NASA Astrophysics Data System (ADS)

    Guinard, S.; Landrieu, L.

    2017-05-01

    We consider the problem of the semantic classification of 3D LiDAR point clouds obtained from urban scenes when the training set is limited. We propose a non-parametric segmentation model for urban scenes composed of anthropic objects of simple shapes, partionning the scene into geometrically-homogeneous segments which size is determined by the local complexity. This segmentation can be integrated into a conditional random field classifier (CRF) in order to capture the high-level structure of the scene. For each cluster, this allows us to aggregate the noisy predictions of a weakly-supervised classifier to produce a higher confidence data term. We demonstrate the improvement provided by our method over two publicly-available large-scale data sets.

  1. An unsupervised classification technique for multispectral remote sensing data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Cummings, R. E.

    1973-01-01

    Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.

  2. Unsupervised classification of earth resources data.

    NASA Technical Reports Server (NTRS)

    Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.

    1972-01-01

    A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by existing supervised maximum liklihood classification technique.

  3. Neutral face classification using personalized appearance models for fast and robust emotion detection.

    PubMed

    Chiranjeevi, Pojala; Gopalakrishnan, Viswanath; Moogi, Pratibha

    2015-09-01

    Facial expression recognition is one of the open problems in computer vision. Robust neutral face recognition in real time is a major challenge for various supervised learning-based facial expression recognition methods. This is due to the fact that supervised methods cannot accommodate all appearance variability across the faces with respect to race, pose, lighting, facial biases, and so on, in the limited amount of training data. Moreover, processing each and every frame to classify emotions is not required, as user stays neutral for majority of the time in usual applications like video chat or photo album/web browsing. Detecting neutral state at an early stage, thereby bypassing those frames from emotion classification would save the computational power. In this paper, we propose a light-weight neutral versus emotion classification engine, which acts as a pre-processer to the traditional supervised emotion classification approaches. It dynamically learns neutral appearance at key emotion (KE) points using a statistical texture model, constructed by a set of reference neutral frames for each user. The proposed method is made robust to various types of user head motions by accounting for affine distortions based on a statistical texture model. Robustness to dynamic shift of KE points is achieved by evaluating the similarities on a subset of neighborhood patches around each KE point using the prior information regarding the directionality of specific facial action units acting on the respective KE point. The proposed method, as a result, improves emotion recognition (ER) accuracy and simultaneously reduces computational complexity of the ER system, as validated on multiple databases.

  4. Automatic classification of time-variable X-ray sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, andmore » other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.« less

  5. Change classification in SAR time series: a functional approach

    NASA Astrophysics Data System (ADS)

    Boldt, Markus; Thiele, Antje; Schulz, Karsten; Hinz, Stefan

    2017-10-01

    Change detection represents a broad field of research in SAR remote sensing, consisting of many different approaches. Besides the simple recognition of change areas, the analysis of type, category or class of the change areas is at least as important for creating a comprehensive result. Conventional strategies for change classification are based on supervised or unsupervised landuse / landcover classifications. The main drawback of such approaches is that the quality of the classification result directly depends on the selection of training and reference data. Additionally, supervised processing methods require an experienced operator who capably selects the training samples. This training step is not necessary when using unsupervised strategies, but nevertheless meaningful reference data must be available for identifying the resulting classes. Consequently, an experienced operator is indispensable. In this study, an innovative concept for the classification of changes in SAR time series data is proposed. Regarding the drawbacks of traditional strategies given above, it copes without using any training data. Moreover, the method can be applied by an operator, who does not have detailed knowledge about the available scenery yet. This knowledge is provided by the algorithm. The final step of the procedure, which main aspect is given by the iterative optimization of an initial class scheme with respect to the categorized change objects, is represented by the classification of these objects to the finally resulting classes. This assignment step is subject of this paper.

  6. Conditional High-Order Boltzmann Machines for Supervised Relation Learning.

    PubMed

    Huang, Yan; Wang, Wei; Wang, Liang; Tan, Tieniu

    2017-09-01

    Relation learning is a fundamental problem in many vision tasks. Recently, high-order Boltzmann machine and its variants have shown their great potentials in learning various types of data relation in a range of tasks. But most of these models are learned in an unsupervised way, i.e., without using relation class labels, which are not very discriminative for some challenging tasks, e.g., face verification. In this paper, with the goal to perform supervised relation learning, we introduce relation class labels into conventional high-order multiplicative interactions with pairwise input samples, and propose a conditional high-order Boltzmann Machine (CHBM), which can learn to classify the data relation in a binary classification way. To be able to deal with more complex data relation, we develop two improved variants of CHBM: 1) latent CHBM, which jointly performs relation feature learning and classification, by using a set of latent variables to block the pathway from pairwise input samples to output relation labels and 2) gated CHBM, which untangles factors of variation in data relation, by exploiting a set of latent variables to multiplicatively gate the classification of CHBM. To reduce the large number of model parameters generated by the multiplicative interactions, we approximately factorize high-order parameter tensors into multiple matrices. Then, we develop efficient supervised learning algorithms, by first pretraining the models using joint likelihood to provide good parameter initialization, and then finetuning them using conditional likelihood to enhance the discriminant ability. We apply the proposed models to a series of tasks including invariant recognition, face verification, and action similarity labeling. Experimental results demonstrate that by exploiting supervised relation labels, our models can greatly improve the performance.

  7. Semantic Shot Classification in Sports Video

    NASA Astrophysics Data System (ADS)

    Duan, Ling-Yu; Xu, Min; Tian, Qi

    2003-01-01

    In this paper, we present a unified framework for semantic shot classification in sports videos. Unlike previous approaches, which focus on clustering by aggregating shots with similar low-level features, the proposed scheme makes use of domain knowledge of a specific sport to perform a top-down video shot classification, including identification of video shot classes for each sport, and supervised learning and classification of the given sports video with low-level and middle-level features extracted from the sports video. It is observed that for each sport we can predefine a small number of semantic shot classes, about 5~10, which covers 90~95% of sports broadcasting video. With the supervised learning method, we can map the low-level features to middle-level semantic video shot attributes such as dominant object motion (a player), camera motion patterns, and court shape, etc. On the basis of the appropriate fusion of those middle-level shot classes, we classify video shots into the predefined video shot classes, each of which has a clear semantic meaning. The proposed method has been tested over 4 types of sports videos: tennis, basketball, volleyball and soccer. Good classification accuracy of 85~95% has been achieved. With correctly classified sports video shots, further structural and temporal analysis, such as event detection, video skimming, table of content, etc, will be greatly facilitated.

  8. Variable Star Signature Classification using Slotted Symbolic Markov Modeling

    NASA Astrophysics Data System (ADS)

    Johnston, K. B.; Peter, A. M.

    2017-01-01

    With the advent of digital astronomy, new benefits and new challenges have been presented to the modern day astronomer. No longer can the astronomer rely on manual processing, instead the profession as a whole has begun to adopt more advanced computational means. This paper focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature space representation is introduced. The methodology presented will be referred to as Slotted Symbolic Markov Modeling (SSMM) and has a number of advantages which will be demonstrated to be beneficial; specifically to the supervised classification of stellar variables. It will be shown that the methodology outperformed a baseline standard methodology on a standardized set of stellar light curve data. The performance on a set of data derived from the LINEAR dataset will also be shown.

  9. Variable Star Signature Classification using Slotted Symbolic Markov Modeling

    NASA Astrophysics Data System (ADS)

    Johnston, Kyle B.; Peter, Adrian M.

    2016-01-01

    With the advent of digital astronomy, new benefits and new challenges have been presented to the modern day astronomer. No longer can the astronomer rely on manual processing, instead the profession as a whole has begun to adopt more advanced computational means. Our research focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature space representation is introduced. The methodology presented will be referred to as Slotted Symbolic Markov Modeling (SSMM) and has a number of advantages which will be demonstrated to be beneficial; specifically to the supervised classification of stellar variables. It will be shown that the methodology outperformed a baseline standard methodology on a standardized set of stellar light curve data. The performance on a set of data derived from the LINEAR dataset will also be shown.

  10. MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification

    NASA Astrophysics Data System (ADS)

    Lin, Daoyu; Fu, Kun; Wang, Yang; Xu, Guangluan; Sun, Xian

    2017-11-01

    With the development of deep learning, supervised learning has frequently been adopted to classify remotely sensed images using convolutional networks (CNNs). However, due to the limited amount of labeled data available, supervised learning is often difficult to carry out. Therefore, we proposed an unsupervised model called multiple-layer feature-matching generative adversarial networks (MARTA GANs) to learn a representation using only unlabeled data. MARTA GANs consists of both a generative model $G$ and a discriminative model $D$. We treat $D$ as a feature extractor. To fit the complex properties of remote sensing data, we use a fusion layer to merge the mid-level and global features. $G$ can produce numerous images that are similar to the training data; therefore, $D$ can learn better representations of remotely sensed images using the training data provided by $G$. The classification results on two widely used remote sensing image databases show that the proposed method significantly improves the classification performance compared with other state-of-the-art methods.

  11. Image processing developments and applications for water quality monitoring and trophic state determination

    NASA Technical Reports Server (NTRS)

    Blackwell, R. J.

    1982-01-01

    Remote sensing data analysis of water quality monitoring is evaluated. Data anaysis and image processing techniques are applied to LANDSAT remote sensing data to produce an effective operational tool for lake water quality surveying and monitoring. Digital image processing and analysis techniques were designed, developed, tested, and applied to LANDSAT multispectral scanner (MSS) data and conventional surface acquired data. Utilization of these techniques facilitates the surveying and monitoring of large numbers of lakes in an operational manner. Supervised multispectral classification, when used in conjunction with surface acquired water quality indicators, is used to characterize water body trophic status. Unsupervised multispectral classification, when interpreted by lake scientists familiar with a specific water body, yields classifications of equal validity with supervised methods and in a more cost effective manner. Image data base technology is used to great advantage in characterizing other contributing effects to water quality. These effects include drainage basin configuration, terrain slope, soil, precipitation and land cover characteristics.

  12. Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation.

    PubMed

    Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

    2013-04-01

    Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers.

  13. A semi-supervised Support Vector Machine model for predicting the language outcomes following cochlear implantation based on pre-implant brain fMRI imaging.

    PubMed

    Tan, Lirong; Holland, Scott K; Deshpande, Aniruddha K; Chen, Ye; Choo, Daniel I; Lu, Long J

    2015-12-01

    We developed a machine learning model to predict whether or not a cochlear implant (CI) candidate will develop effective language skills within 2 years after the CI surgery by using the pre-implant brain fMRI data from the candidate. The language performance was measured 2 years after the CI surgery by the Clinical Evaluation of Language Fundamentals-Preschool, Second Edition (CELF-P2). Based on the CELF-P2 scores, the CI recipients were designated as either effective or ineffective CI users. For feature extraction from the fMRI data, we constructed contrast maps using the general linear model, and then utilized the Bag-of-Words (BoW) approach that we previously published to convert the contrast maps into feature vectors. We trained both supervised models and semi-supervised models to classify CI users as effective or ineffective. Compared with the conventional feature extraction approach, which used each single voxel as a feature, our BoW approach gave rise to much better performance for the classification of effective versus ineffective CI users. The semi-supervised model with the feature set extracted by the BoW approach from the contrast of speech versus silence achieved a leave-one-out cross-validation AUC as high as 0.97. Recursive feature elimination unexpectedly revealed that two features were sufficient to provide highly accurate classification of effective versus ineffective CI users based on our current dataset. We have validated the hypothesis that pre-implant cortical activation patterns revealed by fMRI during infancy correlate with language performance 2 years after cochlear implantation. The two brain regions highlighted by our classifier are potential biomarkers for the prediction of CI outcomes. Our study also demonstrated the superiority of the semi-supervised model over the supervised model. It is always worthwhile to try a semi-supervised model when unlabeled data are available.

  14. Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps

    NASA Astrophysics Data System (ADS)

    Rahmani, S.; Teimoorinia, H.; Barmby, P.

    2018-05-01

    The spectrum of a galaxy contains information about its physical properties. Classifying spectra using templates helps elucidate the nature of a galaxy's energy sources. In this paper, we investigate the use of self-organizing maps in classifying galaxy spectra against templates. We trained semi-supervised self-organizing map networks using a set of templates covering the wavelength range from far ultraviolet to near infrared. The trained networks were used to classify the spectra of a sample of 142 galaxies with 0.5 < z < 1 and the results compared to classifications performed using K-means clustering, a supervised neural network, and chi-squared minimization. Spectra corresponding to quiescent galaxies were more likely to be classified similarly by all methods while starburst spectra showed more variability. Compared to classification using chi-squared minimization or the supervised neural network, the galaxies classed together by the self-organizing map had more similar spectra. The class ordering provided by the one-dimensional self-organizing maps corresponds to an ordering in physical properties, a potentially important feature for the exploration of large datasets.

  15. Using Supervised Learning Techniques for Diagnosis of Dynamic Systems

    DTIC Science & Technology

    2002-05-04

    M. Gasca 2 , Juan A. Ortega2 Abstract. This paper describes an approach based on supervised diagnose systems faults are needed to maintain the systems...labelled, data will be used for this purpose [5] [6]. treated to add additional information about the running of system. In [7] the fundaments of the based ...8] proposes classification tool to the set of labelled and treated data. This a consistency- based approach with qualitative models. way, any

  16. Molecular activity prediction by means of supervised subspace projection based ensembles of classifiers.

    PubMed

    Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á

    2018-03-01

    This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.

  17. Image Classification Workflow Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.

    2016-12-01

    Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free to use solutions that are currently available come bundled up as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy to use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years of 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification using commercial software available without an expensive license.

  18. Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks.

    PubMed

    Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R; Nguyen, Tuan N; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T

    2017-01-01

    This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively.

  19. Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks

    PubMed Central

    Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.

    2017-01-01

    This paper presents an improvement of classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states with the data collected from 43 participants. The system employs autoregressive (AR) modeling as the features extraction algorithm, and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi supervised learning method which combines unsupervised learning for modeling features in the pre-training layer and supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes a deviation of the expected activation of hidden units from a fixed low-level prevents the network from overfitting and is able to learn low-level structures as well as high-level structures. For comparison, the artificial neural networks (ANN), Bayesian neural networks (BNN), and original deep belief networks (DBN) classifiers are used. The classification results show that using AR feature extractor and DBN classifiers, the classification performance achieves an improved classification performance with a of sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94 compared to ANN (sensitivity at 80.8%, specificity at 77.8%, accuracy at 79.3% with AUC-ROC of 0.83) and BNN classifiers (sensitivity at 84.3%, specificity at 83%, accuracy at 83.6% with AUROC of 0.87). Using the sparse-DBN classifier, the classification performance improved further with sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with AUROC of 0.96. Overall, the sparse-DBN classifier improved accuracy by 13.8, 9.5, and 2.5% over ANN, BNN, and DBN classifiers, respectively. PMID:28326009

  20. Space Object Classification Using Fused Features of Time Series Data

    NASA Astrophysics Data System (ADS)

    Jia, B.; Pham, K. D.; Blasch, E.; Shen, D.; Wang, Z.; Chen, G.

    In this paper, a fused feature vector consisting of raw time series and texture feature information is proposed for space object classification. The time series data includes historical orbit trajectories and asteroid light curves. The texture feature is derived from recurrence plots using Gabor filters for both unsupervised learning and supervised learning algorithms. The simulation results show that the classification algorithms using the fused feature vector achieve better performance than those using raw time series or texture features only.

  1. A Robust Geometric Model for Argument Classification

    NASA Astrophysics Data System (ADS)

    Giannone, Cristina; Croce, Danilo; Basili, Roberto; de Cao, Diego

    Argument classification is the task of assigning semantic roles to syntactic structures in natural language sentences. Supervised learning techniques for frame semantics have been recently shown to benefit from rich sets of syntactic features. However argument classification is also highly dependent on the semantics of the involved lexicals. Empirical studies have shown that domain dependence of lexical information causes large performance drops in outside domain tests. In this paper a distributional approach is proposed to improve the robustness of the learning model against out-of-domain lexical phenomena.

  2. Rapid classification of landsat TM imagery for phase 1 stratification using the automated NDVI threshold supervised classification (ANTSC) methodology

    Treesearch

    William H. Cooke; Dennis M. Jacobs

    2002-01-01

    FIA annual inventories require rapid updating of pixel-based Phase 1 estimates. Scientists at the Southern Research Station are developing an automated methodology that uses a Normalized Difference Vegetation Index (NDVI) for identifying and eliminating problem FIA plots from the analysis. Problem plots are those that have questionable land useiland cover information....

  3. Supervised classification of continental shelf sediment off western Donegal, Ireland

    NASA Astrophysics Data System (ADS)

    Monteys, X.; Craven, K.; McCarron, S. G.

    2017-12-01

    Managing human impacts on marine ecosystems requires natural regions to be identified and mapped over a range of hierarchically nested scales. In recent years (2000-present) the Irish National Seabed Survey (INSS) and Integrated Mapping for the Sustainable Development of Ireland's Marine Resources programme (INFOMAR) (Geological Survey Ireland and Marine Institute collaborations) has provided unprecedented quantities of high quality data on Ireland's offshore territories. The increasing availability of large, detailed digital representations of these environments requires the application of objective and quantitative analyses. This study presents results of a new approach for sea floor sediment mapping based on an integrated analysis of INFOMAR multibeam bathymetric data (including the derivatives of slope and relative position), backscatter data (including derivatives of angular response analysis) and sediment groundtruthing over the continental shelf, west of Donegal. It applies a Geographic-Object-Based Image Analysis software package to provide a supervised classification of the surface sediment. This approach can provide a statistically robust, high resolution classification of the seafloor. Initial results display a differentiation of sediment classes and a reduction in artefacts from previously applied methodologies. These results indicate a methodology that could be used during physical habitat mapping and classification of marine environments.

  4. Purification of Training Samples Based on Spectral Feature and Superpixel Segmentation

    NASA Astrophysics Data System (ADS)

    Guan, X.; Qi, W.; He, J.; Wen, Q.; Chen, T.; Wang, Z.

    2018-04-01

    Remote sensing image classification is an effective way to extract information from large volumes of high-spatial resolution remote sensing images. Generally, supervised image classification relies on abundant and high-precision training data, which is often manually interpreted by human experts to provide ground truth for training and evaluating the performance of the classifier. Remote sensing enterprises accumulated lots of manually interpreted products from early lower-spatial resolution remote sensing images by executing their routine research and business programs. However, these manually interpreted products may not match the very high resolution (VHR) image properly because of different dates or spatial resolution of both data, thus, hindering suitability of manually interpreted products in training classification models, or small coverage area of these manually interpreted products. We also face similar problems in our laboratory in 21st Century Aerospace Technology Co. Ltd (short for 21AT). In this work, we propose a method to purify the interpreted product to match newly available VHRI data and provide the best training data for supervised image classifiers in VHR image classification. And results indicate that our proposed method can efficiently purify the input data for future machine learning use.

  5. Supervised neural network classification of pre-sliced cooked pork ham images using quaternionic singular values.

    PubMed

    Valous, Nektarios A; Mendoza, Fernando; Sun, Da-Wen; Allen, Paul

    2010-03-01

    The quaternionic singular value decomposition is a technique to decompose a quaternion matrix (representation of a colour image) into quaternion singular vector and singular value component matrices exposing useful properties. The objective of this study was to use a small portion of uncorrelated singular values, as robust features for the classification of sliced pork ham images, using a supervised artificial neural network classifier. Images were acquired from four qualities of sliced cooked pork ham typically consumed in Ireland (90 slices per quality), having similar appearances. Mahalanobis distances and Pearson product moment correlations were used for feature selection. Six highly discriminating features were used as input to train the neural network. An adaptive feedforward multilayer perceptron classifier was employed to obtain a suitable mapping from the input dataset. The overall correct classification performance for the training, validation and test set were 90.3%, 94.4%, and 86.1%, respectively. The results confirm that the classification performance was satisfactory. Extracting the most informative features led to the recognition of a set of different but visually quite similar textural patterns based on quaternionic singular values. Copyright 2009 Elsevier Ltd. All rights reserved.

  6. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments.

    PubMed

    Han, Wenjing; Coutinho, Eduardo; Ruan, Huabin; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances.

  7. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments

    PubMed Central

    Han, Wenjing; Coutinho, Eduardo; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances. PMID:27627768

  8. Analysis of the changes in the tarcrete layer on the desert surface of Kuwait using satellite imagery and cell-based modeling

    NASA Astrophysics Data System (ADS)

    Al-Doasari, Ahmad E.

    The 1991 Gulf War caused massive environmental damage in Kuwait. Deposition of oil and soot droplets from hundreds of burning oil-wells created a layer of tarcrete on the desert surface covering over 900 km2. This research investigates the spatial change in the tarcrete extent from 1991 to 1998 using Landsat Thematic Mapper (TM) imagery and statistical modeling techniques. The pixel structure of TM data allows the spatial analysis of the change in tarcrete extent to be conducted at the pixel (cell) level within a geographical information system (GIS). There are two components to this research. The first is a comparison of three remote sensing classification techniques used to map the tarcrete layer. The second is a spatial-temporal analysis and simulation of tarcrete changes through time. The analysis focuses on an area of 389 km2 located south of the Al-Burgan oil field. Five TM images acquired in 1991, 1993, 1994, 1995, and 1998 were geometrically and atmospherically corrected. These images were classified into six classes: oil lakes; heavy, intermediate, light, and traces of tarcrete; and sand. The classification methods tested were unsupervised, supervised, and neural network supervised (fuzzy ARTMAP). Field data of tarcrete characteristics were collected to support the classification process and to evaluate the classification accuracies. Overall, the neural network method is more accurate (60 percent) than the other two methods; both the unsupervised and the supervised classification accuracy assessments resulted in 46 percent accuracy. The five classifications were used in a lagged autologistic model to analyze the spatial changes of the tarcrete through time. The autologistic model correctly identified overall tarcrete contraction between 1991--1993 and 1995--1998. However, tarcrete contraction between 1993--1994 and 1994--1995 was less well marked, in part because of classification errors in the maps from these time periods. Initial simulations of tarcrete contraction with a cellular automaton model were not very successful. However, more accurate classifications could improve the simulations. This study illustrates how an empirical investigation using satellite images, field data, GIS, and spatial statistics can simulate dynamic land-cover change through the use of a discrete statistical and cellular automaton model.

  9. Development of remote sensing based site specific weed management for Midwest mint production

    NASA Astrophysics Data System (ADS)

    Gumz, Mary Saumur Paulson

    Peppermint and spearmint are high value essential oil crops in Indiana, Michigan, and Wisconsin. Although the mints are profitable alternatives to corn and soybeans, mint production efficiency must improve in order to allow industry survival against foreign produced oils and synthetic flavorings. Weed control is the major input cost in mint production and tools to increase efficiency are necessary. Remote sensing-based site-specific weed management offers potential for decreasing weed control costs through simplified weed detection and control from accurate site specific weed and herbicide application maps. This research showed the practicability of remote sensing for weed detection in the mints. Research was designed to compare spectral response curves of field grown mint and weeds, and to use these data to develop spectral vegetation indices for automated weed detection. Viability of remote sensing in mint production was established using unsupervised classification, supervised classification, handheld spectroradiometer readings and spectral vegetation indices (SVIs). Unsupervised classification of multispectral images of peppermint production fields generated crop health maps with 92 and 67% accuracy in meadow and row peppermint, respectively. Supervised classification of multispectral images identified weed infestations with 97% and 85% accuracy for meadow and row peppermint, respectively. Supervised classification showed that peppermint was spectrally distinct from weeds, but the accuracy of these measures was dependent on extensive ground referencing which is impractical and too costly for on-farm use. Handheld spectroradiometer measurements of peppermint, spearmint, and several weeds and crop and weed mixtures were taken over three years from greenhouse grown plants, replicated field plots, and production peppermint and spearmint fields. Results showed that mints have greater near infrared (NIR) and lower green reflectance and a steeper red edge slope than all weed species. These distinguishing characteristics were combined to develop narrow band and broadband spectral vegetation indices (SVIs, ratios of NIR/green reflectance), that were effective in differentiating mint from key weed species. Hyperspectral images of production peppermint and spearmint fields were then classified using SVI-based classification. Narrowband and broadband SVIs classified early season peppermint and spearmint with 64 to 100% accuracy compared to 79 to 100% accuracy for supervised classification of multispectral images of the same fields. Broadband SVIs have potential for use as an automated spectral indicator for weeds in the mints since they require minimal ground referencing and can be calculated from multispectral imagery which is cheaper and more readily available than hyperspectral imagery. This research will allow growers to implement remote sensing based site specific weed management in mint resulting in reduced grower input costs and reduced herbicide entry into the environment and will have applications in other specialty and meadow crops.

  10. Applications of remote sensing, volume 3

    NASA Technical Reports Server (NTRS)

    Landgrebe, D. A. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. Of the four change detection techniques (post classification comparison, delta data, spectral/temporal, and layered spectral temporal), the post classification comparison was selected for further development. This was based upon test performances of the four change detection method, straightforwardness of the procedures, and the output products desired. A standardized modified, supervised classification procedure for analyzing the Texas coastal zone data was compiled. This procedure was developed in order that all quadrangles in the study are would be classified using similar analysis techniques to allow for meaningful comparisons and evaluations of the classifications.

  11. Optimizing area under the ROC curve using semi-supervised learning

    PubMed Central

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M.

    2014-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.1 PMID:25395692

  12. Optimizing area under the ROC curve using semi-supervised learning.

    PubMed

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M

    2015-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

  13. Feature Fusion of ICP-AES, UV-Vis and FT-MIR for Origin Traceability of Boletus edulis Mushrooms in Combination with Chemometrics.

    PubMed

    Qi, Luming; Liu, Honggao; Li, Jieqing; Li, Tao; Wang, Yuanzhong

    2018-01-15

    Origin traceability is an important step to control the nutritional and pharmacological quality of food products. Boletus edulis mushroom is a well-known food resource in the world. Its nutritional and medicinal properties are drastically varied depending on geographical origins. In this study, three sensor systems (inductively coupled plasma atomic emission spectrophotometer (ICP-AES), ultraviolet-visible (UV-Vis) and Fourier transform mid-infrared spectroscopy (FT-MIR)) were applied for the origin traceability of 192 mushroom samples (caps and stipes) in combination with chemometrics. The difference between cap and stipe was clearly illustrated based on a single sensor technique, respectively. Feature variables from three instruments were used for origin traceability. Two supervised classification methods, partial least square discriminant analysis (FLS-DA) and grid search support vector machine (GS-SVM), were applied to develop mathematical models. Two steps (internal cross-validation and external prediction for unknown samples) were used to evaluate the performance of a classification model. The result is satisfactory with high accuracies ranging from 90.625% to 100%. These models also have an excellent generalization ability with the optimal parameters. Based on the combination of three sensory systems, our study provides a multi-sensory and comprehensive origin traceability of B. edulis mushrooms.

  14. Feature Fusion of ICP-AES, UV-Vis and FT-MIR for Origin Traceability of Boletus edulis Mushrooms in Combination with Chemometrics

    PubMed Central

    Qi, Luming; Liu, Honggao; Li, Jieqing; Li, Tao

    2018-01-01

    Origin traceability is an important step to control the nutritional and pharmacological quality of food products. Boletus edulis mushroom is a well-known food resource in the world. Its nutritional and medicinal properties are drastically varied depending on geographical origins. In this study, three sensor systems (inductively coupled plasma atomic emission spectrophotometer (ICP-AES), ultraviolet-visible (UV-Vis) and Fourier transform mid-infrared spectroscopy (FT-MIR)) were applied for the origin traceability of 184 mushroom samples (caps and stipes) in combination with chemometrics. The difference between cap and stipe was clearly illustrated based on a single sensor technique, respectively. Feature variables from three instruments were used for origin traceability. Two supervised classification methods, partial least square discriminant analysis (FLS-DA) and grid search support vector machine (GS-SVM), were applied to develop mathematical models. Two steps (internal cross-validation and external prediction for unknown samples) were used to evaluate the performance of a classification model. The result is satisfactory with high accuracies ranging from 90.625% to 100%. These models also have an excellent generalization ability with the optimal parameters. Based on the combination of three sensory systems, our study provides a multi-sensory and comprehensive origin traceability of B. edulis mushrooms. PMID:29342969

  15. A comparison of different chemometrics approaches for the robust classification of electronic nose data.

    PubMed

    Gromski, Piotr S; Correa, Elon; Vaughan, Andrew A; Wedge, David C; Turner, Michael L; Goodacre, Royston

    2014-11-01

    Accurate detection of certain chemical vapours is important, as these may be diagnostic for the presence of weapons, drugs of misuse or disease. In order to achieve this, chemical sensors could be deployed remotely. However, the readout from such sensors is a multivariate pattern, and this needs to be interpreted robustly using powerful supervised learning methods. Therefore, in this study, we compared the classification accuracy of four pattern recognition algorithms which include linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), random forests (RF) and support vector machines (SVM) which employed four different kernels. For this purpose, we have used electronic nose (e-nose) sensor data (Wedge et al., Sensors Actuators B Chem 143:365-372, 2009). In order to allow direct comparison between our four different algorithms, we employed two model validation procedures based on either 10-fold cross-validation or bootstrapping. The results show that LDA (91.56% accuracy) and SVM with a polynomial kernel (91.66% accuracy) were very effective at analysing these e-nose data. These two models gave superior prediction accuracy, sensitivity and specificity in comparison to the other techniques employed. With respect to the e-nose sensor data studied here, our findings recommend that SVM with a polynomial kernel should be favoured as a classification method over the other statistical models that we assessed. SVM with non-linear kernels have the advantage that they can be used for classifying non-linear as well as linear mapping from analytical data space to multi-group classifications and would thus be a suitable algorithm for the analysis of most e-nose sensor data.

  16. Evaluation of Machine Learning Algorithms for Classification of Primary Biological Aerosol using a new UV-LIF spectrometer

    NASA Astrophysics Data System (ADS)

    Ruske, S. T.; Topping, D. O.; Foot, V. E.; Kaye, P. H.; Stanley, W. R.; Morse, A. P.; Crawford, I.; Gallagher, M. W.

    2016-12-01

    Characterisation of bio-aerosols has important implications within Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS) has allowed for the real time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal Spores and pollen. This new generation of instruments has enabled ever-larger data sets to be compiled with the aim of studying more complex environments, yet the algorithms used for specie classification remain largely invalidated. It is therefore imperative that we validate the performance of different algorithms that can be used for the task of classification, which is the focus of this study. For unsupervised learning we test Hierarchical Agglomerative Clustering with various different linkages. For supervised learning, ten methods were tested; including decision trees, ensemble methods: Random Forests, Gradient Boosting and AdaBoost; two implementations for support vector machines: libsvm and liblinear; Gaussian methods: Gaussian naïve Bayesian, quadratic and linear discriminant analysis and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer. We find that clustering, in general, performs slightly worse than the supervised learning methods correctly classifying, at best, only 72.7 and 91.1 percent for the two data sets. For supervised learning the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data respectively across the two data sets. We discuss the wider relevance of these results with regards to challenging existing classification in real-world environments.

  17. Hydrometeor classification through statistical clustering of polarimetric radar measurements: a semi-supervised approach

    NASA Astrophysics Data System (ADS)

    Besic, Nikola; Ventura, Jordi Figueras i.; Grazioli, Jacopo; Gabella, Marco; Germann, Urs; Berne, Alexis

    2016-09-01

    Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types. Namely, the results of the classification largely rely upon the quality of scattering simulations. When it comes to the unsupervised approach, it lacks the constraints related to the hydrometeor microphysics. The idea of the proposed method is to compensate for these drawbacks by combining the two approaches in a way that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. Aside from comparing, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids which are then employed in operational labelling of different hydrometeors. The method has been applied on three C-band datasets, each acquired by different operational radar from the MeteoSwiss Rad4Alp network, as well as on two X-band datasets acquired by two research mobile radars. The results are discussed through a comparative analysis which includes a corresponding supervised and unsupervised approach, emphasising the operational potential of the proposed method.

  18. A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data.

    PubMed

    Stephens, David; Diesing, Markus

    2014-01-01

    Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well, highlighting the need for some means of feature selection.

  19. Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation

    PubMed Central

    Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

    2013-01-01

    Brain magnetic resonance images (MRIs) tissue segmentation is one of the most important parts of the clinical diagnostic tools. Pixel classification methods have been frequently used in the image segmentation with two supervised and unsupervised approaches up to now. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to low level of performance. However, semi-supervised learning which uses a few labeled data together with a large amount of unlabeled data causes higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting of brain magnetic resonance imaging (MRI) tissues that it has been used results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms expectation filtering maximization and MCo_Training that are improved versions of semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers. PMID:24098863

  20. A systematic comparison of different object-based classification techniques using high spatial resolution imagery in agricultural environments

    NASA Astrophysics Data System (ADS)

    Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk

    2016-07-01

    Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.

  1. Land Cover Classification in a Complex Urban-Rural Landscape with Quickbird Imagery

    PubMed Central

    Moran, Emilio Federico.

    2010-01-01

    High spatial resolution images have been increasingly used for urban land use/cover classification, but the high spectral variation within the same land cover, the spectral confusion among different land covers, and the shadow problem often lead to poor classification performance based on the traditional per-pixel spectral-based classification methods. This paper explores approaches to improve urban land cover classification with Quickbird imagery. Traditional per-pixel spectral-based supervised classification, incorporation of textural images and multispectral images, spectral-spatial classifier, and segmentation-based classification are examined in a relatively new developing urban landscape, Lucas do Rio Verde in Mato Grosso State, Brazil. This research shows that use of spatial information during the image classification procedure, either through the integrated use of textural and spectral images or through the use of segmentation-based classification method, can significantly improve land cover classification performance. PMID:21643433

  2. Supervised classification in the presence of misclassified training data: a Monte Carlo simulation study in the three group case.

    PubMed

    Bolin, Jocelyn Holden; Finch, W Holmes

    2014-01-01

    Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.

  3. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    NASA Astrophysics Data System (ADS)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.

  4. Patient-specific semi-supervised learning for postoperative brain tumor segmentation.

    PubMed

    Meier, Raphael; Bauer, Stefan; Slotboom, Johannes; Wiest, Roland; Reyes, Mauricio

    2014-01-01

    In contrast to preoperative brain tumor segmentation, the problem of postoperative brain tumor segmentation has been rarely approached so far. We present a fully-automatic segmentation method using multimodal magnetic resonance image data and patient-specific semi-supervised learning. The idea behind our semi-supervised approach is to effectively fuse information from both pre- and postoperative image data of the same patient to improve segmentation of the postoperative image. We pose image segmentation as a classification problem and solve it by adopting a semi-supervised decision forest. The method is evaluated on a cohort of 10 high-grade glioma patients, with segmentation performance and computation time comparable or superior to a state-of-the-art brain tumor segmentation method. Moreover, our results confirm that the inclusion of preoperative MR images lead to a better performance regarding postoperative brain tumor segmentation.

  5. Rapid Classification of Landsat TM Imagery for Phase 1 Stratification Using the Automated NDVI Threshold Supervised Classification (ANTSC) Methodology

    Treesearch

    William H. Cooke; Dennis M. Jacobs

    2005-01-01

    FIA annual inventories require rapid updating of pixel-based Phase 1 estimates. Scientists at the Southern Research Station are developing an automated methodology that uses a Normalized Difference Vegetation Index (NDVI) for identifying and eliminating problem FIA plots from the analysis. Problem plots are those that have questionable land use/land cover information....

  6. Project DIPOLE WEST - Multiburst Environment (Non-Simultaneous Detonations)

    DTIC Science & Technology

    1976-09-01

    PAGE (WIMn Dat• Bntered) Unclassified SECURITY CLASSIFICATION OP’ THIS PAGE(ft• Data .Bnt......, 20. Abstract Purpose of the series was to obtain...HULL hydrodynamic air blast code show good correlation. UNCLASSIFIED SECUFUTY CLASSIFICATION OF THIS PA.GE(When Date Bntered) • • 1...supervision. Contributions were also made by Dr. John Dewey, University of Victoria; Mr. A. P. R. Lambert, Canadian General Electric; Mr. Charles Needham

  7. Couple Graph Based Label Propagation Method for Hyperspectral Remote Sensing Data Classification

    NASA Astrophysics Data System (ADS)

    Wang, X. P.; Hu, Y.; Chen, J.

    2018-04-01

    Graph based semi-supervised classification method are widely used for hyperspectral image classification. We present a couple graph based label propagation method, which contains both the adjacency graph and the similar graph. We propose to construct the similar graph by using the similar probability, which utilize the label similarity among examples probably. The adjacency graph was utilized by a common manifold learning method, which has effective improve the classification accuracy of hyperspectral data. The experiments indicate that the couple graph Laplacian which unite both the adjacency graph and the similar graph, produce superior classification results than other manifold Learning based graph Laplacian and Sparse representation based graph Laplacian in label propagation framework.

  8. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows a great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in the real medical practice. PMID:25823003

  9. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows a great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in the real medical practice.

  10. Fatigue level estimation of monetary bills based on frequency band acoustic signals with feature selection by supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued monetary bills adversely affect the daily operation of automated teller machines (ATMs). In order to make the classification of fatigued bills more efficient, the development of an automatic fatigued monetary bill classification method is desirable. We propose a new method by which to estimate the fatigue level of monetary bills from the feature-selected frequency band acoustic energy pattern of banking machines. By using a supervised self-organizing map (SOM), we effectively estimate the fatigue level using only the feature-selected frequency band acoustic energy pattern. Furthermore, the feature-selected frequency band acoustic energy pattern improves the estimation accuracy of the fatigue level of monetary bills by adding frequency domain information to the acoustic energy pattern. The experimental results with real monetary bill samples reveal the effectiveness of the proposed method.

  11. Supervision of Ethylene Propylene Diene M-Class (EPDM) Rubber Vulcanization and Recovery Processes Using Attenuated Total Reflection Fourier Transform Infrared (ATR FT-IR) Spectroscopy and Multivariate Analysis.

    PubMed

    Riba Ruiz, Jordi-Roger; Canals, Trini; Cantero, Rosa

    2017-01-01

    Ethylene propylene diene monomer (EPDM) rubber is widely used in a diverse type of applications, such as the automotive, industrial and construction sectors among others. Due to its appealing features, the consumption of vulcanized EPDM rubber is growing significantly. However, environmental issues are forcing the application of devulcanization processes to facilitate recovery, which has led rubber manufacturers to implement strict quality controls. Consequently, it is important to develop methods for supervising the vulcanizing and recovery processes of such products. This paper deals with the supervision process of EPDM compounds by means of Fourier transform mid-infrared (FT-IR) spectroscopy and suitable multivariate statistical methods. An expedited and nondestructive classification approach was applied to a sufficient number of EPDM samples with different applied processes, that is, with and without application of vulcanizing agents, vulcanized samples, and microwave treated samples. First the FT-IR spectra of the samples is acquired and next it is processed by applying suitable feature extraction methods, i.e., principal component analysis and canonical variate analysis to obtain the latent variables to be used for classifying test EPDM samples. Finally, the k nearest neighbor algorithm was used in the classification stage. Experimental results prove the accuracy of the proposed method and the potential of FT-IR spectroscopy in this area, since the classification accuracy can be as high as 100%.

  12. Classification and analysis of the Rudaki's Area

    NASA Astrophysics Data System (ADS)

    Zambon, F.; De sanctis, M.; Capaccioni, F.; Filacchione, G.; Carli, C.; Ammannito, E.; Frigeri, A.

    2011-12-01

    During the first two MESSENGER flybys the Mercury Dual Imaging System (MDIS) has mapped 90% of the Mercury's surface. An effective way to study the different terrain on planetary surfaces is to apply classification methods. These are based on clustering algorithms and they can be divided in two categories: unsupervised and supervised. The unsupervised classifiers do not require the analyst feedback and the algorithm automatically organizes pixels values into classes. In the supervised method, instead, the analyst must choose the "training area" that define the pixels value of a given class. We applied an unsupervised classifier, ISODATA, to the WAC filter images of the Rudaki's area where several kind of terrain have been identified showing differences in albedo, topography and crater density. ISODATA classifier divides this region in four classes: 1) shadow regions, 2) rough regions, 3) smooth plane, 4) highest reflectance area. ISODATA can not distinguish the high albedo regions from highly reflective illuminated edge of the craters, however the algorithm identify four classes that can be considered different units mainly on the basis of their reflectances at the various wavelengths. Is not possible, instead, to extrapolate compositional information because of the absence of clear spectral features. An additional analysis was made using ISODATA to choose the "training area" for further supervised classifications. These approach would allow, for example, to separate more accurately the edge of the craters from the high reflectance areas and the low reflectance regions from the shadow areas.

  13. Mapping wetland and forest landscapes in Siberia with Landsat data

    NASA Astrophysics Data System (ADS)

    Maksyutov, Shamil; Kleptsova, Irina; Glagolev, Mikhail; Sedykh, Vladimir; Kuzmenko, Ekaterina; Silaev, Anton; Frolov, Alexander; Nikolaeva, Svetlana; Fedorov, Alexander

    2014-05-01

    Landsat data availability provides opportunity for improving the knowledge of the Siberian ecosystems necessary for quantifying the response of the regional carbon cycle to the climate change. We developed a new wetland map based on Landsat data for whole West Siberia aiming at scaling up the methane emission observations. Mid-summer Landsat scenes were used in supervised classification method, based on ground truth data obtained during multiple field surveys. The method allows distinguishing following wetland types: pine-dwarf shrubs-sphagnum bogs or ryams, ridge-hollows complexes, shallow-water complexes, sedge-sphagnum poor fens, herbaceous-sphagnum poor fens, sedge-(moss) poor fens and fens, wooded swamps or sogra, palsa complexes. In our estimates wetlands cover 36% of the taiga area. Total methane emission from WS taiga mires is estimated as 3.6 TgC/yr,which is 77% larger as compared to the earlier estimate based on partial Landsat mapping combined with low resolution map due to higher fraction of fen area. We make an attempt to develop a forest typology system useful for a dynamic vegetation modeling and apply it to the analysis of the forest type distribution for several test areas in West and East Siberia, aiming at capability of mapping whole Siberian forests based on Landsat data. Test region locations are: two in West Siberian middle taiga (Laryegan and Nyagan), and one in East Siberia near Yakutsk. The ground truth data are based on analysis of the field survey, forest inventory data from the point of view of the successional forest type classification. Supervised classification was applied to the areas where ample ground truth and inventory data are available, using several limited area maps and vegetation survey. In Laryegan basin the upland forest areas are dominated (as climax forest species) by Scots pine on sandy soils and Siberian pine with presence of fir and spruce on the others. Those types are separable using Landsat spectral data alone. In the permafrost area around Yakutsk the most widespread succession type is birch to larch succession. Three stages of the birch to larch succession are detectable from Landsat image. When Landsat data is used in both West and East Siberia, distinction between deciduous broad-leaved species (birch, aspen, and willow) is difficult due to similarity in spectral signatures. Same problem exists for distinguishing between dark coniferous species (Siberian pine, fir and spruce). Forest classification can be improved by applying landscape type analysis, such as separation into floodplain, terrace, sloping hills.

  14. A functional supervised learning approach to the study of blood pressure data.

    PubMed

    Papayiannis, Georgios I; Giakoumakis, Emmanuel A; Manios, Efstathios D; Moulopoulos, Spyros D; Stamatelopoulos, Kimon S; Toumanidis, Savvas T; Zakopoulos, Nikolaos A; Yannacopoulos, Athanasios N

    2018-04-15

    In this work, a functional supervised learning scheme is proposed for the classification of subjects into normotensive and hypertensive groups, using solely the 24-hour blood pressure data, relying on the concepts of Fréchet mean and Fréchet variance for appropriate deformable functional models for the blood pressure data. The schemes are trained on real clinical data, and their performance was assessed and found to be very satisfactory. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Supervised Learning Applied to Air Traffic Trajectory Classification

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle S.; Nikoleris, Tasos

    2018-01-01

    Given the recent increase of interest in introducing new vehicle types and missions into the National Airspace System, a transition towards a more autonomous air traffic control system is required in order to enable and handle increased density and complexity. This paper presents an exploratory effort of the needed autonomous capabilities by exploring supervised learning techniques in the context of aircraft trajectories. In particular, it focuses on the application of machine learning algorithms and neural network models to a runway recognition trajectory-classification study. It investigates the applicability and effectiveness of various classifiers using datasets containing trajectory records for a month of air traffic. A feature importance and sensitivity analysis are conducted to challenge the chosen time-based datasets and the ten selected features. The study demonstrates that classification accuracy levels of 90% and above can be reached in less than 40 seconds of training for most machine learning classifiers when one track data point, described by the ten selected features at a particular time step, per trajectory is used as input. It also shows that neural network models can achieve similar accuracy levels but at higher training time costs.

  16. Supervised Classification Processes for the Characterization of Heritage Elements, Case Study: Cuenca-Ecuador

    NASA Astrophysics Data System (ADS)

    Briones, J. C.; Heras, V.; Abril, C.; Sinchi, E.

    2017-08-01

    The proper control of built heritage entails many challenges related to the complexity of heritage elements and the extent of the area to be managed, for which the available resources must be efficiently used. In this scenario, the preventive conservation approach, based on the concept that prevent is better than cure, emerges as a strategy to avoid the progressive and imminent loss of monuments and heritage sites. Regular monitoring appears as a key tool to identify timely changes in heritage assets. This research demonstrates that the supervised learning model (Support Vector Machines - SVM) is an ideal tool that supports the monitoring process detecting visible elements in aerial images such as roofs structures, vegetation and pavements. The linear, gaussian and polynomial kernel functions were tested; the lineal function provided better results over the other functions. It is important to mention that due to the high level of segmentation generated by the classification procedure, it was necessary to apply a generalization process through opening a mathematical morphological operation, which simplified the over classification for the monitored elements.

  17. Remote sensing change detection methods to track deforestation and growth in threatened rainforests in Madre de Dios, Peru

    USGS Publications Warehouse

    Shermeyer, Jacob S.; Haack, Barry N.

    2015-01-01

    Two forestry-change detection methods are described, compared, and contrasted for estimating deforestation and growth in threatened forests in southern Peru from 2000 to 2010. The methods used in this study rely on freely available data, including atmospherically corrected Landsat 5 Thematic Mapper and Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation continuous fields (VCF). The two methods include a conventional supervised signature extraction method and a unique self-calibrating method called MODIS VCF guided forest/nonforest (FNF) masking. The process chain for each of these methods includes a threshold classification of MODIS VCF, training data or signature extraction, signature evaluation, k-nearest neighbor classification, analyst-guided reclassification, and postclassification image differencing to generate forest change maps. Comparisons of all methods were based on an accuracy assessment using 500 validation pixels. Results of this accuracy assessment indicate that FNF masking had a 5% higher overall accuracy and was superior to conventional supervised classification when estimating forest change. Both methods succeeded in classifying persistently forested and nonforested areas, and both had limitations when classifying forest change.

  18. Exploiting unsupervised and supervised classification for segmentation of the pathological lung in CT

    NASA Astrophysics Data System (ADS)

    Korfiatis, P.; Kalogeropoulou, C.; Daoussis, D.; Petsas, T.; Adonopoulos, A.; Costaridou, L.

    2009-07-01

    Delineation of lung fields in presence of diffuse lung diseases (DLPDs), such as interstitial pneumonias (IP), challenges segmentation algorithms. To deal with IP patterns affecting the lung border an automated image texture classification scheme is proposed. The proposed segmentation scheme is based on supervised texture classification between lung tissue (normal and abnormal) and surrounding tissue (pleura and thoracic wall) in the lung border region. This region is coarsely defined around an initial estimate of lung border, provided by means of Markov Radom Field modeling and morphological operations. Subsequently, a support vector machine classifier was trained to distinguish between the above two classes of tissue, using textural feature of gray scale and wavelet domains. 17 patients diagnosed with IP, secondary to connective tissue diseases were examined. Segmentation performance in terms of overlap was 0.924±0.021, and for shape differentiation mean, rms and maximum distance were 1.663±0.816, 2.334±1.574 and 8.0515±6.549 mm, respectively. An accurate, automated scheme is proposed for segmenting abnormal lung fields in HRC affected by IP

  19. LANDSAT landcover information applied to regional planning decisions. [Prince Edward County, Virginia

    NASA Technical Reports Server (NTRS)

    Dixon, C. M.

    1981-01-01

    Land cover information derived from LANDSAT is being utilized by Piedmont Planning District Commission located in the State of Virginia. Progress to date is reported on a level one land cover classification map being produced with nine categories. The nine categories of classification are defined. The computer compatible tape selection is presented. Two unsupervised classifications were done, with 50 and 70 classes respectively. Twenty-eight spectral classes were developed using the supervised technique, employing actual ground truth training sites. The accuracy of the unsupervised classifications are estimated through comparison with local county statistics and with an actual pixel count of LANDSAT information compared to ground truth.

  20. Self organising maps for visualising and modelling

    PubMed Central

    2012-01-01

    The paper describes the motivation of SOMs (Self Organising Maps) and how they are generally more accessible due to the wider available modern, more powerful, cost-effective computers. Their advantages compared to Principal Components Analysis and Partial Least Squares are discussed. These allow application to non-linear data, are not so dependent on least squares solutions, normality of errors and less influenced by outliers. In addition there are a wide variety of intuitive methods for visualisation that allow full use of the map space. Modern problems in analytical chemistry include applications to cultural heritage studies, environmental, metabolomic and biological problems result in complex datasets. Methods for visualising maps are described including best matching units, hit histograms, unified distance matrices and component planes. Supervised SOMs for classification including multifactor data and variable selection are discussed as is their use in Quality Control. The paper is illustrated using four case studies, namely the Near Infrared of food, the thermal analysis of polymers, metabolomic analysis of saliva using NMR, and on-line HPLC for pharmaceutical process monitoring. PMID:22594434

  1. Deep Unfolding for Topic Models.

    PubMed

    Chien, Jen-Tzung; Lee, Chao-Hsi

    2018-02-01

    Deep unfolding provides an approach to integrate the probabilistic generative models and the deterministic neural networks. Such an approach is benefited by deep representation, easy interpretation, flexible learning and stochastic modeling. This study develops the unsupervised and supervised learning of deep unfolded topic models for document representation and classification. Conventionally, the unsupervised and supervised topic models are inferred via the variational inference algorithm where the model parameters are estimated by maximizing the lower bound of logarithm of marginal likelihood using input documents without and with class labels, respectively. The representation capability or classification accuracy is constrained by the variational lower bound and the tied model parameters across inference procedure. This paper aims to relax these constraints by directly maximizing the end performance criterion and continuously untying the parameters in learning process via deep unfolding inference (DUI). The inference procedure is treated as the layer-wise learning in a deep neural network. The end performance is iteratively improved by using the estimated topic parameters according to the exponentiated updates. Deep learning of topic models is therefore implemented through a back-propagation procedure. Experimental results show the merits of DUI with increasing number of layers compared with variational inference in unsupervised as well as supervised topic models.

  2. Joint Sparse Recovery With Semisupervised MUSIC

    NASA Astrophysics Data System (ADS)

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-05-01

    Discrete multiple signal classification (MUSIC) with its low computational cost and mild condition requirement becomes a significant noniterative algorithm for joint sparse recovery (JSR). However, it fails in rank defective problem caused by coherent or limited amount of multiple measurement vectors (MMVs). In this letter, we provide a novel sight to address this problem by interpreting JSR as a binary classification problem with respect to atoms. Meanwhile, MUSIC essentially constructs a supervised classifier based on the labeled MMVs so that its performance will heavily depend on the quality and quantity of these training samples. From this viewpoint, we develop a semisupervised MUSIC (SS-MUSIC) in the spirit of machine learning, which declares that the insufficient supervised information in the training samples can be compensated from those unlabeled atoms. Instead of constructing a classifier in a fully supervised manner, we iteratively refine a semisupervised classifier by exploiting the labeled MMVs and some reliable unlabeled atoms simultaneously. Through this way, the required conditions and iterations can be greatly relaxed and reduced. Numerical experimental results demonstrate that SS-MUSIC can achieve much better recovery performances than other MUSIC extended algorithms as well as some typical greedy algorithms for JSR in terms of iterations and recovery probability.

  3. A Cross-Correlated Delay Shift Supervised Learning Method for Spiking Neurons with Application to Interictal Spike Detection in Epilepsy.

    PubMed

    Guo, Lilin; Wang, Zhenzhong; Cabrerizo, Mercedes; Adjouadi, Malek

    2017-05-01

    This study introduces a novel learning algorithm for spiking neurons, called CCDS, which is able to learn and reproduce arbitrary spike patterns in a supervised fashion allowing the processing of spatiotemporal information encoded in the precise timing of spikes. Unlike the Remote Supervised Method (ReSuMe), synapse delays and axonal delays in CCDS are variants which are modulated together with weights during learning. The CCDS rule is both biologically plausible and computationally efficient. The properties of this learning rule are investigated extensively through experimental evaluations in terms of reliability, adaptive learning performance, generality to different neuron models, learning in the presence of noise, effects of its learning parameters and classification performance. Results presented show that the CCDS learning method achieves learning accuracy and learning speed comparable with ReSuMe, but improves classification accuracy when compared to both the Spike Pattern Association Neuron (SPAN) learning rule and the Tempotron learning rule. The merit of CCDS rule is further validated on a practical example involving the automated detection of interictal spikes in EEG records of patients with epilepsy. Results again show that with proper encoding, the CCDS rule achieves good recognition performance.

  4. Accuracy assessments and areal estimates using two-phase stratified random sampling, cluster plots, and the multivariate composite estimator

    Treesearch

    Raymond L. Czaplewski

    2000-01-01

    Consider the following example of an accuracy assessment. Landsat data are used to build a thematic map of land cover for a multicounty region. The map classifier (e.g., a supervised classification algorithm) assigns each pixel into one category of land cover. The classification system includes 12 different types of forest and land cover: black spruce, balsam fir,...

  5. Pulsed terahertz imaging of breast cancer in freshly excised murine tumors

    NASA Astrophysics Data System (ADS)

    Bowman, Tyler; Chavez, Tanny; Khan, Kamrul; Wu, Jingxian; Chakraborty, Avishek; Rajaram, Narasimhan; Bailey, Keith; El-Shenawee, Magda

    2018-02-01

    This paper investigates terahertz (THz) imaging and classification of freshly excised murine xenograft breast cancer tumors. These tumors are grown via injection of E0771 breast adenocarcinoma cells into the flank of mice maintained on high-fat diet. Within 1 h of excision, the tumor and adjacent tissues are imaged using a pulsed THz system in the reflection mode. The THz images are classified using a statistical Bayesian mixture model with unsupervised and supervised approaches. Correlation with digitized pathology images is conducted using classification images assigned by a modal class decision rule. The corresponding receiver operating characteristic curves are obtained based on the classification results. A total of 13 tumor samples obtained from 9 tumors are investigated. The results show good correlation of THz images with pathology results in all samples of cancer and fat tissues. For tumor samples of cancer, fat, and muscle tissues, THz images show reasonable correlation with pathology where the primary challenge lies in the overlapping dielectric properties of cancer and muscle tissues. The use of a supervised regression approach shows improvement in the classification images although not consistently in all tissue regions. Advancing THz imaging of breast tumors from mice and the development of accurate statistical models will ultimately progress the technique for the assessment of human breast tumor margins.

  6. Cancer survival analysis using semi-supervised learning method based on Cox and AFT models with L1/2 regularization.

    PubMed

    Liang, Yong; Chai, Hua; Liu, Xiao-Ying; Xu, Zong-Ben; Zhang, Hai; Leung, Kwong-Sak

    2016-03-01

    One of the most important objectives of the clinical cancer research is to diagnose cancer more accurately based on the patients' gene expression profiles. Both Cox proportional hazards model (Cox) and accelerated failure time model (AFT) have been widely adopted to the high risk and low risk classification or survival time prediction for the patients' clinical treatment. Nevertheless, two main dilemmas limit the accuracy of these prediction methods. One is that the small sample size and censored data remain a bottleneck for training robust and accurate Cox classification model. In addition to that, similar phenotype tumours and prognoses are actually completely different diseases at the genotype and molecular level. Thus, the utility of the AFT model for the survival time prediction is limited when such biological differences of the diseases have not been previously identified. To try to overcome these two main dilemmas, we proposed a novel semi-supervised learning method based on the Cox and AFT models to accurately predict the treatment risk and the survival time of the patients. Moreover, we adopted the efficient L1/2 regularization approach in the semi-supervised learning method to select the relevant genes, which are significantly associated with the disease. The results of the simulation experiments show that the semi-supervised learning model can significant improve the predictive performance of Cox and AFT models in survival analysis. The proposed procedures have been successfully applied to four real microarray gene expression and artificial evaluation datasets. The advantages of our proposed semi-supervised learning method include: 1) significantly increase the available training samples from censored data; 2) high capability for identifying the survival risk classes of patient in Cox model; 3) high predictive accuracy for patients' survival time in AFT model; 4) strong capability of the relevant biomarker selection. Consequently, our proposed semi-supervised learning model is one more appropriate tool for survival analysis in clinical cancer research.

  7. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.

  8. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  9. Comparison of wheat classification accuracy using different classifiers of the image-100 system

    NASA Technical Reports Server (NTRS)

    Dejesusparada, N. (Principal Investigator); Chen, S. C.; Moreira, M. A.; Delima, A. M.

    1981-01-01

    Classification results using single-cell and multi-cell signature acquisition options, a point-by-point Gaussian maximum-likelihood classifier, and K-means clustering of the Image-100 system are presented. Conclusions reached are that: a better indication of correct classification can be provided by using a test area which contains various cover types of the study area; classification accuracy should be evaluated considering both the percentages of correct classification and error of commission; supervised classification approaches are better than K-means clustering; Gaussian distribution maximum likelihood classifier is better than Single-cell and Multi-cell Signature Acquisition Options of the Image-100 system; and in order to obtain a high classification accuracy in a large and heterogeneous crop area, using Gaussian maximum-likelihood classifier, homogeneous spectral subclasses of the study crop should be created to derive training statistics.

  10. Sentiment classification technology based on Markov logic networks

    NASA Astrophysics Data System (ADS)

    He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe

    2016-07-01

    With diverse online media emerging, there is a growing concern of sentiment classification problem. At present, text sentiment classification mainly utilizes supervised machine learning methods, which feature certain domain dependency. On the basis of Markov logic networks (MLNs), this study proposed a cross-domain multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification, knowledge was successfully transferred into other domains, and the precision of the sentiment classification analysis in the text tendency domain was improved. The experimental results revealed the following: (1) the model based on a MLN demonstrated higher precision than the single individual learning plan model. (2) Multi-task transfer learning based on Markov logical networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.

  11. Comparing the Behavior of Polarimetric SAR Imagery (TerraSAR-X and Radarsat-2) for Automated Sea Ice Classification

    NASA Astrophysics Data System (ADS)

    Ressel, Rudolf; Singha, Suman; Lehner, Susanne

    2016-08-01

    Arctic Sea ice monitoring has attracted increasing attention over the last few decades. Besides the scientific interest in sea ice, the operational aspect of ice charting is becoming more important due to growing navigational possibilities in an increasingly ice free Arctic. For this purpose, satellite borne SAR imagery has become an invaluable tool. In past, mostly single polarimetric datasets were investigated with supervised or unsupervised classification schemes for sea ice investigation. Despite proven sea ice classification achievements on single polarimetric data, a fully automatic, general purpose classifier for single-pol data has not been established due to large variation of sea ice manifestations and incidence angle impact. Recently, through the advent of polarimetric SAR sensors, polarimetric features have moved into the focus of ice classification research. The higher information content four polarimetric channels promises to offer greater insight into sea ice scattering mechanism and overcome some of the shortcomings of single- polarimetric classifiers. Two spatially and temporally coincident pairs of fully polarimetric acquisitions from the TerraSAR-X/TanDEM-X and RADARSAT-2 satellites are investigated. Proposed supervised classification algorithm consists of two steps: The first step comprises a feature extraction, the results of which are ingested into a neural network classifier in the second step. Based on the common coherency and covariance matrix, we extract a number of features and analyze the relevance and redundancy by means of mutual information for the purpose of sea ice classification. Coherency matrix based features which require an eigendecomposition are found to be either of low relevance or redundant to other covariance matrix based features. Among the most useful features for classification are matrix invariant based features (Geometric Intensity, Scattering Diversity, Surface Scattering Fraction).

  12. BOREAS TE-18 Landsat TM Maximum Likelihood Classification Image of the NSA

    NASA Technical Reports Server (NTRS)

    Hall, Forrest G. (Editor); Knapp, David

    2000-01-01

    The BOREAS TE-18 team focused its efforts on using remotely sensed data to characterize the successional and disturbance dynamics of the boreal forest for use in carbon modeling. The objective of this classification is to provide the BOREAS investigators with a data product that characterizes the land cover of the NSA. A Landsat-5 TM image from 20-Aug-1988 was used to derive this classification. A standard supervised maximum likelihood classification approach was used to produce this classification. The data are provided in a binary image format file. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Activity Archive Center (DAAC).

  13. Weakly supervised visual dictionary learning by harnessing image attributes.

    PubMed

    Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang

    2014-12-01

    Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping the observed states, where the supervised information is stemmed from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.

  14. Safe semi-supervised learning based on weighted likelihood.

    PubMed

    Kawakita, Masanori; Takeuchi, Jun'ichi

    2014-05-01

    We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')

  15. Multilayer Extreme Learning Machine With Subnetwork Nodes for Representation Learning.

    PubMed

    Yang, Yimin; Wu, Q M Jonathan

    2016-11-01

    The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feedforward neural networks, provides efficient unified learning solutions for the applications of clustering, regression, and classification. It presents competitive accuracy with superb efficiency in many applications. However, ELM with subnetwork nodes architecture has not attracted much research attentions. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally only work for one type of problem. This paper studies the general architecture of multilayer ELM (ML-ELM) with subnetwork nodes, showing that: 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning and 2) experimental results on ten image datasets and 16 classification datasets show that, compared to other conventional feature learning methods, the proposed ML-ELM with subnetwork nodes performs competitively or much better than other feature learning methods.

  16. Deep Learning for Extreme Weather Detection

    NASA Astrophysics Data System (ADS)

    Prabhat, M.; Racah, E.; Biard, J.; Liu, Y.; Mudigonda, M.; Kashinath, K.; Beckham, C.; Maharaj, T.; Kahou, S.; Pal, C.; O'Brien, T. A.; Wehner, M. F.; Kunkel, K.; Collins, W. D.

    2017-12-01

    We will present our latest results from the application of Deep Learning methods for detecting, localizing and segmenting extreme weather patterns in climate data. We have successfully applied supervised convolutional architectures for the binary classification tasks of detecting tropical cyclones and atmospheric rivers in centered, cropped patches. We have subsequently extended our architecture to a semi-supervised formulation, which is capable of learning a unified representation of multiple weather patterns, predicting bounding boxes and object categories, and has the capability to detect novel patterns (w/ few, or no labels). We will briefly present our efforts in scaling the semi-supervised architecture to 9600 nodes of the Cori supercomputer, obtaining 15PF performance. Time permitting, we will highlight our efforts in pixel-level segmentation of weather patterns.

  17. Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

    PubMed

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V; Robles, Montserrat; Aparici, F; Martí-Bonmatí, L; García-Gómez, Juan M

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.

  18. EEG source space analysis of the supervised factor analytic approach for the classification of multi-directional arm movement

    NASA Astrophysics Data System (ADS)

    Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai

    2017-08-01

    Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.

  19. Biophysical control of intertidal benthic macroalgae revealed by high-frequency multispectral camera images

    NASA Astrophysics Data System (ADS)

    van der Wal, Daphne; van Dalen, Jeroen; Wielemaker-van den Dool, Annette; Dijkstra, Jasper T.; Ysebaert, Tom

    2014-07-01

    Intertidal benthic macroalgae are a biological quality indicator in estuaries and coasts. While remote sensing has been applied to quantify the spatial distribution of such macroalgae, it is generally not used for their monitoring. We examined the day-to-day and seasonal dynamics of macroalgal cover on a sandy intertidal flat using visible and near-infrared images from a time-lapse camera mounted on a tower. Benthic algae were identified using supervised, semi-supervised and unsupervised classification techniques, validated with monthly ground-truthing over one year. A supervised classification (based on maximum likelihood, using training areas identified in the field) performed best in discriminating between sediment, benthic diatom films and macroalgae, with highest spectral separability between macroalgae and diatoms in spring/summer. An automated unsupervised classification (based on the Normalised Differential Vegetation Index NDVI) allowed detection of daily changes in macroalgal coverage without the need for calibration. This method showed a bloom of macroalgae (filamentous green algae, Ulva sp.) in summer with > 60% cover, but with pronounced superimposed day-to-day variation in cover. Waves were a major factor in regulating macroalgal cover, but regrowth of the thalli after a summer storm was fast (2 weeks). Images and in situ data demonstrated that the protruding tubes of the polychaete Lanice conchilega facilitated both settlement (anchorage) and survival (resistance to waves) of the macroalgae. Thus, high-frequency, high resolution images revealed the mechanisms for regulating the dynamics in cover of the macroalgae and for their spatial structuring. Ramifications for the mode, timing, frequency and evaluation of monitoring macroalgae by field and remote sensing surveys are discussed.

  20. Forest Classification Accuracy as Influenced by Multispectral Scanner Spatial Resolution. [Sam Houston National Forest, Texas

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F. (Principal Investigator); Sadowski, F. E.; Sarno, J. E.

    1976-01-01

    The author has identified the following significant results. A supervised classification within two separate ground areas of the Sam Houston National Forest was carried out for two sq meters spatial resolution MSS data. Data were progressively coarsened to simulate five additional cases of spatial resolution ranging up to 64 sq meters. Similar processing and analysis of all spatial resolutions enabled evaluations of the effect of spatial resolution on classification accuracy for various levels of detail and the effects on area proportion estimation for very general forest features. For very coarse resolutions, a subset of spectral channels which simulated the proposed thematic mapper channels was used to study classification accuracy.

  1. Automated source classification of new transient sources

    NASA Astrophysics Data System (ADS)

    Oertel, M.; Kreikenbohm, A.; Wilms, J.; DeLuca, A.

    2017-10-01

    The EXTraS project harvests the hitherto unexplored temporal domain information buried in the serendipitous data collected by the European Photon Imaging Camera (EPIC) onboard the ESA XMM-Newton mission since its launch. This includes a search for fast transients, missed by standard image analysis, and a search and characterization of variability in hundreds of thousands of sources. We present an automated classification scheme for new transient sources in the EXTraS project. The method is as follows: source classification features of a training sample are used to train machine learning algorithms (performed in R; randomForest (Breiman, 2001) in supervised mode) which are then tested on a sample of known source classes and used for classification.

  2. New insights into the classification and nomenclature of cortical GABAergic interneurons.

    PubMed

    DeFelipe, Javier; López-Cruz, Pedro L; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R; Huang, Josh; Jones, Edward G; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A; Marín, Oscar; Markram, Henry; McBain, Chris J; Meyer, Hanno S; Monyer, Hannah; Nelson, Sacha B; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L R; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M; Sherwood, Chet C; Staiger, Jochen F; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A

    2013-03-01

    A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.

  3. New insights into the classification and nomenclature of cortical GABAergic interneurons

    PubMed Central

    DeFelipe, Javier; López-Cruz, Pedro L.; Benavides-Piccione, Ruth; Bielza, Concha; Larrañaga, Pedro; Anderson, Stewart; Burkhalter, Andreas; Cauli, Bruno; Fairén, Alfonso; Feldmeyer, Dirk; Fishell, Gord; Fitzpatrick, David; Freund, Tamás F.; González-Burgos, Guillermo; Hestrin, Shaul; Hill, Sean; Hof, Patrick R.; Huang, Josh; Jones, Edward G.; Kawaguchi, Yasuo; Kisvárday, Zoltán; Kubota, Yoshiyuki; Lewis, David A.; Marín, Oscar; Markram, Henry; McBain, Chris J.; Meyer, Hanno S.; Monyer, Hannah; Nelson, Sacha B.; Rockland, Kathleen; Rossier, Jean; Rubenstein, John L. R.; Rudy, Bernardo; Scanziani, Massimo; Shepherd, Gordon M.; Sherwood, Chet C.; Staiger, Jochen F.; Tamás, Gábor; Thomson, Alex; Wang, Yun; Yuste, Rafael; Ascoli, Giorgio A.

    2013-01-01

    A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts’ assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus. PMID:23385869

  4. Classification of spontaneous EEG signals in migraine

    NASA Astrophysics Data System (ADS)

    Bellotti, R.; De Carlo, F.; de Tommaso, M.; Lucente, M.

    2007-08-01

    We set up a classification system able to detect patients affected by migraine without aura, through the analysis of their spontaneous EEG patterns. First, the signals are characterized by means of wavelet-based features, than a supervised neural network is used to classify the multichannel data. For the feature extraction, scale-dependent and scale-independent methods are considered with a variety of wavelet functions. Both the approaches provide very high and almost comparable classification performances. A complete separation of the two groups is obtained when the data are plotted in the plane spanned by two suitable neural outputs.

  5. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

    PubMed

    Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

    2017-06-01

    Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.

  6. Training strategy for convolutional neural networks in pedestrian gender classification

    NASA Astrophysics Data System (ADS)

    Ng, Choon-Boon; Tay, Yong-Haur; Goi, Bok-Min

    2017-06-01

    In this work, we studied a strategy for training a convolutional neural network in pedestrian gender classification with limited amount of labeled training data. Unsupervised learning by k-means clustering on pedestrian images was used to learn the filters to initialize the first layer of the network. As a form of pre-training, supervised learning for the related task of pedestrian classification was performed. Finally, the network was fine-tuned for gender classification. We found that this strategy improved the network's generalization ability in gender classification, achieving better test results when compared to random weights initialization and slightly more beneficial than merely initializing the first layer filters by unsupervised learning. This shows that unsupervised learning followed by pre-training with pedestrian images is an effective strategy to learn useful features for pedestrian gender classification.

  7. 49 CFR 1245.5 - Classification of job titles.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ..., Computer Programmer, Computer Analyst, Market Analyst, Pricing Analyst, Employment Supervisor, Research..., Traveling Auditors or Accountants Title is descriptive Traveling Auditor, Accounting Specialist Auditors... 21; adds new titles. 207 Supervising and Chief Claim Agents Title is descriptive Chief Claim Agent...

  8. 49 CFR 1245.5 - Classification of job titles.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ..., Computer Programmer, Computer Analyst, Market Analyst, Pricing Analyst, Employment Supervisor, Research..., Traveling Auditors or Accountants Title is descriptive Traveling Auditor, Accounting Specialist Auditors... 21; adds new titles. 207 Supervising and Chief Claim Agents Title is descriptive Chief Claim Agent...

  9. 49 CFR 1245.5 - Classification of job titles.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ..., Computer Programmer, Computer Analyst, Market Analyst, Pricing Analyst, Employment Supervisor, Research..., Traveling Auditors or Accountants Title is descriptive Traveling Auditor, Accounting Specialist Auditors... 21; adds new titles. 207 Supervising and Chief Claim Agents Title is descriptive Chief Claim Agent...

  10. 49 CFR 1245.5 - Classification of job titles.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ..., Computer Programmer, Computer Analyst, Market Analyst, Pricing Analyst, Employment Supervisor, Research..., Traveling Auditors or Accountants Title is descriptive Traveling Auditor, Accounting Specialist Auditors... 21; adds new titles. 207 Supervising and Chief Claim Agents Title is descriptive Chief Claim Agent...

  11. 49 CFR 1245.5 - Classification of job titles.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ..., Computer Programmer, Computer Analyst, Market Analyst, Pricing Analyst, Employment Supervisor, Research..., Traveling Auditors or Accountants Title is descriptive Traveling Auditor, Accounting Specialist Auditors... 21; adds new titles. 207 Supervising and Chief Claim Agents Title is descriptive Chief Claim Agent...

  12. Semi-supervised learning for photometric supernova classification

    NASA Astrophysics Data System (ADS)

    Richards, Joseph W.; Homrighausen, Darren; Freeman, Peter E.; Schafer, Chad M.; Poznanski, Dovi

    2012-01-01

    We present a semi-supervised method for photometric supernova typing. Our approach is to first use the non-linear dimension reduction technique diffusion map to detect structure in a data base of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template-based methods. Applied to supernova data simulated by Kessler et al. to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 95 per cent Type Ia purity and 87 per cent Type Ia efficiency on the spectroscopic sample, but only 50 per cent Type Ia purity and 50 per cent efficiency on the photometric sample due to their spectroscopic follow-up strategy. To improve the performance on the photometric sample, we search for better spectroscopic follow-up procedures by studying the sensitivity of our machine-learned supernova classification on the specific strategy used to obtain training sets. With a fixed amount of spectroscopic follow-up time, we find that, despite collecting data on a smaller number of supernovae, deeper magnitude-limited spectroscopic surveys are better for producing training sets. For supernova Ia (II-P) typing, we obtain a 44 per cent (1 per cent) increase in purity to 72 per cent (87 per cent) and 30 per cent (162 per cent) increase in efficiency to 65 per cent (84 per cent) of the sample using a 25th (24.5th) magnitude-limited survey instead of the shallower spectroscopic sample used in the original simulations. When redshift information is available, we incorporate it into our analysis using a novel method of altering the diffusion map representation of the supernovae. Incorporating host redshifts leads to a 5 per cent improvement in Type Ia purity and 13 per cent improvement in Type Ia efficiency. A web service for the supernova classification method used in this paper can be found at .

  13. Cross-Domain Semi-Supervised Learning Using Feature Formulation.

    PubMed

    Xingquan Zhu

    2011-12-01

    Semi-Supervised Learning (SSL) traditionally makes use of unlabeled samples by including them into the training set through an automated labeling process. Such a primitive Semi-Supervised Learning (pSSL) approach suffers from a number of disadvantages including false labeling and incapable of utilizing out-of-domain samples. In this paper, we propose a formative Semi-Supervised Learning (fSSL) framework which explores hidden features between labeled and unlabeled samples to achieve semi-supervised learning. fSSL regards that both labeled and unlabeled samples are generated from some hidden concepts with labeling information partially observable for some samples. The key of the fSSL is to recover the hidden concepts, and take them as new features to link labeled and unlabeled samples for semi-supervised learning. Because unlabeled samples are only used to generate new features, but not to be explicitly included in the training set like pSSL does, fSSL overcomes the inherent disadvantages of the traditional pSSL methods, especially for samples not within the same domain as the labeled instances. Experimental results and comparisons demonstrate that fSSL significantly outperforms pSSL-based methods for both within-domain and cross-domain semi-supervised learning.

  14. New perspectives on archaeological prospecting: Multispectral imagery analysis from Army City, Kansas, USA

    NASA Astrophysics Data System (ADS)

    Banks, Benjamin Daniel

    Aerial imagery analysis has a long history in European archaeology and despite early attempts little progress has been made to promote its use in North America. Recent advances in multispectral satellite and aerial sensors are helping to make aerial imagery analysis more effective in North America, and more cost effective. A site in northeastern Kansas is explored using multispectral aerial and satellite imagery allowing buried features to be mapped. Many of the problems associated with early aerial imagery analysis are explored, such as knowledge of archeological processes that contribute to crop mark formation. Use of multispectral imagery provides a means of detecting and enhancing crop marks not easily distinguishable in visible spectrum imagery. Unsupervised computer classifications of potential archaeological features permits their identification and interpretation while supervised classifications, incorporating limited amounts of geophysical data, provide a more detailed understanding of the site. Supervised classifications allow archaeological processes contributing to crop mark formation to be explored. Aerial imagery analysis is argued to be useful to a wide range of archeological problems, reducing person hours and expenses needed for site delineation and mapping. This technology may be especially useful for cultural resources management.

  15. Automated attribution of remotely-sensed ecological disturbances using spatial and temporal characteristics of common disturbance classes.

    NASA Astrophysics Data System (ADS)

    Cooper, L. A.; Ballantyne, A.

    2017-12-01

    Forest disturbances are critical components of ecosystems. Knowledge of their prevalence and impacts is necessary to accurately describe forest health and ecosystem services through time. While there are currently several methods available to identify and describe forest disturbances, especially those which occur in North America, the process remains inefficient and inaccessible in many parts of the world. Here, we introduce a preliminary approach to streamline and automate both the detection and attribution of forest disturbances. We use a combination of the Breaks for Additive Season and Trend (BFAST) detection algorithm to detect disturbances in combination with supervised and unsupervised classification algorithms to attribute the detections to disturbance classes. Both spatial and temporal disturbance characteristics are derived and utilized for the goal of automating the disturbance attribution process. The resulting preliminary algorithm is applied to up-scaled (100m) Landsat data for several different ecosystems in North America, with varying success. Our results indicate that supervised classification is more reliable than unsupervised classification, but that limited training data are required for a region. Future work will improve the algorithm through refining and validating at sites within North America before applying this approach globally.

  16. Classification of cirrhotic liver in Gadolinium-enhanced MR images

    NASA Astrophysics Data System (ADS)

    Lee, Gobert; Uchiyama, Yoshikazu; Zhang, Xuejun; Kanematsu, Masayuki; Zhou, Xiangrong; Hara, Takeshi; Kato, Hiroki; Kondo, Hiroshi; Fujita, Hiroshi; Hoshi, Hiroaki

    2007-03-01

    Cirrhosis of the liver is characterized by the presence of widespread nodules and fibrosis in the liver. The fibrosis and nodules formation causes distortion of the normal liver architecture, resulting in characteristic texture patterns. Texture patterns are commonly analyzed with the use of co-occurrence matrix based features measured on regions-of-interest (ROIs). A classifier is subsequently used for the classification of cirrhotic or non-cirrhotic livers. Problem arises if the classifier employed falls into the category of supervised classifier which is a popular choice. This is because the 'true disease states' of the ROIs are required for the training of the classifier but is, generally, not available. A common approach is to adopt the 'true disease state' of the liver as the 'true disease state' of all ROIs in that liver. This paper investigates the use of a nonsupervised classifier, the k-means clustering method in classifying livers as cirrhotic or non-cirrhotic using unlabelled ROI data. A preliminary result with a sensitivity and specificity of 72% and 60%, respectively, demonstrates the feasibility of using the k-means non-supervised clustering method in generating a characteristic cluster structure that could facilitate the classification of cirrhotic and non-cirrhotic livers.

  17. Application of classification methods for mapping Mercury's surface composition: analysis on Rudaki's Area

    NASA Astrophysics Data System (ADS)

    Zambon, F.; De Sanctis, M. C.; Capaccioni, F.; Filacchione, G.; Carli, C.; Ammanito, E.; Friggeri, A.

    2011-10-01

    During the first two MESSENGER flybys (14th January 2008 and 6th October 2008) the Mercury Dual Imaging System (MDIS) has extended the coverage of the Mercury surface, obtained by Mariner 10 and now we have images of about 90% of the Mercury surface [1]. MDIS is equipped with a Narrow Angle Camera (NAC) and a Wide Angle Camera (WAC). The NAC uses an off-axis reflective design with a 1.5° field of view (FOV) centered at 747 nm. The WAC has a re- fractive design with a 10.5° FOV and 12-position filters that cover a 395-1040 nm spectral range [2]. The color images can be used to infer information on the surface composition and classification meth- ods are an interesting technique for multispectral image analysis which can be applied to the study of the planetary surfaces. Classification methods are based on clustering algorithms and they can be divided in two categories: unsupervised and supervised. The unsupervised classifiers do not require the analyst feedback, and the algorithm automatically organizes pixels values into classes. In the supervised method, instead, the analyst must choose the "training area" that define the pixels value of a given class [3]. Here we will describe the classification in different compositional units of the region near the Rudaki Crater on Mercury.

  18. Land use and land cover classification for rural residential areas in China using soft-probability cascading of multifeatures

    NASA Astrophysics Data System (ADS)

    Zhang, Bin; Liu, Yueyan; Zhang, Zuyu; Shen, Yonglin

    2017-10-01

    A multifeature soft-probability cascading scheme to solve the problem of land use and land cover (LULC) classification using high-spatial-resolution images to map rural residential areas in China is proposed. The proposed method is used to build midlevel LULC features. Local features are frequently considered as low-level feature descriptors in a midlevel feature learning method. However, spectral and textural features, which are very effective low-level features, are neglected. The acquisition of the dictionary of sparse coding is unsupervised, and this phenomenon reduces the discriminative power of the midlevel feature. Thus, we propose to learn supervised features based on sparse coding, a support vector machine (SVM) classifier, and a conditional random field (CRF) model to utilize the different effective low-level features and improve the discriminability of midlevel feature descriptors. First, three kinds of typical low-level features, namely, dense scale-invariant feature transform, gray-level co-occurrence matrix, and spectral features, are extracted separately. Second, combined with sparse coding and the SVM classifier, the probabilities of the different LULC classes are inferred to build supervised feature descriptors. Finally, the CRF model, which consists of two parts: unary potential and pairwise potential, is employed to construct an LULC classification map. Experimental results show that the proposed classification scheme can achieve impressive performance when the total accuracy reached about 87%.

  19. Cases of Coastal Zone Change and Land Use/Land Cover Change: a learning module that goes beyond the "how" of doing image processing and change detection to asking the "why" about what are the "driving forces" of global change.

    NASA Astrophysics Data System (ADS)

    Ford, R. E.

    2006-12-01

    In 2006 the Loma Linda University ESSE21 Mesoamerican Project (Earth System Science Education for the 21st Century) along with partners such as the University of Redlands and California State University, Pomona, produced an online learning module that is designed to help students learn critical remote sensing skills-- specifically: ecosystem characterization, i.e. doing a supervised or unsupervised classification of satellite imagery in a tropical coastal environment. And, it would teach how to measure land use / land cover change (LULC) over time and then encourage students to use that data to assess the Human Dimensions of Global Change (HDGC). Specific objectives include: 1. Learn where to find remote sensing data and practice downloading, pre-processing, and "cleaning" the data for image analysis. 2. Use Leica-Geosystems ERDAS Imagine or IDRISI Kilimanjaro to analyze and display the data. 3. Do an unsupervised classification of a LANDSAT image of a protected area in Honduras, i.e. Cuero y Salado, Pico Bonito, or Isla del Tigre. 4. Virtually participate in a ground-validation exercise that would allow one to re-classify the image into a supervised classification using the FAO Global Land Cover Network (GLCN) classification system. 5. Learn more about each protected area's landscape, history, livelihood patterns and "sustainability" issues via virtual online tours that provide ground and space photos of different sites. This will help students in identifying potential "training sites" for doing a supervised classification. 6. Study other global, US, Canadian, and European land use/land cover classification systems and compare their advantages and disadvantages over the FAO/GLCN system. 7. Learn to appreciate the advantages and disadvantages of existing LULC classification schemes and adapt them to local-level user needs. 8. Carry out a change detection exercise that shows how land use and/or land cover has changed over time for the protected area of your choice. The presenter will demonstrate the module, assess the collaborative process which created it, and describe how it has been used so far by users in the US as well as in Honduras and elsewhere via a series joint workshops held in Mesoamerica. Suggestions for improvement will be requested. See the module and related content resources at: http://resweb.llu.edu/rford/ESSE21/LUCCModule/

  20. Genetic Classification of Populations Using Supervised Learning

    PubMed Central

    Bridges, Michael; Heron, Elizabeth A.; O'Dushlaine, Colm; Segurado, Ricardo; Morris, Derek; Corvin, Aiden; Gill, Michael; Pinto, Carlos

    2011-01-01

    There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case–control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies. PMID:21589856

  1. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.

    PubMed

    Stanescu, Ana; Caragea, Doina

    2015-01-01

    Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework.

  2. An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

    PubMed Central

    2015-01-01

    Background Recent biochemical advances have led to inexpensive, time-efficient production of massive volumes of raw genomic data. Traditional machine learning approaches to genome annotation typically rely on large amounts of labeled data. The process of labeling data can be expensive, as it requires domain knowledge and expert involvement. Semi-supervised learning approaches that can make use of unlabeled data, in addition to small amounts of labeled data, can help reduce the costs associated with labeling. In this context, we focus on the problem of predicting splice sites in a genome using semi-supervised learning approaches. This is a challenging problem, due to the highly imbalanced distribution of the data, i.e., small number of splice sites as compared to the number of non-splice sites. To address this challenge, we propose to use ensembles of semi-supervised classifiers, specifically self-training and co-training classifiers. Results Our experiments on five highly imbalanced splice site datasets, with positive to negative ratios of 1-to-99, showed that the ensemble-based semi-supervised approaches represent a good choice, even when the amount of labeled data consists of less than 1% of all training data. In particular, we found that ensembles of co-training and self-training classifiers that dynamically balance the set of labeled instances during the semi-supervised iterations show improvements over the corresponding supervised ensemble baselines. Conclusions In the presence of limited amounts of labeled data, ensemble-based semi-supervised approaches can successfully leverage the unlabeled data to enhance supervised ensembles learned from highly imbalanced data distributions. Given that such distributions are common for many biological sequence classification problems, our work can be seen as a stepping stone towards more sophisticated ensemble-based approaches to biological sequence annotation in a semi-supervised framework. PMID:26356316

  3. Online Graph Completion: Multivariate Signal Recovery in Computer Vision.

    PubMed

    Kim, Won Hwa; Jalal, Mona; Hwang, Seongjae; Johnson, Sterling C; Singh, Vikas

    2017-07-01

    The adoption of "human-in-the-loop" paradigms in computer vision and machine learning is leading to various applications where the actual data acquisition (e.g., human supervision) and the underlying inference algorithms are closely interwined. While classical work in active learning provides effective solutions when the learning module involves classification and regression tasks, many practical issues such as partially observed measurements, financial constraints and even additional distributional or structural aspects of the data typically fall outside the scope of this treatment. For instance, with sequential acquisition of partial measurements of data that manifest as a matrix (or tensor), novel strategies for completion (or collaborative filtering) of the remaining entries have only been studied recently. Motivated by vision problems where we seek to annotate a large dataset of images via a crowdsourced platform or alternatively, complement results from a state-of-the-art object detector using human feedback, we study the "completion" problem defined on graphs, where requests for additional measurements must be made sequentially. We design the optimization model in the Fourier domain of the graph describing how ideas based on adaptive submodularity provide algorithms that work well in practice. On a large set of images collected from Imgur, we see promising results on images that are otherwise difficult to categorize. We also show applications to an experimental design problem in neuroimaging.

  4. VHR satellite multitemporal data to extract cultural landscape changes in the roman site of Grumentum

    NASA Astrophysics Data System (ADS)

    masini, nicola; Lasaponara, Rosa

    2013-04-01

    The papers deals with the use of VHR satellite multitemporal data set to extract cultural landscape changes in the roman site of Grumentum Grumentum is an ancient town, 50 km south of Potenza, located near the roman road of Via Herculea which connected the Venusia, in the north est of Basilicata, with Heraclea in the Ionian coast. The first settlement date back to the 6th century BC. It was resettled by the Romans in the 3rd century BC. Its urban fabric which evidences a long history from the Republican age to late Antiquity (III BC-V AD) is composed of the typical urban pattern of cardi and decumani. Its excavated ruins include a large amphitheatre, a theatre, the thermae, the Forum and some temples. There are many techniques nowadays available to capture and record differences in two or more images. In this paper we focus and apply the two main approaches which can be distinguished into : (i) unsupervised and (ii) supervised change detection methods. Unsupervised change detection methods are generally based on the transformation of the two multispectral images in to a single band or multiband image which are further analyzed to identify changes Unsupervised change detection techniques are generally based on three basic steps (i) the preprocessing step, (ii) a pixel-by-pixel comparison is performed, (iii). Identification of changes according to the magnitude an direction (positive /negative). Unsupervised change detection are generally based on the transformation of the two multispectral images into a single band or multiband image which are further analyzed to identify changes. Than the separation between changed and unchanged classes is obtained from the magnitude of the resulting spectral change vectors by means of empirical or theoretical well founded approaches Supervised change detection methods are generally based on supervised classification methods, which require the availability of a suitable training set for the learning process of the classifiers. Unsupervised change detection techniques are generally based on three basic steps (i) the preprocessing step, (ii) supervised classification is performed on the single dates or on the map obtained as the difference of two dates, (iii). Identification of changes according to the magnitude an direction (positive /negative). Supervised change detection are generally based on supervised classification methods, which require the availability of a suitable training set for the learning process of the classifiers, therefore these algorithms require a preliminary knowledge necessary: (i) to generate representative parameters for each class of interest; and (ii) to carry out the training stage Advantages and disadvantages of the supervised and unsupervised approaches are discuss. Finally results from the the satellite multitemporal dataset was also integrated with aerial photos from historical archive in order to expand the time window of the investigation and capture landscape changes occurred from the Agrarian Reform, in the 50s, up today.

  5. Unsupervised classification of remote multispectral sensing data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The new unsupervised classification technique for classifying multispectral remote sensing data which can be either from the multispectral scanner or digitized color-separation aerial photographs consists of two parts: (a) a sequential statistical clustering which is a one-pass sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum liklihood technique indicate that the classification accuracies are in agreement.

  6. High-order distance-based multiview stochastic learning in image classification.

    PubMed

    Yu, Jun; Rui, Yong; Tang, Yuan Yan; Tao, Dacheng

    2014-12-01

    How do we find all images in a larger set of images which have a specific content? Or estimate the position of a specific object relative to the camera? Image classification methods, like support vector machine (supervised) and transductive support vector machine (semi-supervised), are invaluable tools for the applications of content-based image retrieval, pose estimation, and optical character recognition. However, these methods only can handle the images represented by single feature. In many cases, different features (or multiview data) can be obtained, and how to efficiently utilize them is a challenge. It is inappropriate for the traditionally concatenating schema to link features of different views into a long vector. The reason is each view has its specific statistical property and physical interpretation. In this paper, we propose a high-order distance-based multiview stochastic learning (HD-MSL) method for image classification. HD-MSL effectively combines varied features into a unified representation and integrates the labeling information based on a probabilistic framework. In comparison with the existing strategies, our approach adopts the high-order distance obtained from the hypergraph to replace pairwise distance in estimating the probability matrix of data distribution. In addition, the proposed approach can automatically learn a combination coefficient for each view, which plays an important role in utilizing the complementary information of multiview data. An alternative optimization is designed to solve the objective functions of HD-MSL and obtain different views on coefficients and classification scores simultaneously. Experiments on two real world datasets demonstrate the effectiveness of HD-MSL in image classification.

  7. Tensor Train Neighborhood Preserving Embedding

    NASA Astrophysics Data System (ADS)

    Wang, Wenqi; Aggarwal, Vaneet; Aeron, Shuchin

    2018-05-01

    In this paper, we propose a Tensor Train Neighborhood Preserving Embedding (TTNPE) to embed multi-dimensional tensor data into low dimensional tensor subspace. Novel approaches to solve the optimization problem in TTNPE are proposed. For this embedding, we evaluate novel trade-off gain among classification, computation, and dimensionality reduction (storage) for supervised learning. It is shown that compared to the state-of-the-arts tensor embedding methods, TTNPE achieves superior trade-off in classification, computation, and dimensionality reduction in MNIST handwritten digits and Weizmann face datasets.

  8. Intelligible machine learning with malibu.

    PubMed

    Langlois, Robert E; Lu, Hui

    2008-01-01

    malibu is an open-source machine learning work-bench developed in C/C++ for high-performance real-world applications, namely bioinformatics and medical informatics. It leverages third-party machine learning implementations for more robust bug-free software. This workbench handles several well-studied supervised machine learning problems including classification, regression, importance-weighted classification and multiple-instance learning. The malibu interface was designed to create reproducible experiments ideally run in a remote and/or command line environment. The software can be found at: http://proteomics.bioengr. uic.edu/malibu/index.html.

  9. Gates to Gregg High Voltage Transmission Line Study. [California

    NASA Technical Reports Server (NTRS)

    Bergis, V.; Maw, K.; Newland, W.; Sinnott, D.; Thornbury, G.; Easterwood, P.; Bonderud, J.

    1982-01-01

    The usefulness of LANDSAT data in the planning of transmission line routes was assessed. LANDSAT digital data and image processing techniques, specifically a multi-date supervised classification aproach, were used to develop a land cover map for an agricultural area near Fresno, California. Twenty-six land cover classes were identified, of which twenty classes were agricultural crops. High classification accuracies (greater than 80%) were attained for several classes, including cotton, grain, and vineyards. The primary products generated were 1:24,000, 1:100,000 and 1:250,000 scale maps of the classification and acreage summaries for all land cover classes within four alternate transmission line routes.

  10. Effect of filtration of signals of brain activity on quality of recognition of brain activity patterns using artificial intelligence methods

    NASA Astrophysics Data System (ADS)

    Hramov, Alexander E.; Frolov, Nikita S.; Musatov, Vyachaslav Yu.

    2018-02-01

    In present work we studied features of the human brain states classification, corresponding to the real movements of hands and legs. For this purpose we used supervised learning algorithm based on feed-forward artificial neural networks (ANNs) with error back-propagation along with the support vector machine (SVM) method. We compared the quality of operator movements classification by means of EEG signals obtained experimentally in the absence of preliminary processing and after filtration in different ranges up to 25 Hz. It was shown that low-frequency filtering of multichannel EEG data significantly improved accuracy of operator movements classification.

  11. Synergy between Sentinel-1 radar time series and Sentinel-2 optical for the mapping of restored areas in Danube delta

    NASA Astrophysics Data System (ADS)

    Niculescu, Simona; Lardeux, Cédric; Hanganu, Jenica

    2018-05-01

    Wetlands are important and valuable ecosystems, yet, since 1900, more than 50 % of wetlands have been lost worldwide. An example of altered and partially restored coastal wetlands is the Danube Delta in Romania. Over time, human intervention has manifested itself in more than a quarter of the entire Danube surface. This intervention was brutal and has rendered ecosystem restoration very difficult. Studies for the rehabilitation / re-vegetation were started immediately after the Danube Delta was declared as a Biosphere Reservation in 1990. Remote sensing offers accurate methods for detecting and mapping change in restored wetlands. Vegetation change detection is a powerful indicator of restoration success. The restoration projects use vegetative cover as an important indicator of restoration success. To follow the evolution of the vegetation cover of the restored areas, satellite images radar and optical of last generation have been used, such as Sentinel-1 and Sentinel-2. Indeed the sensor sensitivity to the landscape depends on the wavelength what- ever radar or optical data and their polarization for radar data. Combining this kind of data is particularly relevant for the classification of wetland vegetation, which are associated with the density and size of the vegetation. In addition, the high temporal acquisition frequency of Sentinel-1 which are not sensitive to cloud cover al- low to use temporal signature of the different land cover. Thus we analyse the polarimetric and temporal signature of Sentinel-1 data in order to better understand the signature of the different study classes. In a second phase, we performed classifications based on the Random Forest supervised classification algorithm involving the entire Sentinel-1 time series, then starting from a Sentinel-2 collection and finally involving combinations of Sentinel-1 and -2 data.

  12. Toward a definition of blueprint of virgin olive oil by comprehensive two-dimensional gas chromatography.

    PubMed

    Purcaro, Giorgia; Cordero, Chiara; Liberto, Erica; Bicchi, Carlo; Conte, Lanfranco S

    2014-03-21

    This study investigates the applicability of an iterative approach aimed at defining a chemical blueprint of virgin olive oil volatiles to be correlated to the product sensory quality. The investigation strategy proposed allows to fully exploit the informative content of a comprehensive multidimensional gas chromatography (GC×GC) coupled to a mass spectrometry (MS) data set. Olive oil samples (19), including 5 reference standards, obtained from the International Olive Oil Council, and commercial samples, were submitted to a sensory evaluation by a Panel test, before being analyzed in two laboratories using different instrumentation, column set, and software elaboration packages in view of a cross-validation of the entire methodology. A first classification of samples based on untargeted peak features information, was obtained on raw data from two different column combinations (apolar×polar and polar×apolar) by applying unsupervised multivariate analysis (i.e., principal component analysis-PCA). However, to improve effectiveness and specificity of this classification, peak features were reliably identified (261 compounds), on the basis of the MS spectrum and linear retention index matching, and subjected to successive pair-wise comparisons based on 2D patterns, which revealed peculiar distribution of chemicals correlated with samples sensory classification. The most informative compounds were thus identified and collected in a "blueprint" of specific defects (or combination of defects) successively adopted to discriminate Extra Virgin from defected oils (i.e., lampante oil) with the aid of a supervised approach, i.e., partial least squares-discriminant analysis (PLS-DA). In this last step, the principles of sensomics, which assigns higher information potential to analytes with lower odor threshold proved to be successful, and a much more powerful discrimination of samples was obtained in view of a sensory quality assessment. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Developing an Automated Machine Learning Marine Oil Spill Detection System with Synthetic Aperture Radar

    NASA Astrophysics Data System (ADS)

    Pinales, J. C.; Graber, H. C.; Hargrove, J. T.; Caruso, M. J.

    2016-02-01

    Previous studies have demonstrated the ability to detect and classify marine hydrocarbon films with spaceborne synthetic aperture radar (SAR) imagery. The dampening effects of hydrocarbon discharges on small surface capillary-gravity waves renders the ocean surface "radar dark" compared with the standard wind-borne ocean surfaces. Given the scope and impact of events like the Deepwater Horizon oil spill, the need for improved, automated and expedient monitoring of hydrocarbon-related marine anomalies has become a pressing and complex issue for governments and the extraction industry. The research presented here describes the development, training, and utilization of an algorithm that detects marine oil spills in an automated, semi-supervised manner, utilizing X-, C-, or L-band SAR data as the primary input. Ancillary datasets include related radar-borne variables (incidence angle, etc.), environmental data (wind speed, etc.) and textural descriptors. Shapefiles produced by an experienced human-analyst served as targets (validation) during the training portion of the investigation. Training and testing datasets were chosen for development and assessment of algorithm effectiveness as well as optimal conditions for oil detection in SAR data. The algorithm detects oil spills by following a 3-step methodology: object detection, feature extraction, and classification. Previous oil spill detection and classification methodologies such as machine learning algorithms, artificial neural networks (ANN), and multivariate classification methods like partial least squares-discriminant analysis (PLS-DA) are evaluated and compared. Statistical, transform, and model-based image texture techniques, commonly used for object mapping directly or as inputs for more complex methodologies, are explored to determine optimal textures for an oil spill detection system. The influence of the ancillary variables is explored, with a particular focus on the role of strong vs. weak wind forcing.

  14. Continuous measurement of breast tumor hormone receptor expression: a comparison of two computational pathology platforms

    PubMed Central

    Ahern, Thomas P.; Beck, Andrew H.; Rosner, Bernard A.; Glass, Ben; Frieling, Gretchen; Collins, Laura C.; Tamimi, Rulla M.

    2017-01-01

    Background Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumor estrogen receptor (ER) and progesterone receptor (PR) expression. Methods Breast tumor microarrays from the Nurses’ Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumor nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots, and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Results Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (rho≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUCAperio=0.97; AUCDefiniens=0.90; difference=0.07, 95% CI: 0.05, 0.09) and PR positivity (AUCAperio=0.94; AUCDefiniens=0.87; difference=0.07, 95% CI: 0.03, 0.12). Conclusion Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumor biomarker discovery. PMID:27729430

  15. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    PubMed Central

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  16. Supervised segmentation of microelectrode recording artifacts using power spectral density.

    PubMed

    Bakstein, Eduard; Schneider, Jakub; Sieger, Tomas; Novak, Daniel; Wild, Jiri; Jech, Robert

    2015-08-01

    Appropriate detection of clean signal segments in extracellular microelectrode recordings (MER) is vital for maintaining high signal-to-noise ratio in MER studies. Existing alternatives to manual signal inspection are based on unsupervised change-point detection. We present a method of supervised MER artifact classification, based on power spectral density (PSD) and evaluate its performance on a database of 95 labelled MER signals. The proposed method yielded test-set accuracy of 90%, which was close to the accuracy of annotation (94%). The unsupervised methods achieved accuracy of about 77% on both training and testing data.

  17. Kernel and divergence techniques in high energy physics separations

    NASA Astrophysics Data System (ADS)

    Bouř, Petr; Kůs, Václav; Franc, Jiří

    2017-10-01

    Binary decision trees under the Bayesian decision technique are used for supervised classification of high-dimensional data. We present a great potential of adaptive kernel density estimation as the nested separation method of the supervised binary divergence decision tree. Also, we provide a proof of alternative computing approach for kernel estimates utilizing Fourier transform. Further, we apply our method to Monte Carlo data set from the particle accelerator Tevatron at DØ experiment in Fermilab and provide final top-antitop signal separation results. We have achieved up to 82 % AUC while using the restricted feature selection entering the signal separation procedure.

  18. Latent Partially Ordered Classification Models and Normal Mixtures

    ERIC Educational Resources Information Center

    Tatsuoka, Curtis; Varadi, Ferenc; Jaeger, Judith

    2013-01-01

    Latent partially ordered sets (posets) can be employed in modeling cognitive functioning, such as in the analysis of neuropsychological (NP) and educational test data. Posets are cognitively diagnostic in the sense that classification states in these models are associated with detailed profiles of cognitive functioning. These profiles allow for…

  19. Comparison Promotes Learning and Transfer of Relational Categories

    ERIC Educational Resources Information Center

    Kurtz, Kenneth J.; Boukrina, Olga; Gentner, Dedre

    2013-01-01

    We investigated the effect of co-presenting training items during supervised classification learning of novel relational categories. Strong evidence exists that comparison induces a structural alignment process that renders common relational structure more salient. We hypothesized that comparisons between exemplars would facilitate learning and…

  20. 29 CFR 1201.4 - Employee.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... authority to supervise and direct the manner of rendition of his service) who performs any work defined as... existing orders: Provided, however, That no occupational classification made by order of the Interstate Commerce Commission shall be construed to define the crafts according to which railway employees may be...

  1. 29 CFR 1201.4 - Employee.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... authority to supervise and direct the manner of rendition of his service) who performs any work defined as... existing orders: Provided, however, That no occupational classification made by order of the Interstate Commerce Commission shall be construed to define the crafts according to which railway employees may be...

  2. FROM2D to 3d Supervised Segmentation and Classification for Cultural Heritage Applications

    NASA Astrophysics Data System (ADS)

    Grilli, E.; Dininno, D.; Petrucci, G.; Remondino, F.

    2018-05-01

    The digital management of architectural heritage information is still a complex problem, as a heritage object requires an integrated representation of various types of information in order to develop appropriate restoration or conservation strategies. Currently, there is extensive research focused on automatic procedures of segmentation and classification of 3D point clouds or meshes, which can accelerate the study of a monument and integrate it with heterogeneous information and attributes, useful to characterize and describe the surveyed object. The aim of this study is to propose an optimal, repeatable and reliable procedure to manage various types of 3D surveying data and associate them with heterogeneous information and attributes to characterize and describe the surveyed object. In particular, this paper presents an approach for classifying 3D heritage models, starting from the segmentation of their textures based on supervised machine learning methods. Experimental results run on three different case studies demonstrate that the proposed approach is effective and with many further potentials.

  3. Three-dimensional textural features of conventional MRI improve diagnostic classification of childhood brain tumours.

    PubMed

    Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitits, Theodoros N

    2015-09-01

    The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1 - and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performances to those trained with conventional 2D features. For instance, a neural network classifier showed 12% improvement in area under the receiver operator characteristics curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1 - and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.

  4. Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation.

    PubMed

    Song, Jingkuan; Gao, Lianli; Nie, Feiping; Shen, Heng Tao; Yan, Yan; Sebe, Nicu

    2016-11-01

    In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available. This is often done by adding a geometry-based regularization term in the objective function of a supervised learning model. In this case, a similarity graph is indispensable to exploit the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most of the existing works construct the graph empirically and are usually based on a single feature without using the label information. In this paper, we propose a semi-supervised annotation approach by learning an optimized graph (OGL) from multi-cues (i.e., partial tags and multiple features), which can more accurately embed the relationships among the data points. Since OGL is a transductive method and cannot deal with novel data points, we further extend our model to address the out-of-sample issue. Extensive experiments on image and video annotation show the consistent superiority of OGL over the state-of-the-art methods.

  5. Identification of milk origin and process-induced changes in milk by stable isotope ratio mass spectrometry.

    PubMed

    Scampicchio, Matteo; Mimmo, Tanja; Capici, Calogero; Huck, Christian; Innocente, Nadia; Drusch, Stephan; Cesco, Stefano

    2012-11-14

    Stable isotope values were used to develop a new analytical approach enabling the simultaneous identification of milk samples either processed with different heating regimens or from different geographical origins. The samples consisted of raw, pasteurized (HTST), and ultrapasteurized (UHT) milk from different Italian origins. The approach consisted of the analysis of the isotope ratio of δ¹³C and δ¹⁵N for the milk samples and their fractions (fat, casein, and whey). The main finding of this work is that as the heat processing affects the composition of the milk fractions, changes in δ¹³C and δ¹⁵N were also observed. These changes were used as markers to develop pattern recognition maps based on principal component analysis and supervised classification models, such as linear discriminant analysis (LDA), multivariate regression (MLR), principal component regression (PCR), and partial least-squares (PLS). The results give proof of the concept that isotope ratio mass spectroscopy can discriminate simultaneously between milk samples according to their geographical origin and type of processing.

  6. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    PubMed

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. 9 CFR 145.73 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... “National Plan Hatcheries” or have met equivalent requirements for pullorum-typhoid control under official... have met equivalent requirements for pullorum-typhoid control under official supervision: Provided... following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S. Pullorum...

  8. 9 CFR 145.73 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... “National Plan Hatcheries” or have met equivalent requirements for pullorum-typhoid control under official... have met equivalent requirements for pullorum-typhoid control under official supervision: Provided... following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S. Pullorum...

  9. 9 CFR 145.73 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... “National Plan Hatcheries” or have met equivalent requirements for pullorum-typhoid control under official... have met equivalent requirements for pullorum-typhoid control under official supervision: Provided... following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S. Pullorum...

  10. 9 CFR 145.73 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... “National Plan Hatcheries” or have met equivalent requirements for pullorum-typhoid control under official... have met equivalent requirements for pullorum-typhoid control under official supervision: Provided... following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S. Pullorum...

  11. 9 CFR 145.73 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... “National Plan Hatcheries” or have met equivalent requirements for pullorum-typhoid control under official... have met equivalent requirements for pullorum-typhoid control under official supervision: Provided... following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S. Pullorum...

  12. 7 CFR 27.46 - Cotton withdrawn from storage.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Cotton withdrawn from storage. 27.46 Section 27.46... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27.46 Cotton withdrawn from storage. The exchange inspection agency under the supervision or control of...

  13. 7 CFR 27.46 - Cotton withdrawn from storage.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Cotton withdrawn from storage. 27.46 Section 27.46... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27.46 Cotton withdrawn from storage. The exchange inspection agency under the supervision or control of...

  14. 7 CFR 27.46 - Cotton withdrawn from storage.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Cotton withdrawn from storage. 27.46 Section 27.46... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27.46 Cotton withdrawn from storage. The exchange inspection agency under the supervision or control of...

  15. 7 CFR 27.46 - Cotton withdrawn from storage.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Cotton withdrawn from storage. 27.46 Section 27.46... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27.46 Cotton withdrawn from storage. The exchange inspection agency under the supervision or control of...

  16. 7 CFR 27.46 - Cotton withdrawn from storage.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Cotton withdrawn from storage. 27.46 Section 27.46... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Cotton Class Certificates § 27.46 Cotton withdrawn from storage. The exchange inspection agency under the supervision or control of...

  17. Classification of octet AB-type binary compounds using dynamical charges: A materials informatics perspective

    DOE PAGES

    Pilania, G.; Gubernatis, J. E.; Lookman, T.

    2015-12-03

    The role of dynamical (or Born effective) charges in classification of octet AB-type binary compounds between four-fold (zincblende/wurtzite crystal structures) and six-fold (rocksalt crystal structure) coordinated systems is discussed. We show that the difference in the dynamical charges of the fourfold and sixfold coordinated structures, in combination with Harrison’s polarity, serves as an excellent feature to classify the coordination of 82 sp–bonded binary octet compounds. We use a support vector machine classifier to estimate the average classification accuracy and the associated variance in our model where a decision boundary is learned in a supervised manner. Lastly, we compare the out-of-samplemore » classification accuracy achieved by our feature pair with those reported previously.« less

  18. An improved arteriovenous classification method for the early diagnostics of various diseases in retinal image.

    PubMed

    Xu, Xiayu; Ding, Wenxiang; Abràmoff, Michael D; Cao, Ruofan

    2017-04-01

    Retinal artery and vein classification is an important task for the automatic computer-aided diagnosis of various eye diseases and systemic diseases. This paper presents an improved supervised artery and vein classification method in retinal image. Intra-image regularization and inter-subject normalization is applied to reduce the differences in feature space. Novel features, including first-order and second-order texture features, are utilized to capture the discriminating characteristics of arteries and veins. The proposed method was tested on the DRIVE dataset and achieved an overall accuracy of 0.923. This retinal artery and vein classification algorithm serves as a potentially important tool for the early diagnosis of various diseases, including diabetic retinopathy and cardiovascular diseases. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. A Novel Hybrid Dimension Reduction Technique for Undersized High Dimensional Gene Expression Data Sets Using Information Complexity Criterion for Cancer Classification

    PubMed Central

    Pamukçu, Esra; Bozdogan, Hamparsum; Çalık, Sinan

    2015-01-01

    Gene expression data typically are large, complex, and highly noisy. Their dimension is high with several thousand genes (i.e., features) but with only a limited number of observations (i.e., samples). Although the classical principal component analysis (PCA) method is widely used as a first standard step in dimension reduction and in supervised and unsupervised classification, it suffers from several shortcomings in the case of data sets involving undersized samples, since the sample covariance matrix degenerates and becomes singular. In this paper we address these limitations within the context of probabilistic PCA (PPCA) by introducing and developing a new and novel approach using maximum entropy covariance matrix and its hybridized smoothed covariance estimators. To reduce the dimensionality of the data and to choose the number of probabilistic PCs (PPCs) to be retained, we further introduce and develop celebrated Akaike's information criterion (AIC), consistent Akaike's information criterion (CAIC), and the information theoretic measure of complexity (ICOMP) criterion of Bozdogan. Six publicly available undersized benchmark data sets were analyzed to show the utility, flexibility, and versatility of our approach with hybridized smoothed covariance matrix estimators, which do not degenerate to perform the PPCA to reduce the dimension and to carry out supervised classification of cancer groups in high dimensions. PMID:25838836

  20. Computer-aided diagnosis system: a Bayesian hybrid classification method.

    PubMed

    Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J

    2013-10-01

    A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is iteratively used to improve the results until a certain accuracy level is achieved, then, the learning process is finished and new classifications can be automatically performed. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  1. Centered Kernel Alignment Enhancing Neural Network Pretraining for MRI-Based Dementia Diagnosis

    PubMed Central

    Cárdenas-Peña, David; Collazos-Huertas, Diego; Castellanos-Dominguez, German

    2016-01-01

    Dementia is a growing problem that affects elderly people worldwide. More accurate evaluation of dementia diagnosis can help during the medical examination. Several methods for computer-aided dementia diagnosis have been proposed using resonance imaging scans to discriminate between patients with Alzheimer's disease (AD) or mild cognitive impairment (MCI) and healthy controls (NC). Nonetheless, the computer-aided diagnosis is especially challenging because of the heterogeneous and intermediate nature of MCI. We address the automated dementia diagnosis by introducing a novel supervised pretraining approach that takes advantage of the artificial neural network (ANN) for complex classification tasks. The proposal initializes an ANN based on linear projections to achieve more discriminating spaces. Such projections are estimated by maximizing the centered kernel alignment criterion that assesses the affinity between the resonance imaging data kernel matrix and the label target matrix. As a result, the performed linear embedding allows accounting for features that contribute the most to the MCI class discrimination. We compare the supervised pretraining approach to two unsupervised initialization methods (autoencoders and Principal Component Analysis) and against the best four performing classification methods of the 2014 CADDementia challenge. As a result, our proposal outperforms all the baselines (7% of classification accuracy and area under the receiver-operating-characteristic curve) at the time it reduces the class biasing. PMID:27148392

  2. Cell segmentation in phase contrast microscopy images via semi-supervised classification over optics-related features.

    PubMed

    Su, Hang; Yin, Zhaozheng; Huh, Seungil; Kanade, Takeo

    2013-10-01

    Phase-contrast microscopy is one of the most common and convenient imaging modalities to observe long-term multi-cellular processes, which generates images by the interference of lights passing through transparent specimens and background medium with different retarded phases. Despite many years of study, computer-aided phase contrast microscopy analysis on cell behavior is challenged by image qualities and artifacts caused by phase contrast optics. Addressing the unsolved challenges, the authors propose (1) a phase contrast microscopy image restoration method that produces phase retardation features, which are intrinsic features of phase contrast microscopy, and (2) a semi-supervised learning based algorithm for cell segmentation, which is a fundamental task for various cell behavior analysis. Specifically, the image formation process of phase contrast microscopy images is first computationally modeled with a dictionary of diffraction patterns; as a result, each pixel of a phase contrast microscopy image is represented by a linear combination of the bases, which we call phase retardation features. Images are then partitioned into phase-homogeneous atoms by clustering neighboring pixels with similar phase retardation features. Consequently, cell segmentation is performed via a semi-supervised classification technique over the phase-homogeneous atoms. Experiments demonstrate that the proposed approach produces quality segmentation of individual cells and outperforms previous approaches. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Receptive field optimisation and supervision of a fuzzy spiking neural network.

    PubMed

    Glackin, Cornelius; Maguire, Liam; McDaid, Liam; Sayers, Heather

    2011-04-01

    This paper presents a supervised training algorithm that implements fuzzy reasoning on a spiking neural network. Neuron selectivity is facilitated using receptive fields that enable individual neurons to be responsive to certain spike train firing rates and behave in a similar manner as fuzzy membership functions. The connectivity of the hidden and output layers in the fuzzy spiking neural network (FSNN) is representative of a fuzzy rule base. Fuzzy C-Means clustering is utilised to produce clusters that represent the antecedent part of the fuzzy rule base that aid classification of the feature data. Suitable cluster widths are determined using two strategies; subjective thresholding and evolutionary thresholding respectively. The former technique typically results in compact solutions in terms of the number of neurons, and is shown to be particularly suited to small data sets. In the latter technique a pool of cluster candidates is generated using Fuzzy C-Means clustering and then a genetic algorithm is employed to select the most suitable clusters and to specify cluster widths. In both scenarios, the network is supervised but learning only occurs locally as in the biological case. The advantages and disadvantages of the network topology for the Fisher Iris and Wisconsin Breast Cancer benchmark classification tasks are demonstrated and directions of current and future work are discussed. Copyright © 2010 Elsevier Ltd. All rights reserved.

  4. Learning Robust and Discriminative Subspace With Low-Rank Constraints.

    PubMed

    Li, Sheng; Fu, Yun

    2016-11-01

    In this paper, we aim at learning robust and discriminative subspaces from noisy data. Subspace learning is widely used in extracting discriminative features for classification. However, when data are contaminated with severe noise, the performance of most existing subspace learning methods would be limited. Recent advances in low-rank modeling provide effective solutions for removing noise or outliers contained in sample sets, which motivates us to take advantage of low-rank constraints in order to exploit robust and discriminative subspace for classification. In particular, we present a discriminative subspace learning method called the supervised regularization-based robust subspace (SRRS) approach, by incorporating the low-rank constraint. SRRS seeks low-rank representations from the noisy data, and learns a discriminative subspace from the recovered clean data jointly. A supervised regularization function is designed to make use of the class label information, and therefore to enhance the discriminability of subspace. Our approach is formulated as a constrained rank-minimization problem. We design an inexact augmented Lagrange multiplier optimization algorithm to solve it. Unlike the existing sparse representation and low-rank learning methods, our approach learns a low-dimensional subspace from recovered data, and explicitly incorporates the supervised information. Our approach and some baselines are evaluated on the COIL-100, ALOI, Extended YaleB, FERET, AR, and KinFace databases. The experimental results demonstrate the effectiveness of our approach, especially when the data contain considerable noise or variations.

  5. Classification of asteroid spectra using a neural network

    NASA Technical Reports Server (NTRS)

    Howell, E. S.; Merenyi, E.; Lebofsky, L. A.

    1994-01-01

    The 52-color asteroid survey (Bell et al., 1988) together with the 8-color asteroid survey (Zellner et al., 1985) provide a data set of asteroid spectra spanning 0.3-2.5 micrometers. An artificial neural network clusters these asteroid spectra based on their similarity to each other. We have also trained the neural network with a categorization learning output layer in a supervised mode to associate the established clusters with taxonomic classes. Results of our classification agree with Tholen's classification based on the 8-color data alone. When extending the spectral range using the 52-color survey data, we find that some modification of the Tholen classes is indicated to produce a cleaner, self-consistent set of taxonomic classes. After supervised training using our modified classes, the network correctly classifies both the training examples, and additional spectra into the correct class with an average of 90% accuracy. Our classification supports the separation of the K class from the S class, as suggested by Bell et al. (1987), based on the near-infrared spectrum. We define two end-member subclasses which seem to have compositional significance within the S class: the So class, which is olivine-rich and red, and the Sp class, which is pyroxene-rich and less red. The remaining S-class asteroids have intermediate compositions of both olivine and pyroxene and moderately red continua. The network clustering suggests some additional structure within the E-, M-, and P-class asteroids, even in the absence of albedo information, which is the only discriminant between these in the Tholen classification. New relationships are seen between the C class and related G, B, and F classes. However, in both cases, the number of spectra is too small to interpret or determine the significance of these separations.

  6. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.

  7. Abnormality detection of mammograms by discriminative dictionary learning on DSIFT descriptors.

    PubMed

    Tavakoli, Nasrin; Karimi, Maryam; Nejati, Mansour; Karimi, Nader; Reza Soroushmehr, S M; Samavi, Shadrokh; Najarian, Kayvan

    2017-07-01

    Detection and classification of breast lesions using mammographic images are one of the most difficult studies in medical image processing. A number of learning and non-learning methods have been proposed for detecting and classifying these lesions. However, the accuracy of the detection/classification still needs improvement. In this paper we propose a powerful classification method based on sparse learning to diagnose breast cancer in mammograms. For this purpose, a supervised discriminative dictionary learning approach is applied on dense scale invariant feature transform (DSIFT) features. A linear classifier is also simultaneously learned with the dictionary which can effectively classify the sparse representations. Our experimental results show the superior performance of our method compared to existing approaches.

  8. A software package for interactive motor unit potential classification using fuzzy k-NN classifier.

    PubMed

    Rasheed, Sarbast; Stashuk, Daniel; Kamel, Mohamed

    2008-01-01

    We present an interactive software package for implementing the supervised classification task during electromyographic (EMG) signal decomposition process using a fuzzy k-NN classifier and utilizing the MATLAB high-level programming language and its interactive environment. The method employs an assertion-based classification that takes into account a combination of motor unit potential (MUP) shapes and two modes of use of motor unit firing pattern information: the passive and the active modes. The developed package consists of several graphical user interfaces used to detect individual MUP waveforms from a raw EMG signal, extract relevant features, and classify the MUPs into motor unit potential trains (MUPTs) using assertion-based classifiers.

  9. Spatial Mutual Information Based Hyperspectral Band Selection for Classification

    PubMed Central

    2015-01-01

    The amount of information involved in hyperspectral imaging is large. Hyperspectral band selection is a popular method for reducing dimensionality. Several information based measures such as mutual information have been proposed to reduce information redundancy among spectral bands. Unfortunately, mutual information does not take into account the spatial dependency between adjacent pixels in images thus reducing its robustness as a similarity measure. In this paper, we propose a new band selection method based on spatial mutual information. As validation criteria, a supervised classification method using support vector machine (SVM) is used. Experimental results of the classification of hyperspectral datasets show that the proposed method can achieve more accurate results. PMID:25918742

  10. Computer-implemented land use classification with pattern recognition software and ERTS digital data. [Mississippi coastal plains

    NASA Technical Reports Server (NTRS)

    Joyce, A. T.

    1974-01-01

    Significant progress has been made in the classification of surface conditions (land uses) with computer-implemented techniques based on the use of ERTS digital data and pattern recognition software. The supervised technique presently used at the NASA Earth Resources Laboratory is based on maximum likelihood ratioing with a digital table look-up approach to classification. After classification, colors are assigned to the various surface conditions (land uses) classified, and the color-coded classification is film recorded on either positive or negative 9 1/2 in. film at the scale desired. Prints of the film strips are then mosaicked and photographed to produce a land use map in the format desired. Computer extraction of statistical information is performed to show the extent of each surface condition (land use) within any given land unit that can be identified in the image. Evaluations of the product indicate that classification accuracy is well within the limits for use by land resource managers and administrators. Classifications performed with digital data acquired during different seasons indicate that the combination of two or more classifications offer even better accuracy.

  11. 9 CFR 145.23 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... the following terms and the corresponding designs illustrated in § 145.10: (a) [Reserved] (b) U.S... from flocks that met equivalent requirements under official supervision; and (iii) The flock is located... from U.S. Pullorum-Typhoid Clean breeding flocks or from flocks that met equivalent requirements under...

  12. Evaluating unsupervised and supervised image classification methods for mapping cotton root rot

    USDA-ARS?s Scientific Manuscript database

    Cotton root rot, caused by the soilborne fungus Phymatotrichopsis omnivora, is one of the most destructive plant diseases occurring throughout the southwestern United States. This disease has plagued the cotton industry for over a century, but effective practices for its control are still lacking. R...

  13. Collected Notes on the Workshop for Pattern Discovery in Large Databases

    NASA Technical Reports Server (NTRS)

    Buntine, Wray (Editor); Delalto, Martha (Editor)

    1991-01-01

    These collected notes are a record of material presented at the Workshop. The core data analysis is addressed that have traditionally required statistical or pattern recognition techniques. Some of the core tasks include classification, discrimination, clustering, supervised and unsupervised learning, discovery and diagnosis, i.e., general pattern discovery.

  14. 2009 ESTCP UXO Discrimination Study, San Luis Obispo, CA

    DTIC Science & Technology

    2010-11-01

    SUPERVISED LEARNING . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.5 ACTIVE LEARNING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8...PERFORMANCE . . . . . . . . . . . . . . . . 29 7.2 ACTIVE LEARNING CLASSIFICATION PERFORMANCE . . . . . . . . . . . 30 8 COST ASSESSMENT 32 9... learning on EM61-array and TEMTADS data. During active learning , SIG started with no a priori labeled data, and acquired labels for a small subset that

  15. Mapping of rock types using a joint approach by combining the multivariate statistics, self-organizing map and Bayesian neural networks: an example from IODP 323 site

    NASA Astrophysics Data System (ADS)

    Karmakar, Mampi; Maiti, Saumen; Singh, Amrita; Ojha, Maheswar; Maity, Bhabani Sankar

    2017-07-01

    Modeling and classification of the subsurface lithology is very important to understand the evolution of the earth system. However, precise classification and mapping of lithology using a single framework are difficult due to the complexity and the nonlinearity of the problem driven by limited core sample information. Here, we implement a joint approach by combining the unsupervised and the supervised methods in a single framework for better classification and mapping of rock types. In the unsupervised method, we use the principal component analysis (PCA), K-means cluster analysis (K-means), dendrogram analysis, Fuzzy C-means (FCM) cluster analysis and self-organizing map (SOM). In the supervised method, we use the Bayesian neural networks (BNN) optimized by the Hybrid Monte Carlo (HMC) (BNN-HMC) and the scaled conjugate gradient (SCG) (BNN-SCG) techniques. We use P-wave velocity, density, neutron porosity, resistivity and gamma ray logs of the well U1343E of the Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. While the SOM algorithm allows us to visualize the clustering results in spatial domain, the combined classification schemes (supervised and unsupervised) uncover the different patterns of lithology such of as clayey-silt, diatom-silt and silty-clay from an un-cored section of the drilled hole. In addition, the BNN approach is capable of estimating uncertainty in the predictive modeling of three types of rocks over the entire lithology section at site U1343. Alternate succession of clayey-silt, diatom-silt and silty-clay may be representative of crustal inhomogeneity in general and thus could be a basis for detail study related to the productivity of methane gas in the oceans worldwide. Moreover, at the 530 m depth down below seafloor (DSF), the transition from Pliocene to Pleistocene could be linked to lithological alternation between the clayey-silt and the diatom-silt. The present results could provide the basis for the detailed study to get deeper insight into the Bering Sea' sediment deposition and sequence.

  16. Can segmentation evaluation metric be used as an indicator of land cover classification accuracy?

    NASA Astrophysics Data System (ADS)

    Švab Lenarčič, Andreja; Đurić, Nataša; Čotar, Klemen; Ritlop, Klemen; Oštir, Krištof

    2016-10-01

    It is a broadly established belief that the segmentation result significantly affects subsequent image classification accuracy. However, the actual correlation between the two has never been evaluated. Such an evaluation would be of considerable importance for any attempts to automate the object-based classification process, as it would reduce the amount of user intervention required to fine-tune the segmentation parameters. We conducted an assessment of segmentation and classification by analyzing 100 different segmentation parameter combinations, 3 classifiers, 5 land cover classes, 20 segmentation evaluation metrics, and 7 classification accuracy measures. The reliability definition of segmentation evaluation metrics as indicators of land cover classification accuracy was based on the linear correlation between the two. All unsupervised metrics that are not based on number of segments have a very strong correlation with all classification measures and are therefore reliable as indicators of land cover classification accuracy. On the other hand, correlation at supervised metrics is dependent on so many factors that it cannot be trusted as a reliable classification quality indicator. Algorithms for land cover classification studied in this paper are widely used; therefore, presented results are applicable to a wider area.

  17. LANDSAT data for coastal zone management. [New Jersey

    NASA Technical Reports Server (NTRS)

    Mckenzie, S.

    1981-01-01

    The lack of adequate, current data on land and water surface conditions in New Jersey led to the search for better data collections and analysis techniques. Four-channel MSS data of Cape May County and access to the OSER computer interpretation system were provided by NASA. The spectral resolution of the data was tested and a surface cover map was produced by going through the steps of supervised classification. Topics covered include classification; change detection and improvement of spectral and spatial resolution; merging LANDSAT and map data; and potential applications for New Jersey.

  18. Classification of JERS-1 Image Mosaic of Central Africa Using A Supervised Multiscale Classifier of Texture Features

    NASA Technical Reports Server (NTRS)

    Saatchi, Sassan; DeGrandi, Franco; Simard, Marc; Podest, Erika

    1999-01-01

    In this paper, a multiscale approach is introduced to classify the Japanese Research Satellite-1 (JERS-1) mosaic image over the Central African rainforest. A series of texture maps are generated from the 100 m mosaic image at various scales. Using a quadtree model and relating classes at each scale by a Markovian relationship, the multiscale images are classified from course to finer scale. The results are verified at various scales and the evolution of classification is monitored by calculating the error at each stage.

  19. Natural resources inventory and land evaluation in Switzerland

    NASA Technical Reports Server (NTRS)

    Haefner, H. (Principal Investigator)

    1976-01-01

    The author has identified the following significant results. Using MSS channels 5 and 7 and a supervised classification system with a PPD classification algorithm, it was possible to map the exact areal extent of the snow cover and of the transition zone with melting snow patches and snow free parts of various sizes over a large area under different aspects such as relief, exposure, shadows etc. A correlation of the data from ground control, areal underflights and earth resources satellites provided a very accurate interpretation of the melting procedure of snow in high mountains.

  20. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

  1. Mediating toxic emotions in the workplace--the impact of abusive supervision.

    PubMed

    Chu, Li-Chuan

    2014-11-01

    This study explores whether abusive supervision can effectively predict employees' counterproductive work behaviour (CWB) and organisational citizenship behaviour (OCB) and the role of toxic emotions at work as a potential mediator of these relationships in nursing settings. Workplace bullying is widespread in nursing. Despite the growing literature on abusive supervision and employees' counterproductive work behaviour and organisational citizenship behaviour, few studies have examined the relationships between abusive supervision and these work behaviours from the viewpoint of the victimed employee's emotion process. This study adopted a two-stage survey of 212 nurses, all of whom were employed by hospitals in Taiwan. Hypotheses were tested through the use of hierarchical multiple regression. The results showed that abusive supervision was positively associated with toxic emotions. Moreover, toxic emotions could effectively predict nurses' counterproductive work behaviour and organisational citizenship behaviour. Finally, it was found that toxic emotions partially mediated the negative effects of abusive supervision on both work behaviours. Toxic emotions at work are a critical mediating variable between abusive supervision and both counterproductive work behaviour and organisational citizenship behaviour. Hospital administrators can implement policies designed to manage events effectively that can spark toxic emotions in their employees. Work empowerment may be an effective way to reduce counterproductive work behaviour and to enhance organisational citizenship behaviour among nurses when supervisors do not promote a healthy work environment for them. © 2013 John Wiley & Sons Ltd.

  2. The DSFPN, a new neural network for optical character recognition.

    PubMed

    Morns, L P; Dlay, S S

    1999-01-01

    A new type of neural network for recognition tasks is presented in this paper. The network, called the dynamic supervised forward-propagation network (DSFPN), is based on the forward only version of the counterpropagation network (CPN). The DSFPN, trains using a supervised algorithm and can grow dynamically during training, allowing subclasses in the training data to be learnt in an unsupervised manner. It is shown to train in times comparable to the CPN while giving better classification accuracies than the popular backpropagation network. Both Fourier descriptors and wavelet descriptors are used for image preprocessing and the wavelets are proven to give a far better performance.

  3. Learning relevant features of data with multi-scale tensor networks

    NASA Astrophysics Data System (ADS)

    Miles Stoudenmire, E.

    2018-07-01

    Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and fashion MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features nevertheless able to give good performance.

  4. CNN universal machine as classificaton platform: an art-like clustering algorithm.

    PubMed

    Bálya, David

    2003-12-01

    Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance after 100% accuracy on the training set.

  5. Current trends in geomorphological mapping

    NASA Astrophysics Data System (ADS)

    Seijmonsbergen, A. C.

    2012-04-01

    Geomorphological mapping is a world currently in motion, driven by technological advances and the availability of new high resolution data. As a consequence, classic (paper) geomorphological maps which were the standard for more than 50 years are rapidly being replaced by digital geomorphological information layers. This is witnessed by the following developments: 1. the conversion of classic paper maps into digital information layers, mainly performed in a digital mapping environment such as a Geographical Information System, 2. updating the location precision and the content of the converted maps, by adding more geomorphological details, taken from high resolution elevation data and/or high resolution image data, 3. (semi) automated extraction and classification of geomorphological features from digital elevation models, broadly separated into unsupervised and supervised classification techniques and 4. New digital visualization / cartographic techniques and reading interfaces. Newly digital geomorphological information layers can be based on manual digitization of polygons using DEMs and/or aerial photographs, or prepared through (semi) automated extraction and delineation of geomorphological features. DEMs are often used as basis to derive Land Surface Parameter information which is used as input for (un) supervised classification techniques. Especially when using high-res data, object-based classification is used as an alternative to traditional pixel-based classifications, to cluster grid cells into homogeneous objects, which can be classified as geomorphological features. Classic map content can also be used as training material for the supervised classification of geomorphological features. In the classification process, rule-based protocols, including expert-knowledge input, are used to map specific geomorphological features or entire landscapes. Current (semi) automated classification techniques are increasingly able to extract morphometric, hydrological, and in the near future also morphogenetic information. As a result, these new opportunities have changed the workflows for geomorphological mapmaking, and their focus have shifted from field-based techniques to using more computer-based techniques: for example, traditional pre-field air-photo based maps are now replaced by maps prepared in a digital mapping environment, and designated field visits using mobile GIS / digital mapping devices now focus on gathering location information and attribute inventories and are strongly time efficient. The resulting 'modern geomorphological maps' are digital collections of geomorphological information layers consisting of georeferenced vector, raster and tabular data which are stored in a digital environment such as a GIS geodatabase, and are easily visualized as e.g. 'birds' eye' views, as animated 3D displays, on virtual globes, or stored as GeoPDF maps in which georeferenced attribute information can be easily exchanged over the internet. Digital geomorphological information layers are increasingly accessed via web-based services distributed through remote servers. Information can be consulted - or even build using remote geoprocessing servers - by the end user. Therefore, it will not only be the geomorphologist anymore, but also the professional end user that dictates the applied use of digital geomorphological information layers.

  6. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.

  7. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263

  8. The fragmented nature of tundra landscape

    NASA Astrophysics Data System (ADS)

    Virtanen, Tarmo; Ek, Malin

    2014-04-01

    The vegetation and land cover structure of tundra areas is fragmented when compared to other biomes. Thus, satellite images of high resolution are required for producing land cover classifications, in order to reveal the actual distribution of land cover types across these large and remote areas. We produced and compared different land cover classifications using three satellite images (QuickBird, Aster and Landsat TM5) with different pixel sizes (2.4 m, 15 m and 30 m pixel size, respectively). The study area, in north-eastern European Russia, was visited in July 2007 to obtain ground reference data. The QuickBird image was classified using supervised segmentation techniques, while the Aster and Landsat TM5 images were classified using a pixel-based supervised classification method. The QuickBird classification showed the highest accuracy when tested against field data, while the Aster image was generally more problematic to classify than the Landsat TM5 image. Use of smaller pixel sized images distinguished much greater levels of landscape fragmentation. The overall mean patch sizes in the QuickBird, Aster, and Landsat TM5-classifications were 871 m2, 2141 m2 and 7433 m2, respectively. In the QuickBird classification, the mean patch size of all the tundra and peatland vegetation classes was smaller than one pixel of the Landsat TM5 image. Water bodies and fens in particular occur in the landscape in small or elongated patches, and thus cannot be realistically classified from larger pixel sized images. Land cover patterns vary considerably at such a fine-scale, so that a lot of information is lost if only medium resolution satellite images are used. It is crucial to know the amount and spatial distribution of different vegetation types in arctic landscapes, as carbon dynamics and other climate related physical, geological and biological processes are known to vary greatly between vegetation types.

  9. Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription Medication Abuse from Twitter.

    PubMed

    Sarker, Abeed; O'Connor, Karen; Ginn, Rachel; Scotch, Matthew; Smith, Karen; Malone, Dan; Gonzalez, Graciela

    2016-03-01

    Prescription medication overdose is the fastest growing drug-related problem in the USA. The growing nature of this problem necessitates the implementation of improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. Our primary aims were to assess the possibility of utilizing social media as a resource for automatic monitoring of prescription medication abuse and to devise an automatic classification technique that can identify potentially abuse-indicating user posts. We collected Twitter user posts (tweets) associated with three commonly abused medications (Adderall(®), oxycodone, and quetiapine). We manually annotated 6400 tweets mentioning these three medications and a control medication (metformin) that is not the subject of abuse due to its mechanism of action. We performed quantitative and qualitative analyses of the annotated data to determine whether posts on Twitter contain signals of prescription medication abuse. Finally, we designed an automatic supervised classification technique to distinguish posts containing signals of medication abuse from those that do not and assessed the utility of Twitter in investigating patterns of abuse over time. Our analyses show that clear signals of medication abuse can be drawn from Twitter posts and the percentage of tweets containing abuse signals are significantly higher for the three case medications (Adderall(®): 23 %, quetiapine: 5.0 %, oxycodone: 12 %) than the proportion for the control medication (metformin: 0.3 %). Our automatic classification approach achieves 82 % accuracy overall (medication abuse class recall: 0.51, precision: 0.41, F measure: 0.46). To illustrate the utility of automatic classification, we show how the classification data can be used to analyze abuse patterns over time. Our study indicates that social media can be a crucial resource for obtaining abuse-related information for medications, and that automatic approaches involving supervised classification and natural language processing hold promises for essential future monitoring and intervention tasks.

  10. Classification of symmetry-protected phases for interacting fermions in two dimensions

    NASA Astrophysics Data System (ADS)

    Cheng, Meng; Bi, Zhen; You, Yi-Zhuang; Gu, Zheng-Cheng

    2018-05-01

    Recently, it has been established that two-dimensional bosonic symmetry-protected topological (SPT) phases with on-site unitary symmetry G can be completely classified by the group cohomology H3( G ,U (1 ) ) . Later, group supercohomology was proposed as a partial classification for SPT phases of interacting fermions. In this work, we revisit this problem based on the algebraic theory of symmetry and defects in two-dimensional topological phases. We reproduce the partial classifications given by group supercohomology, and we also show that with an additional H1(G ,Z2) structure, a complete classification of SPT phases for two-dimensional interacting fermion systems with a total symmetry group G ×Z2f is obtained. We also discuss the classification of interacting fermionic SPT phases protected by time-reversal symmetry.

  11. Hyperspectral Image Classification With Markov Random Fields and a Convolutional Neural Network

    NASA Astrophysics Data System (ADS)

    Cao, Xiangyong; Zhou, Feng; Xu, Lin; Meng, Deyu; Xu, Zongben; Paisley, John

    2018-05-01

    This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient decent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings.

  12. Support Vector Machines for Hyperspectral Remote Sensing Classification

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony; Cromp, R. F.

    1998-01-01

    The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain performances of 96%, and 87% correct for a 4 class problem, and a 16 class problem respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agriculture scene.

  13. The Danish Fracture Database can monitor quality of fracture-related surgery, surgeons' experience level and extent of supervision.

    PubMed

    Andersen, Morten Jon; Gromov, Kiril; Brix, Michael; Troelsen, Anders

    2014-06-01

    The importance of supervision and of surgeons' level of experience in relation to patient outcome have been demonstrated in both hip fracture and arthroplasty surgery. The aim of this study was to describe the surgeons' experience level and the extent of supervision for: 1) fracture-related surgery in general; 2) the three most frequent primary operations and reoperations; and 3) primary operations during and outside regular working hours. A total of 9,767 surgical procedures were identified from the Danish Fracture Database (DFDB). Procedures were grouped based on the surgeons' level of experience, extent of supervision, type (primary, planned secondary or reoperation), classification (AO Müller), and whether they were performed during or outside regular hours. Interns and junior residents combined performed 46% of all procedures. A total of 90% of surgeries by interns were performed under supervision, whereas 32% of operations by junior residents were unsupervised. Supervision was absent in 14-16% and 22-33% of the three most frequent primary procedures and reoperations when performed by interns and junior residents, respectively. The proportion of unsupervised procedures by junior residents grew from 30% during to 40% (p < 0.001) outside regular hours. Interns and junior residents together performed almost half of all fracture-related surgery. The extent of supervision was generally high; however, a third of the primary procedures performed by junior residents were unsupervised. The extent of unsupervised surgery performed by junior residents was significantly higher outside regular hours. not relevant. The Danish Fracture Database ("Dansk Frakturdatabase") was approved by the Danish Data Protection Agency ID: 01321.

  14. An extension of the receiver operating characteristic curve and AUC-optimal classification.

    PubMed

    Takenouchi, Takashi; Komori, Osamu; Eguchi, Shinto

    2012-10-01

    While most proposed methods for solving classification problems focus on minimization of the classification error rate, we are interested in the receiver operating characteristic (ROC) curve, which provides more information about classification performance than the error rate does. The area under the ROC curve (AUC) is a natural measure for overall assessment of a classifier based on the ROC curve. We discuss a class of concave functions for AUC maximization in which a boosting-type algorithm including RankBoost is considered, and the Bayesian risk consistency and the lower bound of the optimum function are discussed. A procedure derived by maximizing a specific optimum function has high robustness, based on gross error sensitivity. Additionally, we focus on the partial AUC, which is the partial area under the ROC curve. For example, in medical screening, a high true-positive rate to the fixed lower false-positive rate is preferable and thus the partial AUC corresponding to lower false-positive rates is much more important than the remaining AUC. We extend the class of concave optimum functions for partial AUC optimality with the boosting algorithm. We investigated the validity of the proposed method through several experiments with data sets in the UCI repository.

  15. Automated simultaneous multiple feature classification of MTI data

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Theiler, James P.; Balick, Lee K.; Pope, Paul A.; Szymanski, John J.; Perkins, Simon J.; Porter, Reid B.; Brumby, Steven P.; Bloch, Jeffrey J.; David, Nancy A.; Galassi, Mark C.

    2002-08-01

    Los Alamos National Laboratory has developed and demonstrated a highly capable system, GENIE, for the two-class problem of detecting a single feature against a background of non-feature. In addition to the two-class case, however, a commonly encountered remote sensing task is the segmentation of multispectral image data into a larger number of distinct feature classes or land cover types. To this end we have extended our existing system to allow the simultaneous classification of multiple features/classes from multispectral data. The technique builds on previous work and its core continues to utilize a hybrid evolutionary-algorithm-based system capable of searching for image processing pipelines optimized for specific image feature extraction tasks. We describe the improvements made to the GENIE software to allow multiple-feature classification and describe the application of this system to the automatic simultaneous classification of multiple features from MTI image data. We show the application of the multiple-feature classification technique to the problem of classifying lava flows on Mauna Loa volcano, Hawaii, using MTI image data and compare the classification results with standard supervised multiple-feature classification techniques.

  16. Malay sentiment analysis based on combined classification approaches and Senti-lexicon algorithm.

    PubMed

    Al-Saffar, Ahmed; Awang, Suryanti; Tao, Hai; Omar, Nazlia; Al-Saiagh, Wafaa; Al-Bared, Mohammed

    2018-01-01

    Sentiment analysis techniques are increasingly exploited to categorize the opinion text to one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performances based on the semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are assigned with a synonym and stored with the help of more than one Malay native speaker, and the polarity is manually allotted with a score. In addition, the supervised machine learning approaches and lexicon knowledge method are combined for Malay sentiment classification with evaluating thirteen features. Finally, three individual classifiers and a combined classifier are used to evaluate the classification accuracy. In experimental results, a wide-range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and it demonstrates that the feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors, the features, the number of features and the classification approach.

  17. Malay sentiment analysis based on combined classification approaches and Senti-lexicon algorithm

    PubMed Central

    Awang, Suryanti; Tao, Hai; Omar, Nazlia; Al-Saiagh, Wafaa; Al-bared, Mohammed

    2018-01-01

    Sentiment analysis techniques are increasingly exploited to categorize the opinion text to one or more predefined sentiment classes for the creation and automated maintenance of review-aggregation websites. In this paper, a Malay sentiment analysis classification model is proposed to improve classification performances based on the semantic orientation and machine learning approaches. First, a total of 2,478 Malay sentiment-lexicon phrases and words are assigned with a synonym and stored with the help of more than one Malay native speaker, and the polarity is manually allotted with a score. In addition, the supervised machine learning approaches and lexicon knowledge method are combined for Malay sentiment classification with evaluating thirteen features. Finally, three individual classifiers and a combined classifier are used to evaluate the classification accuracy. In experimental results, a wide-range of comparative experiments is conducted on a Malay Reviews Corpus (MRC), and it demonstrates that the feature extraction improves the performance of Malay sentiment analysis based on the combined classification. However, the results depend on three factors, the features, the number of features and the classification approach. PMID:29684036

  18. Simple adaptive sparse representation based classification schemes for EEG based brain-computer interface applications.

    PubMed

    Shin, Younghak; Lee, Seungchan; Ahn, Minkyu; Cho, Hohyun; Jun, Sung Chan; Lee, Heung-No

    2015-11-01

    One of the main problems related to electroencephalogram (EEG) based brain-computer interface (BCI) systems is the non-stationarity of the underlying EEG signals. This results in the deterioration of the classification performance during experimental sessions. Therefore, adaptive classification techniques are required for EEG based BCI applications. In this paper, we propose simple adaptive sparse representation based classification (SRC) schemes. Supervised and unsupervised dictionary update techniques for new test data and a dictionary modification method by using the incoherence measure of the training data are investigated. The proposed methods are very simple and additional computation for the re-training of the classifier is not needed. The proposed adaptive SRC schemes are evaluated using two BCI experimental datasets. The proposed methods are assessed by comparing classification results with the conventional SRC and other adaptive classification methods. On the basis of the results, we find that the proposed adaptive schemes show relatively improved classification accuracy as compared to conventional methods without requiring additional computation. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Bayesian classification for the selection of in vitro human embryos using morphological and clinical data.

    PubMed

    Morales, Dinora Araceli; Bengoetxea, Endika; Larrañaga, Pedro; García, Miguel; Franco, Yosu; Fresnada, Mónica; Merino, Marisa

    2008-05-01

    In vitro fertilization (IVF) is a medically assisted reproduction technique that enables infertile couples to achieve successful pregnancy. Given the uncertainty of the treatment, we propose an intelligent decision support system based on supervised classification by Bayesian classifiers to aid to the selection of the most promising embryos that will form the batch to be transferred to the woman's uterus. The aim of the supervised classification system is to improve overall success rate of each IVF treatment in which a batch of embryos is transferred each time, where the success is achieved when implantation (i.e. pregnancy) is obtained. Due to ethical reasons, different legislative restrictions apply in every country on this technique. In Spain, legislation allows a maximum of three embryos to form each transfer batch. As a result, clinicians prefer to select the embryos by non-invasive embryo examination based on simple methods and observation focused on morphology and dynamics of embryo development after fertilization. This paper proposes the application of Bayesian classifiers to this embryo selection problem in order to provide a decision support system that allows a more accurate selection than with the actual procedures which fully rely on the expertise and experience of embryologists. For this, we propose to take into consideration a reduced subset of feature variables related to embryo morphology and clinical data of patients, and from this data to induce Bayesian classification models. Results obtained applying a filter technique to choose the subset of variables, and the performance of Bayesian classifiers using them, are presented.

  20. Polarimetry based partial least square classification of ex vivo healthy and basal cell carcinoma human skin tissues.

    PubMed

    Ahmad, Iftikhar; Ahmad, Manzoor; Khan, Karim; Ikram, Masroor

    2016-06-01

    Optical polarimetry was employed for assessment of ex vivo healthy and basal cell carcinoma (BCC) tissue samples from human skin. Polarimetric analyses revealed that depolarization and retardance for healthy tissue group were significantly higher (p<0.001) compared to BCC tissue group. Histopathology indicated that these differences partially arise from BCC-related characteristic changes in tissue morphology. Wilks lambda statistics demonstrated the potential of all investigated polarimetric properties for computer assisted classification of the two tissue groups. Based on differences in polarimetric properties, partial least square (PLS) regression classified the samples with 100% accuracy, sensitivity and specificity. These findings indicate that optical polarimetry together with PLS statistics hold promise for automated pathology classification. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. Nondestructive detection of chilling injury in cucumber fruit using hyperspectral imaging with feature selection and supervised classification

    USDA-ARS?s Scientific Manuscript database

    Chilling injury, as a physiological disorder in cucumbers, occurs after the fruit has been subjected to low temperatures. It is thus desirable to detect chilling injury at early stages and/or remove chilling injured cucumbers during sorting and grading. This research was aimed to apply hyperspectral...

  2. 9 CFR 145.83 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ...-typhoid control under official supervision; (B) All hatchery supply flocks within the State are qualified as U.S. Pullorum-Typhoid Clean or have met equivalent requirements for pullorum-typhoid control under... laboratory and any group D Salmonella samples have been serotyped: (A) A 25-gram sample of meconium from the...

  3. 9 CFR 145.83 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ...-typhoid control under official supervision; (B) All hatchery supply flocks within the State are qualified as U.S. Pullorum-Typhoid Clean or have met equivalent requirements for pullorum-typhoid control under... laboratory and any group D Salmonella samples have been serotyped: (A) A 25-gram sample of meconium from the...

  4. 9 CFR 145.83 - Terminology and classification; flocks and products.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ...-typhoid control under official supervision; (B) All hatchery supply flocks within the State are qualified as U.S. Pullorum-Typhoid Clean or have met equivalent requirements for pullorum-typhoid control under... laboratory and any group D Salmonella samples have been serotyped: (A) A 25-gram sample of meconium from the...

  5. Detecting Targeted Malicious Email through Supervised Classification of Persistent Threat and Recipient Oriented Features

    ERIC Educational Resources Information Center

    Amin, Rohan Mahesh

    2010-01-01

    Targeted email attacks to enable computer network exploitation have become more prevalent, more insidious, and more widely documented in recent years. Beyond nuisance spam or phishing designed to trick users into revealing personal information, targeted malicious email (TME) facilitates computer network exploitation and the gathering of sensitive…

  6. 28 CFR 523.2 - Good time credit for violators.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ..., CLASSIFICATION, AND TRANSFER COMPUTATION OF SENTENCE Good Time § 523.2 Good time credit for violators. (a) An... 28 Judicial Administration 2 2014-07-01 2014-07-01 false Good time credit for violators. 523.2... good time, upon being returned to custody for violation of supervised release, based on the number of...

  7. 28 CFR 523.2 - Good time credit for violators.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ..., CLASSIFICATION, AND TRANSFER COMPUTATION OF SENTENCE Good Time § 523.2 Good time credit for violators. (a) An... 28 Judicial Administration 2 2012-07-01 2012-07-01 false Good time credit for violators. 523.2... good time, upon being returned to custody for violation of supervised release, based on the number of...

  8. 28 CFR 523.2 - Good time credit for violators.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ..., CLASSIFICATION, AND TRANSFER COMPUTATION OF SENTENCE Good Time § 523.2 Good time credit for violators. (a) An... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Good time credit for violators. 523.2... good time, upon being returned to custody for violation of supervised release, based on the number of...

  9. 28 CFR 523.2 - Good time credit for violators.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., CLASSIFICATION, AND TRANSFER COMPUTATION OF SENTENCE Good Time § 523.2 Good time credit for violators. (a) An... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Good time credit for violators. 523.2... good time, upon being returned to custody for violation of supervised release, based on the number of...

  10. 28 CFR 523.2 - Good time credit for violators.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., CLASSIFICATION, AND TRANSFER COMPUTATION OF SENTENCE Good Time § 523.2 Good time credit for violators. (a) An... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Good time credit for violators. 523.2... good time, upon being returned to custody for violation of supervised release, based on the number of...

  11. Analysing the Metaphorical Images of Turkish Preschool Teachers

    ERIC Educational Resources Information Center

    Kabadayi, Abdulkadir

    2008-01-01

    The metaphorical basis of teacher reflection about teaching and learning has been a rich area of theory and research. This is a study of metaphor as a shared system of interpretation and classification, which teachers and student teachers and their supervising teachers can cooperatively explore. This study employs metaphor as a means of research…

  12. A comparison of fitness-case sampling methods for genetic programming

    NASA Astrophysics Data System (ADS)

    Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel

    2017-11-01

    Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.

  13. Faster Trees: Strategies for Accelerated Training and Prediction of Random Forests for Classification of Polsar Images

    NASA Astrophysics Data System (ADS)

    Hänsch, Ronny; Hellwich, Olaf

    2018-04-01

    Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focus on improving classification accuracy, we aim for accelerating the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches we mainly consider algorithmic changes to stay as much as possible independent of platform and programming language. The final model achieves an approximately 60 times faster training and a 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1 %.

  14. (Machine) learning to do more with less

    NASA Astrophysics Data System (ADS)

    Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan

    2018-02-01

    Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard "fully supervised" approach (which relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called "weakly supervised" technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail — both analytically and numerically — with a focus on the robustness to issues of mischaracterizing the training samples. Weakly supervised networks turn out to be remarkably insensitive to a class of systematic mismodeling. Furthermore, we demonstrate that the event level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided on GitHub.

  15. Robust evaluation of time series classification algorithms for structural health monitoring

    NASA Astrophysics Data System (ADS)

    Harvey, Dustin Y.; Worden, Keith; Todd, Michael D.

    2014-03-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and mechanical infrastructure through analysis of structural response measurements. The supervised learning methodology for data-driven SHM involves computation of low-dimensional, damage-sensitive features from raw measurement data that are then used in conjunction with machine learning algorithms to detect, classify, and quantify damage states. However, these systems often suffer from performance degradation in real-world applications due to varying operational and environmental conditions. Probabilistic approaches to robust SHM system design suffer from incomplete knowledge of all conditions a system will experience over its lifetime. Info-gap decision theory enables nonprobabilistic evaluation of the robustness of competing models and systems in a variety of decision making applications. Previous work employed info-gap models to handle feature uncertainty when selecting various components of a supervised learning system, namely features from a pre-selected family and classifiers. In this work, the info-gap framework is extended to robust feature design and classifier selection for general time series classification through an efficient, interval arithmetic implementation of an info-gap data model. Experimental results are presented for a damage type classification problem on a ball bearing in a rotating machine. The info-gap framework in conjunction with an evolutionary feature design system allows for fully automated design of a time series classifier to meet performance requirements under maximum allowable uncertainty.

  16. Machine learning for the assessment of Alzheimer's disease through DTI

    NASA Astrophysics Data System (ADS)

    Lella, Eufemia; Amoroso, Nicola; Bellotti, Roberto; Diacono, Domenico; La Rocca, Marianna; Maggipinto, Tommaso; Monaco, Alfonso; Tangaro, Sabina

    2017-09-01

    Digital imaging techniques have found several medical applications in the development of computer aided detection systems, especially in neuroimaging. Recent advances in Diffusion Tensor Imaging (DTI) aim to discover biological markers for the early diagnosis of Alzheimer's disease (AD), one of the most widespread neurodegenerative disorders. We explore here how different supervised classification models provide a robust support to the diagnosis of AD patients. We use DTI measures, assessing the structural integrity of white matter (WM) fiber tracts, to reveal patterns of disrupted brain connectivity. In particular, we provide a voxel-wise measure of fractional anisotropy (FA) and mean diffusivity (MD), thus identifying the regions of the brain mostly affected by neurodegeneration, and then computing intensity features to feed supervised classification algorithms. In particular, we evaluate the accuracy of discrimination of AD patients from healthy controls (HC) with a dataset of 80 subjects (40 HC, 40 AD), from the Alzheimer's Disease Neurodegenerative Initiative (ADNI). In this study, we compare three state-of-the-art classification models: Random Forests, Naive Bayes and Support Vector Machines (SVMs). We use a repeated five-fold cross validation framework with nested feature selection to perform a fair comparison between these algorithms and evaluate the information content they provide. Results show that AD patterns are well localized within the brain, thus DTI features can support the AD diagnosis.

  17. A Vegetation Database for the Colorado River Ecosystem from Glen Canyon Dam to the Western Boundary of Grand Canyon National Park, Arizona

    USGS Publications Warehouse

    Ralston, Barbara E.; Davis, Philip A.; Weber, Robert M.; Rundall, Jill M.

    2008-01-01

    A vegetation database of the riparian vegetation located within the Colorado River ecosystem (CRE), a subsection of the Colorado River between Glen Canyon Dam and the western boundary of Grand Canyon National Park, was constructed using four-band image mosaics acquired in May 2002. A digital line scanner was flown over the Colorado River corridor in Arizona by ISTAR Americas, using a Leica ADS-40 digital camera to acquire a digital surface model and four-band image mosaics (blue, green, red, and near-infrared) for vegetation mapping. The primary objective of this mapping project was to develop a digital inventory map of vegetation to enable patch- and landscape-scale change detection, and to establish randomized sampling points for ground surveys of terrestrial fauna (principally, but not exclusively, birds). The vegetation base map was constructed through a combination of ground surveys to identify vegetation classes, image processing, and automated supervised classification procedures. Analysis of the imagery and subsequent supervised classification involved multiple steps to evaluate band quality, band ratios, and vegetation texture and density. Identification of vegetation classes involved collection of cover data throughout the river corridor and subsequent analysis using two-way indicator species analysis (TWINSPAN). Vegetation was classified into six vegetation classes, following the National Vegetation Classification Standard, based on cover dominance. This analysis indicated that total area covered by all vegetation within the CRE was 3,346 ha. Considering the six vegetation classes, the sparse shrub (SS) class accounted for the greatest amount of vegetation (627 ha) followed by Pluchea (PLSE) and Tamarix (TARA) at 494 and 366 ha, respectively. The wetland (WTLD) and Prosopis-Acacia (PRGL) classes both had similar areal cover values (227 and 213 ha, respectively). Baccharis-Salix (BAXX) was the least represented at 94 ha. Accuracy assessment of the supervised classification determined that accuracies varied among vegetation classes from 90% to 49%. Causes for low accuracies were similar spectral signatures among vegetation classes. Fuzzy accuracy assessment improved classification accuracies such that Federal mapping standards of 80% accuracies for all classes were met. The scale used to quantify vegetation adequately meets the needs of the stakeholder group. Increasing the scale to meet the U.S. Geological Survey (USGS)-National Park Service (NPS)National Mapping Program's minimum mapping unit of 0.5 ha is unwarranted because this scale would reduce the resolution of some classes (e.g., seep willow/coyote willow would likely be combined with tamarisk). While this would undoubtedly improve classification accuracies, it would not provide the community-level information about vegetation change that would benefit stakeholders. The identification of vegetation classes should follow NPS mapping approaches to complement the national effort and should incorporate the alternative analysis for community identification that is being incorporated into newer NPS mapping efforts. National Vegetation Classification is followed in this report for association- to formation-level categories. Accuracies could be improved by including more environmental variables such as stage elevation in the classification process and incorporating object-based classification methods. Another approach that may address the heterogeneous species issue and classification is to use spectral mixing analysis to estimate the fractional cover of species within each pixel and better quantify the cover of individual species that compose a cover class. Varying flights to capture vegetation at different times of the year might also help separate some vegetation classes, though the cost may be prohibitive. Lastly, photointerpretation instead of automated mapping could be tried. Photointerpretation would likely not improve accuracies in this case, howev

  18. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Melfrequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite autonoma machine, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.

  19. Continuous measurement of breast tumour hormone receptor expression: a comparison of two computational pathology platforms.

    PubMed

    Ahern, Thomas P; Beck, Andrew H; Rosner, Bernard A; Glass, Ben; Frieling, Gretchen; Collins, Laura C; Tamimi, Rulla M

    2017-05-01

    Computational pathology platforms incorporate digital microscopy with sophisticated image analysis to permit rapid, continuous measurement of protein expression. We compared two computational pathology platforms on their measurement of breast tumour oestrogen receptor (ER) and progesterone receptor (PR) expression. Breast tumour microarrays from the Nurses' Health Study were stained for ER (n=592) and PR (n=187). One expert pathologist scored cases as positive if ≥1% of tumour nuclei exhibited stain. ER and PR were then measured with the Definiens Tissue Studio (automated) and Aperio Digital Pathology (user-supervised) platforms. Platform-specific measurements were compared using boxplots, scatter plots and correlation statistics. Classification of ER and PR positivity by platform-specific measurements was evaluated with areas under receiver operating characteristic curves (AUC) from univariable logistic regression models, using expert pathologist classification as the standard. Both platforms showed considerable overlap in continuous measurements of ER and PR between positive and negative groups classified by expert pathologist. Platform-specific measurements were strongly and positively correlated with one another (r≥0.77). The user-supervised Aperio workflow performed slightly better than the automated Definiens workflow at classifying ER positivity (AUC Aperio =0.97; AUC Definiens =0.90; difference=0.07, 95% CI 0.05 to 0.09) and PR positivity (AUC Aperio =0.94; AUC Definiens =0.87; difference=0.07, 95% CI 0.03 to 0.12). Paired hormone receptor expression measurements from two different computational pathology platforms agreed well with one another. The user-supervised workflow yielded better classification accuracy than the automated workflow. Appropriately validated computational pathology algorithms enrich molecular epidemiology studies with continuous protein expression data and may accelerate tumour biomarker discovery. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  20. Sleep in patients with disorders of consciousness characterized by means of machine learning

    PubMed Central

    Lechinger, Julia; Wislowska, Malgorzata; Blume, Christine; Ott, Peter; Wegenkittl, Stefan; del Giudice, Renata; Heib, Dominik P. J.; Mayer, Helmut A.; Laureys, Steven; Pichler, Gerald; Schabus, Manuel

    2018-01-01

    Sleep has been proposed to indicate preserved residual brain functioning in patients suffering from disorders of consciousness (DOC) after awakening from coma. However, a reliable characterization of sleep patterns in this clinical population continues to be challenging given severely altered brain oscillations, frequent and extended artifacts in clinical recordings and the absence of established staging criteria. In the present study, we try to address these issues and investigate the usefulness of a multivariate machine learning technique based on permutation entropy, a complexity measure. Specifically, we used long-term polysomnography (PSG), along with video recordings in day and night periods in a sample of 23 DOC; 12 patients were diagnosed as Unresponsive Wakefulness Syndrome (UWS) and 11 were diagnosed as Minimally Conscious State (MCS). Eight hour PSG recordings of healthy sleepers (N = 26) were additionally used for training and setting parameters of supervised and unsupervised model, respectively. In DOC, the supervised classification (wake, N1, N2, N3 or REM) was validated using simultaneous videos which identified periods with prolonged eye opening or eye closure.The supervised classification revealed that out of the 23 subjects, 11 patients (5 MCS and 6 UWS) yielded highly accurate classification with an average F1-score of 0.87 representing high overlap between the classifier predicting sleep (i.e. one of the 4 sleep stages) and closed eyes. Furthermore, the unsupervised approach revealed a more complex pattern of sleep-wake stages during the night period in the MCS group, as evidenced by the presence of several distinct clusters. In contrast, in UWS patients no such clustering was found. Altogether, we present a novel data-driven method, based on machine learning that can be used to gain new and unambiguous insights into sleep organization and residual brain functioning of patients with DOC. PMID:29293607

  1. Supervision of community health workers in Mozambique: a qualitative study of factors influencing motivation and programme implementation.

    PubMed

    Ndima, Sozinho Daniel; Sidat, Mohsin; Give, Celso; Ormel, Hermen; Kok, Maryse Catelijne; Taegtmeyer, Miriam

    2015-09-01

    Community health workers (CHWs) in Mozambique (known as Agentes Polivalentes Elementares (APEs)) are key actors in providing health services in rural communities. Supervision of CHWs has been shown to improve their work, although details of how it is implemented are scarce. In Mozambique, APE supervision structures and scope of work are clearly outlined in policy and rely on supervisors at the health facility of reference. The aim of this study was to understand how and which aspects of supervision impact on APE motivation and programme implementation. Qualitative research methodologies were used. Twenty-nine in-depth interviews were conducted to capture experiences and perceptions of purposefully selected participants. These included APEs, health facility supervisors, district APE supervisors and community leaders. Interviews were recorded, translated and transcribed, prior to the development of a thematic framework. Supervision was structured as dictated by policy but in practice was irregular and infrequent, which participants identified as affecting APE's motivation. When it did occur, supervision was felt to focus more on fault-finding than being supportive in nature and did not address all areas of APE's work - factors that APEs identified as demotivating. Supervisors, in turn, felt unsupported and felt this negatively impacted performance. They had a high workload in health facilities, where they had multiple roles, including provision of health services, taking care of administrative issues and supervising APEs in communities. A lack of resources for supervision activities was identified, and supervisors felt caught up in administrative issues around APE allowances that they were unable to solve. Many supervisors were not trained in providing supportive supervision. Community governance and accountability mechanisms were only partially able to fill the gaps left by the supervision provided by the health system. The findings indicate the need for an improved supervision system to enhance support and motivation and ultimately performance of APEs. Our study found disconnections between the APE programme policy and its implementation, with gaps in skills, training and support of supervisors leading to sub-optimal supervision. Improved methods of supervision could be implemented including those that maximize the opportunities during face-to-face meetings and through community-monitoring mechanisms.

  2. Quantitative consensus of supervised learners for diffuse lung parenchymal HRCT patterns

    NASA Astrophysics Data System (ADS)

    Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.

    2013-03-01

    Automated lung parenchymal classification usually relies on supervised learning of expert chosen regions representative of the visually differentiable HRCT patterns specific to different pathologies (eg. emphysema, ground glass, honey combing, reticular and normal). Considering the elusiveness of a single most discriminating similarity measure, a plurality of weak learners can be combined to improve the machine learnability. Though a number of quantitative combination strategies exist, their efficacy is data and domain dependent. In this paper, we investigate multiple (N=12) quantitative consensus approaches to combine the clusters obtained with multiple (n=33) probability density-based similarity measures. Our study shows that hypergraph based meta-clustering and probabilistic clustering provides optimal expert-metric agreement.

  3. Supervised Variational Relevance Learning, An Analytic Geometric Feature Selection with Applications to Omic Datasets.

    PubMed

    Boareto, Marcelo; Cesar, Jonatas; Leite, Vitor B P; Caticha, Nestor

    2015-01-01

    We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors to define distance based similarity in pattern classification, inspired in relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns by doing linear transformations using the metric tensor yields a dataset which can be more efficiently classified. We test our methods using publicly available datasets, for some standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without the use of further preprocessing, our results improve on their performance.

  4. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    NASA Astrophysics Data System (ADS)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of con-centration of uranium in surface waters in Ukraine using multiple environmental parameters. The previous study was focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artifi-cial Neural Networks (ANN) techniques including Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Convolutional Neural Network (CNN). The study is focused on building local ANN models which may significantly improve the prediction results of machine learning algorithms by taking into considerations non-stationarity and autocorrelation in spatial data.

  5. Electronic Nose Based on Independent Component Analysis Combined with Partial Least Squares and Artificial Neural Networks for Wine Prediction

    PubMed Central

    Aguilera, Teodoro; Lozano, Jesús; Paredes, José A.; Álvarez, Fernando J.; Suárez, José I.

    2012-01-01

    The aim of this work is to propose an alternative way for wine classification and prediction based on an electronic nose (e-nose) combined with Independent Component Analysis (ICA) as a dimensionality reduction technique, Partial Least Squares (PLS) to predict sensorial descriptors and Artificial Neural Networks (ANNs) for classification purpose. A total of 26 wines from different regions, varieties and elaboration processes have been analyzed with an e-nose and tasted by a sensory panel. Successful results have been obtained in most cases for prediction and classification. PMID:22969387

  6. Ground Truth Sampling and LANDSAT Accuracy Assessment

    NASA Technical Reports Server (NTRS)

    Robinson, J. W.; Gunther, F. J.; Campbell, W. J.

    1982-01-01

    It is noted that the key factor in any accuracy assessment of remote sensing data is the method used for determining the ground truth, independent of the remote sensing data itself. The sampling and accuracy procedures developed for nuclear power plant siting study are described. The purpose of the sampling procedure was to provide data for developing supervised classifications for two study sites and for assessing the accuracy of that and the other procedures used. The purpose of the accuracy assessment was to allow the comparison of the cost and accuracy of various classification procedures as applied to various data types.

  7. Trophic classification of selected Colorado lakes

    NASA Technical Reports Server (NTRS)

    Blackwell, R. J.; Boland, D. H. P.

    1979-01-01

    Multispectral scanner data, acquired over several Colorado lakes using LANDSAT-1 and aircraft, were used in conjunction with contact-sensed water quality data to determine the feasibility of assessing lacustrine trophic levels. A trophic state index was developed using contact-sensed data for several trophic indicators. Relationships between the digitally processed multispectral scanner data, several trophic indicators, and the trophic index were examined using a supervised multispectral classification technique and regression techniques. Statistically significant correlations exist between spectral bands, several of the trophic indicators and the trophic state index. Color-coded photomaps were generated which depict the spectral aspects of trophic state.

  8. [Quantitative classification-based occupational health management for electroplating enterprises in Baoan District of Shenzhen, China].

    PubMed

    Zhang, Sheng; Huang, Jinsheng; Yang, Baigbing; Lin, Binjie; Xu, Xinyun; Chen, Jinru; Zhao, Zhuandi; Tu, Xiaozhi; Bin, Haihua

    2014-04-01

    To improve the occupational health management levels in electroplating enterprises with quantitative classification measures and to provide a scientific basis for the prevention and control of occupational hazards in electroplating enterprises and the protection of workers' health. A quantitative classification table was created for the occupational health management in electroplating enterprises. The evaluation indicators included 6 items and 27 sub-items, with a total score of 100 points. Forty electroplating enterprises were selected and scored according to the quantitative classification table. These electroplating enterprises were classified into grades A, B, and C based on the scores. Among 40 electroplating enterprises, 11 (27.5%) had scores of >85 points (grade A), 23 (57.5%) had scores of 60∼85 points (grade B), and 6 (15.0%) had scores of <60 points (grade C). Quantitative classification management for electroplating enterprises is a valuable attempt, which is helpful for the supervision and management by the health department and provides an effective method for the self-management of enterprises.

  9. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  10. Automotive System for Remote Surface Classification.

    PubMed

    Bystrov, Aleksandr; Hoare, Edward; Tran, Thuy-Yung; Clarke, Nigel; Gashinova, Marina; Cherniakov, Mikhail

    2017-04-01

    In this paper we shall discuss a novel approach to road surface recognition, based on the analysis of backscattered microwave and ultrasonic signals. The novelty of our method is sonar and polarimetric radar data fusion, extraction of features for separate swathes of illuminated surface (segmentation), and using of multi-stage artificial neural network for surface classification. The developed system consists of 24 GHz radar and 40 kHz ultrasonic sensor. The features are extracted from backscattered signals and then the procedures of principal component analysis and supervised classification are applied to feature data. The special attention is paid to multi-stage artificial neural network which allows an overall increase in classification accuracy. The proposed technique was tested for recognition of a large number of real surfaces in different weather conditions with the average accuracy of correct classification of 95%. The obtained results thereby demonstrate that the use of proposed system architecture and statistical methods allow for reliable discrimination of various road surfaces in real conditions.

  11. Automotive System for Remote Surface Classification

    PubMed Central

    Bystrov, Aleksandr; Hoare, Edward; Tran, Thuy-Yung; Clarke, Nigel; Gashinova, Marina; Cherniakov, Mikhail

    2017-01-01

    In this paper we shall discuss a novel approach to road surface recognition, based on the analysis of backscattered microwave and ultrasonic signals. The novelty of our method is sonar and polarimetric radar data fusion, extraction of features for separate swathes of illuminated surface (segmentation), and using of multi-stage artificial neural network for surface classification. The developed system consists of 24 GHz radar and 40 kHz ultrasonic sensor. The features are extracted from backscattered signals and then the procedures of principal component analysis and supervised classification are applied to feature data. The special attention is paid to multi-stage artificial neural network which allows an overall increase in classification accuracy. The proposed technique was tested for recognition of a large number of real surfaces in different weather conditions with the average accuracy of correct classification of 95%. The obtained results thereby demonstrate that the use of proposed system architecture and statistical methods allow for reliable discrimination of various road surfaces in real conditions. PMID:28368297

  12. Cell classification using big data analytics plus time stretch imaging (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Jalali, Bahram; Chen, Claire L.; Mahjoubfar, Ata

    2016-09-01

    We show that blood cells can be classified with high accuracy and high throughput by combining machine learning with time stretch quantitative phase imaging. Our diagnostic system captures quantitative phase images in a flow microscope at millions of frames per second and extracts multiple biophysical features from individual cells including morphological characteristics, light absorption and scattering parameters, and protein concentration. These parameters form a hyperdimensional feature space in which supervised learning and cell classification is performed. We show binary classification of T-cells against colon cancer cells, as well classification of algae cell strains with high and low lipid content. The label-free screening averts the negative impact of staining reagents on cellular viability or cell signaling. The combination of time stretch machine vision and learning offers unprecedented cell analysis capabilities for cancer diagnostics, drug development and liquid biopsy for personalized genomics.

  13. A thyroid nodule classification method based on TI-RADS

    NASA Astrophysics Data System (ADS)

    Wang, Hao; Yang, Yang; Peng, Bo; Chen, Qin

    2017-07-01

    Thyroid Imaging Reporting and Data System(TI-RADS) is a valuable tool for differentiating the benign and the malignant thyroid nodules. In clinic, doctors can determine the extent of being benign or malignant in terms of different classes by using TI-RADS. Classification represents the degree of malignancy of thyroid nodules. TI-RADS as a classification standard can be used to guide the ultrasonic doctor to examine thyroid nodules more accurately and reliably. In this paper, we aim to classify the thyroid nodules with the help of TI-RADS. To this end, four ultrasound signs, i.e., cystic and solid, echo pattern, boundary feature and calcification of thyroid nodules are extracted and converted into feature vectors. Then semi-supervised fuzzy C-means ensemble (SS-FCME) model is applied to obtain the classification results. The experimental results demonstrate that the proposed method can help doctors diagnose the thyroid nodules effectively.

  14. A Comparison of Local Variance, Fractal Dimension, and Moran's I as Aids to Multispectral Image Classification

    NASA Technical Reports Server (NTRS)

    Emerson, Charles W.; Sig-NganLam, Nina; Quattrochi, Dale A.

    2004-01-01

    The accuracy of traditional multispectral maximum-likelihood image classification is limited by the skewed statistical distributions of reflectances from the complex heterogenous mixture of land cover types in urban areas. This work examines the utility of local variance, fractal dimension and Moran's I index of spatial autocorrelation in segmenting multispectral satellite imagery. Tools available in the Image Characterization and Modeling System (ICAMS) were used to analyze Landsat 7 imagery of Atlanta, Georgia. Although segmentation of panchromatic images is possible using indicators of spatial complexity, different land covers often yield similar values of these indices. Better results are obtained when a surface of local fractal dimension or spatial autocorrelation is combined as an additional layer in a supervised maximum-likelihood multispectral classification. The addition of fractal dimension measures is particularly effective at resolving land cover classes within urbanized areas, as compared to per-pixel spectral classification techniques.

  15. Comprehensive 4-stage categorization of bicuspid aortic valve leaflet morphology by cardiac MRI in 386 patients.

    PubMed

    Murphy, I G; Collins, J; Powell, A; Markl, M; McCarthy, P; Malaisrie, S C; Carr, J C; Barker, A J

    2017-08-01

    Bicuspid aortic valve (BAV) disease is heterogeneous and related to valve dysfunction and aortopathy. Appropriate follow up and surveillance of patients with BAV may depend on correct phenotypic categorization. There are multiple classification schemes, however a need exists to comprehensively capture commissure fusion, leaflet asymmetry, and valve orifice orientation. Our aim was to develop a BAV classification scheme for use at MRI to ascertain the frequency of different phenotypes and the consistency of BAV classification. The BAV classification scheme builds on the Sievers surgical BAV classification, adding valve orifice orientation, partial leaflet fusion and leaflet asymmetry. A single observer successfully applied this classification to 386 of 398 Cardiac MRI studies. Repeatability of categorization was ascertained with intraobserver and interobserver kappa scores. Sensitivity and specificity of MRI findings was determined from operative reports, where available. Fusion of the right and left leaflets accounted for over half of all cases. Partial leaflet fusion was seen in 46% of patients. Good interobserver agreement was seen for orientation of the valve opening (κ = 0.90), type (κ = 0.72) and presence of partial fusion (κ = 0.83, p < 0.0001). Retrospective review of operative notes showed sensitivity and specificity for orientation (90, 93%) and for Sievers type (73, 87%). The proposed BAV classification schema was assessed by MRI for its reliability to classify valve morphology in addition to illustrating the wide heterogeneity of leaflet size, orifice orientation, and commissural fusion. The classification may be helpful in further understanding the relationship between valve morphology, flow derangement and aortopathy.

  16. Video mining using combinations of unsupervised and supervised learning techniques

    NASA Astrophysics Data System (ADS)

    Divakaran, Ajay; Miyahara, Koji; Peker, Kadir A.; Radhakrishnan, Regunathan; Xiong, Ziyou

    2003-12-01

    We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or "blind" content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in stage 1. We discuss the target applications and find that using a purely unsupervised approach are too computationally complex to be implemented on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns that are useful to the end-user of the application. We target consumer video browsing applications such as commercial message detection, sports highlights extraction etc. We employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations in production styles etc.

  17. The Predictive Validity of a Gender-Responsive Needs Assessment: An Exploratory Study

    ERIC Educational Resources Information Center

    Salisbury, Emily J.; Van Voorhis, Patricia; Spiropoulos, Georgia V.

    2009-01-01

    Risk assessment and classification systems for women have been largely derived from male-based systems. As a result, many of the needs unique to women are not formally assessed or treated. Emerging research advocating a gender-responsive approach to the supervision and treatment of women offenders suggests that needs such as abuse, mental health,…

  18. Menthol smokers: metabolomic profiling and smoking behavior

    PubMed Central

    Hsu, Ping-Ching; Lan, Renny S.; Brasky, Theodore M.; Marian, Catalin; Cheema, Amrita K.; Ressom, Habtom W.; Loffredo, Christopher A.; Pickworth, Wallace B.; Shields, Peter G.

    2016-01-01

    Background The use of menthol in cigarettes and marketing is under consideration for regulation by the FDA. However, the effects of menthol on smoking behavior and carcinogen exposure have been inconclusive. We previously reported metabolomic profiling for cigarette smokers, and novelly identified a menthol-glucuronide (MG) as the most significant metabolite directly related to smoking. Here, MG is studied in relation to smoking behavior and metabolomic profiles. Methods A cross-sectional study of 105 smokers who smoked two cigarettes in the laboratory one hour apart. Blood nicotine, MG and exhaled carbon monoxide (CO) boosts were determined (the difference before and after smoking). Spearman's correlation, Chi-Square and ANCOVA adjusted for gender, race and cotinine levels for menthol smokers assessed the relationship of MG boost, smoking behavior, and metabolic profiles. Multivariate metabolite characterization using supervised Partial Least Squares-Discriminant Analysis (PLS-DA) was carried out for the classification of metabolomics profiles. Results MG boost was positively correlated with CO boost, nicotine boost, average puff volume, puff duration, and total smoke exposure. Classification using PLS-DA, MG was the top metabolite discriminating metabolome of menthol vs. non-menthol smokers. Among menthol smokers, forty-two metabolites were significantly correlated with MG boost, which linked to cellular functions such as of cell death, survival, and movement. Conclusion Plasma MG boost is a new smoking behavior biomarker that may provides novel information over self-reported use of menthol cigarettes by integrating different smoking measures for understanding smoking behavior and harm of menthol cigarettes. Impacts These results provide insight into the biological effect of menthol smoking. PMID:27628308

  19. Generalized multiple kernel learning with data-dependent priors.

    PubMed

    Mao, Qi; Tsang, Ivor W; Gao, Shenghua; Wang, Li

    2015-06-01

    Multiple kernel learning (MKL) and classifier ensemble are two mainstream methods for solving learning problems in which some sets of features/views are more informative than others, or the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL such that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the formulation of MKL. Instead of using the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which enhances the prediction performance of MKL by leveraging the merits of the classifier ensemble. With the proposed probabilistic framework of MKL, we propose a hierarchical Bayesian model to learn the proposed data-dependent prior and classification model simultaneously. The resultant problem is convex and other information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed MKL framework and can be readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of our proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.

  20. 42 CFR 410.43 - Partial hospitalization services: Conditions and exclusions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...) Include any of the following: (i) Individual and group therapy with physicians or psychologists or other mental health professionals to the extent authorized under State law. (ii) Occupational therapy requiring... appropriate supervision of a qualified occupational therapist by an occupational therapy assistant as...

  1. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment

    PubMed Central

    2014-01-01

    Chemical liabilities, such as adverse effects and toxicity, play a significant role in modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach combining several classification and chemography methods to be able to predict chemical liabilities and to interpret obtained results in the context of impact of structural changes of compounds on their pharmacological profile. To our knowledge for the first time, the supervised extension of Generative Topographic Mapping is proposed as an effective new chemography method. New approach for mapping new data using supervised Isomap without re-building models from the scratch has been proposed. Two approaches for estimation of model’s applicability domain are used in our study to our knowledge for the first time in chemoinformatics. The structural alerts responsible for the negative characteristics of pharmacological profile of chemical compounds has been found as a result of model interpretation. PMID:24868246

  2. Augmenting the decomposition of EMG signals using supervised feature extraction techniques.

    PubMed

    Parsaei, Hossein; Gangeh, Mehrdad J; Stashuk, Daniel W; Kamel, Mohamed S

    2012-01-01

    Electromyographic (EMG) signal decomposition is the process of resolving an EMG signal into its constituent motor unit potential trains (MUPTs). In this work, the possibility of improving the decomposing results using two supervised feature extraction methods, i.e., Fisher discriminant analysis (FDA) and supervised principal component analysis (SPCA), is explored. Using the MUP labels provided by a decomposition-based quantitative EMG system as a training data for FDA and SPCA, the MUPs are transformed into a new feature space such that the MUPs of a single MU become as close as possible to each other while those created by different MUs become as far as possible. The MUPs are then reclassified using a certainty-based classification algorithm. Evaluation results using 10 simulated EMG signals comprised of 3-11 MUPTs demonstrate that FDA and SPCA on average improve the decomposition accuracy by 6%. The improvement for the most difficult-to-decompose signal is about 12%, which shows the proposed approach is most beneficial in the decomposition of more complex signals.

  3. Supervised classification of brain tissues through local multi-scale texture analysis by coupling DIR and FLAIR MR sequences

    NASA Astrophysics Data System (ADS)

    Poletti, Enea; Veronese, Elisa; Calabrese, Massimiliano; Bertoldo, Alessandra; Grisan, Enrico

    2012-02-01

    The automatic segmentation of brain tissues in magnetic resonance (MR) is usually performed on T1-weighted images, due to their high spatial resolution. T1w sequence, however, has some major downsides when brain lesions are present: the altered appearance of diseased tissues causes errors in tissues classification. In order to overcome these drawbacks, we employed two different MR sequences: fluid attenuated inversion recovery (FLAIR) and double inversion recovery (DIR). The former highlights both gray matter (GM) and white matter (WM), the latter highlights GM alone. We propose here a supervised classification scheme that does not require any anatomical a priori information to identify the 3 classes, "GM", "WM", and "background". Features are extracted by means of a local multi-scale texture analysis, computed for each pixel of the DIR and FLAIR sequences. The 9 textures considered are average, standard deviation, kurtosis, entropy, contrast, correlation, energy, homogeneity, and skewness, evaluated on a neighborhood of 3x3, 5x5, and 7x7 pixels. Hence, the total number of features associated to a pixel is 56 (9 textures x3 scales x2 sequences +2 original pixel values). The classifier employed is a Support Vector Machine with Radial Basis Function as kernel. From each of the 4 brain volumes evaluated, a DIR and a FLAIR slice have been selected and manually segmented by 2 expert neurologists, providing 1st and 2nd human reference observations which agree with an average accuracy of 99.03%. SVM performances have been assessed with a 4-fold cross-validation, yielding an average classification accuracy of 98.79%.

  4. A multi-label, semi-supervised classification approach applied to personality prediction in social media.

    PubMed

    Lima, Ana Carolina E S; de Castro, Leandro Nunes

    2014-10-01

    Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Improving zero-training brain-computer interfaces by mixing model estimators

    NASA Astrophysics Data System (ADS)

    Verhoeven, T.; Hübner, D.; Tangermann, M.; Müller, K. R.; Dambre, J.; Kindermans, P. J.

    2017-06-01

    Objective. Brain-computer interfaces (BCI) based on event-related potentials (ERP) incorporate a decoder to classify recorded brain signals and subsequently select a control signal that drives a computer application. Standard supervised BCI decoders require a tedious calibration procedure prior to every session. Several unsupervised classification methods have been proposed that tune the decoder during actual use and as such omit this calibration. Each of these methods has its own strengths and weaknesses. Our aim is to improve overall accuracy of ERP-based BCIs without calibration. Approach. We consider two approaches for unsupervised classification of ERP signals. Learning from label proportions (LLP) was recently shown to be guaranteed to converge to a supervised decoder when enough data is available. In contrast, the formerly proposed expectation maximization (EM) based decoding for ERP-BCI does not have this guarantee. However, while this decoder has high variance due to random initialization of its parameters, it obtains a higher accuracy faster than LLP when the initialization is good. We introduce a method to optimally combine these two unsupervised decoding methods, letting one method’s strengths compensate for the weaknesses of the other and vice versa. The new method is compared to the aforementioned methods in a resimulation of an experiment with a visual speller. Main results. Analysis of the experimental results shows that the new method exceeds the performance of the previous unsupervised classification approaches in terms of ERP classification accuracy and symbol selection accuracy during the spelling experiment. Furthermore, the method shows less dependency on random initialization of model parameters and is consequently more reliable. Significance. Improving the accuracy and subsequent reliability of calibrationless BCIs makes these systems more appealing for frequent use.

  6. Supervised fully polarimetric classification of the Black Forest test site: From MAESTROI to MAC Europe

    NASA Technical Reports Server (NTRS)

    Degrandi, G.; Lavalle, C.; Degroof, H.; Sieber, A.

    1992-01-01

    A study on the performance of a supervised fully polarimetric maximum likelihood classifier for synthetic aperture radar (SAR) data when applied to a specific classification context: forest classification based on age classes and in the presence of a sloping terrain is presented. For the experimental part, the polarimetric AIRSAR data at P, L, and C-band, acquired over the German Black Forest near Freiburg in the frame of the 1989 MAESTRO-1 campaign and the 1991 MAC Europe campaign was used, MAESTRO-1 with an ESA/JRC sponsored campaign, and MAC Europe (Multi-sensor Aircraft Campaign); in both cases the multi-frequency polarimetric JPL Airborne Synthetic Aperture Radar (AIRSAR) radar was flown over a number of European test sites. The study is structured as follows. At first, the general characteristics of the classifier and the dependencies from some parameters, like frequency bands, feature vector, calibration, using test areas lying on a flat terrain are investigated. Once it is determined the optimal conditions for the classifier performance, we then move on to the study of the slope effect. The bulk of this work is performed using the Maestrol data set. Next the classifier performance with the MAC Europe data is considered. The study is divided into two stages: first some of the tests done on the Maestro data are repeated, to highlight the improvements due to the new processing scheme that delivers 16 look data. Second we experiment with multi images classification with two goals: to assess the possibility of using a training set measured from one image to classify areas in different images; and to classify areas on critical slopes using different viewing angles. The main points of the study are listed and some of the results obtained so far are highlighted.

  7. Contribution of non-negative matrix factorization to the classification of remote sensing images

    NASA Astrophysics Data System (ADS)

    Karoui, M. S.; Deville, Y.; Hosseini, S.; Ouamri, A.; Ducrot, D.

    2008-10-01

    Remote sensing has become an unavoidable tool for better managing our environment, generally by realizing maps of land cover using classification techniques. The classification process requires some pre-processing, especially for data size reduction. The most usual technique is Principal Component Analysis. Another approach consists in regarding each pixel of the multispectral image as a mixture of pure elements contained in the observed area. Using Blind Source Separation (BSS) methods, one can hope to unmix each pixel and to perform the recognition of the classes constituting the observed scene. Our contribution consists in using Non-negative Matrix Factorization (NMF) combined with sparse coding as a solution to BSS, in order to generate new images (which are at least partly separated images) using HRV SPOT images from Oran area, Algeria). These images are then used as inputs of a supervised classifier integrating textural information. The results of classifications of these "separated" images show a clear improvement (correct pixel classification rate improved by more than 20%) compared to classification of initial (i.e. non separated) images. These results show the contribution of NMF as an attractive pre-processing for classification of multispectral remote sensing imagery.

  8. Localized thin-section CT with radiomics feature extraction and machine learning to classify early-detected pulmonary nodules from lung cancer screening

    NASA Astrophysics Data System (ADS)

    Tu, Shu-Ju; Wang, Chih-Wei; Pan, Kuang-Tse; Wu, Yi-Cheng; Wu, Chen-Te

    2018-03-01

    Lung cancer screening aims to detect small pulmonary nodules and decrease the mortality rate of those affected. However, studies from large-scale clinical trials of lung cancer screening have shown that the false-positive rate is high and positive predictive value is low. To address these problems, a technical approach is greatly needed for accurate malignancy differentiation among these early-detected nodules. We studied the clinical feasibility of an additional protocol of localized thin-section CT for further assessment on recalled patients from lung cancer screening tests. Our approach of localized thin-section CT was integrated with radiomics features extraction and machine learning classification which was supervised by pathological diagnosis. Localized thin-section CT images of 122 nodules were retrospectively reviewed and 374 radiomics features were extracted. In this study, 48 nodules were benign and 74 malignant. There were nine patients with multiple nodules and four with synchronous multiple malignant nodules. Different machine learning classifiers with a stratified ten-fold cross-validation were used and repeated 100 times to evaluate classification accuracy. Of the image features extracted from the thin-section CT images, 238 (64%) were useful in differentiating between benign and malignant nodules. These useful features include CT density (p  =  0.002 518), sigma (p  =  0.002 781), uniformity (p  =  0.032 41), and entropy (p  =  0.006 685). The highest classification accuracy was 79% by the logistic classifier. The performance metrics of this logistic classification model was 0.80 for the positive predictive value, 0.36 for the false-positive rate, and 0.80 for the area under the receiver operating characteristic curve. Our approach of direct risk classification supervised by the pathological diagnosis with localized thin-section CT and radiomics feature extraction may support clinical physicians in determining truly malignant nodules and therefore reduce problems in lung cancer screening.

  9. Automated classification of single airborne particles from two-dimensional angle-resolved optical scattering (TAOS) patterns by non-linear filtering

    NASA Astrophysics Data System (ADS)

    Crosta, Giovanni Franco; Pan, Yong-Le; Aptowicz, Kevin B.; Casati, Caterina; Pinnick, Ronald G.; Chang, Richard K.; Videen, Gorden W.

    2013-12-01

    Measurement of two-dimensional angle-resolved optical scattering (TAOS) patterns is an attractive technique for detecting and characterizing micron-sized airborne particles. In general, the interpretation of these patterns and the retrieval of the particle refractive index, shape or size alone, are difficult problems. By reformulating the problem in statistical learning terms, a solution is proposed herewith: rather than identifying airborne particles from their scattering patterns, TAOS patterns themselves are classified through a learning machine, where feature extraction interacts with multivariate statistical analysis. Feature extraction relies on spectrum enhancement, which includes the discrete cosine FOURIER transform and non-linear operations. Multivariate statistical analysis includes computation of the principal components and supervised training, based on the maximization of a suitable figure of merit. All algorithms have been combined together to analyze TAOS patterns, organize feature vectors, design classification experiments, carry out supervised training, assign unknown patterns to classes, and fuse information from different training and recognition experiments. The algorithms have been tested on a data set with more than 3000 TAOS patterns. The parameters that control the algorithms at different stages have been allowed to vary within suitable bounds and are optimized to some extent. Classification has been targeted at discriminating aerosolized Bacillus subtilis particles, a simulant of anthrax, from atmospheric aerosol particles and interfering particles, like diesel soot. By assuming that all training and recognition patterns come from the respective reference materials only, the most satisfactory classification result corresponds to 20% false negatives from B. subtilis particles and <11% false positives from all other aerosol particles. The most effective operations have consisted of thresholding TAOS patterns in order to reject defective ones, and forming training sets from three or four pattern classes. The presented automated classification method may be adapted into a real-time operation technique, capable of detecting and characterizing micron-sized airborne particles.

  10. Localized thin-section CT with radiomics feature extraction and machine learning to classify early-detected pulmonary nodules from lung cancer screening.

    PubMed

    Tu, Shu-Ju; Wang, Chih-Wei; Pan, Kuang-Tse; Wu, Yi-Cheng; Wu, Chen-Te

    2018-03-14

    Lung cancer screening aims to detect small pulmonary nodules and decrease the mortality rate of those affected. However, studies from large-scale clinical trials of lung cancer screening have shown that the false-positive rate is high and positive predictive value is low. To address these problems, a technical approach is greatly needed for accurate malignancy differentiation among these early-detected nodules. We studied the clinical feasibility of an additional protocol of localized thin-section CT for further assessment on recalled patients from lung cancer screening tests. Our approach of localized thin-section CT was integrated with radiomics features extraction and machine learning classification which was supervised by pathological diagnosis. Localized thin-section CT images of 122 nodules were retrospectively reviewed and 374 radiomics features were extracted. In this study, 48 nodules were benign and 74 malignant. There were nine patients with multiple nodules and four with synchronous multiple malignant nodules. Different machine learning classifiers with a stratified ten-fold cross-validation were used and repeated 100 times to evaluate classification accuracy. Of the image features extracted from the thin-section CT images, 238 (64%) were useful in differentiating between benign and malignant nodules. These useful features include CT density (p  =  0.002 518), sigma (p  =  0.002 781), uniformity (p  =  0.032 41), and entropy (p  =  0.006 685). The highest classification accuracy was 79% by the logistic classifier. The performance metrics of this logistic classification model was 0.80 for the positive predictive value, 0.36 for the false-positive rate, and 0.80 for the area under the receiver operating characteristic curve. Our approach of direct risk classification supervised by the pathological diagnosis with localized thin-section CT and radiomics feature extraction may support clinical physicians in determining truly malignant nodules and therefore reduce problems in lung cancer screening.

  11. Automated labelling of cancer textures in colorectal histopathology slides using quasi-supervised learning.

    PubMed

    Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge

    2013-04-01

    Quasi-supervised learning is a statistical learning algorithm that contrasts two datasets by computing estimate for the posterior probability of each sample in either dataset. This method has not been applied to histopathological images before. The purpose of this study is to evaluate the performance of the method to identify colorectal tissues with or without adenocarcinoma. Light microscopic digital images from histopathological sections were obtained from 30 colorectal radical surgery materials including adenocarcinoma and non-neoplastic regions. The texture features were extracted by using local histograms and co-occurrence matrices. The quasi-supervised learning algorithm operates on two datasets, one containing samples of normal tissues labelled only indirectly, and the other containing an unlabeled collection of samples of both normal and cancer tissues. As such, the algorithm eliminates the need for manually labelled samples of normal and cancer tissues for conventional supervised learning and significantly reduces the expert intervention. Several texture feature vector datasets corresponding to different extraction parameters were tested within the proposed framework. The Independent Component Analysis dimensionality reduction approach was also identified as the one improving the labelling performance evaluated in this series. In this series, the proposed method was applied to the dataset of 22,080 vectors with reduced dimensionality 119 from 132. Regions containing cancer tissue could be identified accurately having false and true positive rates up to 19% and 88% respectively without using manually labelled ground-truth datasets in a quasi-supervised strategy. The resulting labelling performances were compared to that of a conventional powerful supervised classifier using manually labelled ground-truth data. The supervised classifier results were calculated as 3.5% and 95% for the same case. The results in this series in comparison with the benchmark classifier, suggest that quasi-supervised image texture labelling may be a useful method in the analysis and classification of pathological slides but further study is required to improve the results. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. 42 CFR 410.43 - Partial hospitalization services: Conditions and exclusions.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... appropriate supervision of a qualified occupational therapist by an occupational therapy assistant as specified in part 484 of this chapter. (iii) Services of social workers, trained psychiatric nurses, and... this chapter for payment on a fee schedule basis. (2) Physician assistant services, as defined in...

  13. 42 CFR 410.43 - Partial hospitalization services: Conditions and exclusions.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... appropriate supervision of a qualified occupational therapist by an occupational therapy assistant as specified in part 484 of this chapter. (iii) Services of social workers, trained psychiatric nurses, and... this chapter for payment on a fee schedule basis. (2) Physician assistant services, as defined in...

  14. 42 CFR 410.43 - Partial hospitalization services: Conditions and exclusions.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... appropriate supervision of a qualified occupational therapist by an occupational therapy assistant as specified in part 484 of this chapter. (iii) Services of social workers, trained psychiatric nurses, and... this chapter for payment on a fee schedule basis. (2) Physician assistant services, as defined in...

  15. 42 CFR 410.43 - Partial hospitalization services: Conditions and exclusions.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... appropriate supervision of a qualified occupational therapist by an occupational therapy assistant as specified in part 484 of this chapter. (iii) Services of social workers, trained psychiatric nurses, and... this chapter for payment on a fee schedule basis. (2) Physician assistant services, as defined in...

  16. Marginal semi-supervised sub-manifold projections with informative constraints for dimensionality reduction and recognition.

    PubMed

    Zhang, Zhao; Zhao, Mingbo; Chow, Tommy W S

    2012-12-01

    In this work, sub-manifold projections based semi-supervised dimensionality reduction (DR) problem learning from partial constrained data is discussed. Two semi-supervised DR algorithms termed Marginal Semi-Supervised Sub-Manifold Projections (MS³MP) and orthogonal MS³MP (OMS³MP) are proposed. MS³MP in the singular case is also discussed. We also present the weighted least squares view of MS³MP. Based on specifying the types of neighborhoods with pairwise constraints (PC) and the defined manifold scatters, our methods can preserve the local properties of all points and discriminant structures embedded in the localized PC. The sub-manifolds of different classes can also be separated. In PC guided methods, exploring and selecting the informative constraints is challenging and random constraint subsets significantly affect the performance of algorithms. This paper also introduces an effective technique to select the informative constraints for DR with consistent constraints. The analytic form of the projection axes can be obtained by eigen-decomposition. The connections between this work and other related work are also elaborated. The validity of the proposed constraint selection approach and DR algorithms are evaluated by benchmark problems. Extensive simulations show that our algorithms can deliver promising results over some widely used state-of-the-art semi-supervised DR techniques. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. The Utility of Teleultrasound to Guide Acute Patient Management.

    PubMed

    Becker, Christian; Fusaro, Mario; Patel, Dhruv; Shalom, Isaac; Frishman, William H; Scurlock, Corey

    Ultrasound has evolved into a core bedside tool for diagnostic and management purposes for all subsets of adult and pediatric critically-ill patients. Teleintensive care unit coverage has undergone a similar rapid expansion period throughout the United States. Round-the-clock access to ultrasound equipment is very common in today's intensive care unit, but 24/7 coverage with staff trained to acquire and interpret point-of-care ultrasound in real time is lagging behind equipment availability. Medical trainees and physician extenders require attending level supervision to ensure consistent image acquisition and accurate interpretation. Teleintensivists can extend the utility of ultrasound by supervising and guiding providers without or with only partial training in ultrasound, and also by extending direct trainee ultrasound supervision to time periods when no direct bedside attending supervisor is available, and when treatment decisions otherwise would have been made without supervision and feedback on image acquisition and interpretation. Nursing staff without ultrasound training can also be directed to perform basic ultrasound exams, which may have immediate diagnostic and/or treatment consequences, thereby overcoming access barriers in the absence of physicians or physician extenders. We discuss 4 real-life clinical scenarios in which teleintensivist supervision extended and standardized bedside ultrasound exams to guide management decisions which significantly impacted patient outcomes.

  18. Semi-Supervised Sparse Representation Based Classification for Face Recognition With Insufficient Labeled Samples

    NASA Astrophysics Data System (ADS)

    Gao, Yuan; Ma, Jiayi; Yuille, Alan L.

    2017-05-01

    This paper addresses the problem of face recognition when there is only few, or even only a single, labeled examples of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive nuisance variables such as bad lighting, wearing of glasses) and non-linear (i.e., non-additive pixel-wise nuisance variables such as expression changes). The small number of labeled examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain good recognition performance. To address the problem we propose a method called Semi-Supervised Sparse Representation based Classification (S$^3$RC). This is based on recent work on sparsity where faces are represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person, and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions, different glasses). The main idea is that (i) we use the variation dictionary to characterize the linear nuisance variables via the sparsity framework, then (ii) prototype face images are estimated as a gallery dictionary via a Gaussian Mixture Model (GMM), with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with the non-linear nuisance variations between labeled and unlabeled samples. We have done experiments with insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR, Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method is able to deliver significantly improved performance over existing methods.

  19. Response monitoring using quantitative ultrasound methods and supervised dictionary learning in locally advanced breast cancer

    NASA Astrophysics Data System (ADS)

    Gangeh, Mehrdad J.; Fung, Brandon; Tadayyon, Hadi; Tran, William T.; Czarnota, Gregory J.

    2016-03-01

    A non-invasive computer-aided-theragnosis (CAT) system was developed for the early assessment of responses to neoadjuvant chemotherapy in patients with locally advanced breast cancer. The CAT system was based on quantitative ultrasound spectroscopy methods comprising several modules including feature extraction, a metric to measure the dissimilarity between "pre-" and "mid-treatment" scans, and a supervised learning algorithm for the classification of patients to responders/non-responders. One major requirement for the successful design of a high-performance CAT system is to accurately measure the changes in parametric maps before treatment onset and during the course of treatment. To this end, a unified framework based on Hilbert-Schmidt independence criterion (HSIC) was used for the design of feature extraction from parametric maps and the dissimilarity measure between the "pre-" and "mid-treatment" scans. For the feature extraction, HSIC was used to design a supervised dictionary learning (SDL) method by maximizing the dependency between the scans taken from "pre-" and "mid-treatment" with "dummy labels" given to the scans. For the dissimilarity measure, an HSIC-based metric was employed to effectively measure the changes in parametric maps as an indication of treatment effectiveness. The HSIC-based feature extraction and dissimilarity measure used a kernel function to nonlinearly transform input vectors into a higher dimensional feature space and computed the population means in the new space, where enhanced group separability was ideally obtained. The results of the classification using the developed CAT system indicated an improvement of performance compared to a CAT system with basic features using histogram of intensity.

  20. Enhanced low-rank representation via sparse manifold adaption for semi-supervised learning.

    PubMed

    Peng, Yong; Lu, Bao-Liang; Wang, Suhang

    2015-05-01

    Constructing an informative and discriminative graph plays an important role in various pattern recognition tasks such as clustering and classification. Among the existing graph-based learning models, low-rank representation (LRR) is a very competitive one, which has been extensively employed in spectral clustering and semi-supervised learning (SSL). In SSL, the graph is composed of both labeled and unlabeled samples, where the edge weights are calculated based on the LRR coefficients. However, most of existing LRR related approaches fail to consider the geometrical structure of data, which has been shown beneficial for discriminative tasks. In this paper, we propose an enhanced LRR via sparse manifold adaption, termed manifold low-rank representation (MLRR), to learn low-rank data representation. MLRR can explicitly take the data local manifold structure into consideration, which can be identified by the geometric sparsity idea; specifically, the local tangent space of each data point was sought by solving a sparse representation objective. Therefore, the graph to depict the relationship of data points can be built once the manifold information is obtained. We incorporate a regularizer into LRR to make the learned coefficients preserve the geometric constraints revealed in the data space. As a result, MLRR combines both the global information emphasized by low-rank property and the local information emphasized by the identified manifold structure. Extensive experimental results on semi-supervised classification tasks demonstrate that MLRR is an excellent method in comparison with several state-of-the-art graph construction approaches. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease.

    PubMed

    Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki

    2015-04-30

    Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than the two-class problem. Developing sample classification independent unsupervised methods would solve many of these problems. Two principal component analysis (PCA)-based FE, specifically, variational Bayes PCA (VBPCA) was extended to perform unsupervised FE, and together with conventional PCA (CPCA)-based unsupervised FE, were tested as sample classification independent unsupervised FE methods. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data, and a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs were identified that show aberrant expression between treatment and control samples, and significant, negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE was also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data, and outperformed two conventional supervised FE methods on a real data set. Thus, these two methods have suggested equivalence for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.

  2. Local classification: Locally weighted-partial least squares-discriminant analysis (LW-PLS-DA).

    PubMed

    Bevilacqua, Marta; Marini, Federico

    2014-08-01

    The possibility of devising a simple, flexible and accurate non-linear classification method, by extending the locally weighted partial least squares (LW-PLS) approach to the cases where the algorithm is used in a discriminant way (partial least squares discriminant analysis, PLS-DA), is presented. In particular, to assess which category an unknown sample belongs to, the proposed algorithm operates by identifying which training objects are most similar to the one to be predicted and building a PLS-DA model using these calibration samples only. Moreover, the influence of the selected training samples on the local model can be further modulated by adopting a not uniform distance-based weighting scheme which allows the farthest calibration objects to have less impact than the closest ones. The performances of the proposed locally weighted-partial least squares-discriminant analysis (LW-PLS-DA) algorithm have been tested on three simulated data sets characterized by a varying degree of non-linearity: in all cases, a classification accuracy higher than 99% on external validation samples was achieved. Moreover, when also applied to a real data set (classification of rice varieties), characterized by a high extent of non-linearity, the proposed method provided an average correct classification rate of about 93% on the test set. By the preliminary results, showed in this paper, the performances of the proposed LW-PLS-DA approach have proved to be comparable and in some cases better than those obtained by other non-linear methods (k nearest neighbors, kernel-PLS-DA and, in the case of rice, counterpropagation neural networks). Copyright © 2014 Elsevier B.V. All rights reserved.

  3. Multiclass fMRI data decoding and visualization using supervised self-organizing maps.

    PubMed

    Hausfeld, Lars; Valente, Giancarlo; Formisano, Elia

    2014-08-01

    When multivariate pattern decoding is applied to fMRI studies entailing more than two experimental conditions, a most common approach is to transform the multiclass classification problem into a series of binary problems. Furthermore, for decoding analyses, classification accuracy is often the only outcome reported although the topology of activation patterns in the high-dimensional features space may provide additional insights into underlying brain representations. Here we propose to decode and visualize voxel patterns of fMRI datasets consisting of multiple conditions with a supervised variant of self-organizing maps (SSOMs). Using simulations and real fMRI data, we evaluated the performance of our SSOM-based approach. Specifically, the analysis of simulated fMRI data with varying signal-to-noise and contrast-to-noise ratio suggested that SSOMs perform better than a k-nearest-neighbor classifier for medium and large numbers of features (i.e. 250 to 1000 or more voxels) and similar to support vector machines (SVMs) for small and medium numbers of features (i.e. 100 to 600voxels). However, for a larger number of features (>800voxels), SSOMs performed worse than SVMs. When applied to a challenging 3-class fMRI classification problem with datasets collected to examine the neural representation of three human voices at individual speaker level, the SSOM-based algorithm was able to decode speaker identity from auditory cortical activation patterns. Classification performances were similar between SSOMs and other decoding algorithms; however, the ability to visualize decoding models and underlying data topology of SSOMs promotes a more comprehensive understanding of classification outcomes. We further illustrated this visualization ability of SSOMs with a re-analysis of a dataset examining the representation of visual categories in the ventral visual cortex (Haxby et al., 2001). This analysis showed that SSOMs could retrieve and visualize topography and neighborhood relations of the brain representation of eight visual categories. We conclude that SSOMs are particularly suited for decoding datasets consisting of more than two classes and are optimally combined with approaches that reduce the number of voxels used for classification (e.g. region-of-interest or searchlight approaches). Copyright © 2014. Published by Elsevier Inc.

  4. A multiple hold-out framework for Sparse Partial Least Squares.

    PubMed

    Monteiro, João M; Rao, Anil; Shawe-Taylor, John; Mourão-Miranda, Janaina

    2016-09-15

    Supervised classification machine learning algorithms may have limitations when studying brain diseases with heterogeneous populations, as the labels might be unreliable. More exploratory approaches, such as Sparse Partial Least Squares (SPLS), may provide insights into the brain's mechanisms by finding relationships between neuroimaging and clinical/demographic data. The identification of these relationships has the potential to improve the current understanding of disease mechanisms, refine clinical assessment tools, and stratify patients. SPLS finds multivariate associative effects in the data by computing pairs of sparse weight vectors, where each pair is used to remove its corresponding associative effect from the data by matrix deflation, before computing additional pairs. We propose a novel SPLS framework which selects the adequate number of voxels and clinical variables to describe each associative effect, and tests their reliability by fitting the model to different splits of the data. As a proof of concept, the approach was applied to find associations between grey matter probability maps and individual items of the Mini-Mental State Examination (MMSE) in a clinical sample with various degrees of dementia. The framework found two statistically significant associative effects between subsets of brain voxels and subsets of the questions/tasks. SPLS was compared with its non-sparse version (PLS). The use of projection deflation versus a classical PLS deflation was also tested in both PLS and SPLS. SPLS outperformed PLS, finding statistically significant effects and providing higher correlation values in hold-out data. Moreover, projection deflation provided better results. Copyright © 2016 The Author(s). Published by Elsevier B.V. All rights reserved.

  5. Use of LANDSAT imagery for wildlife habitat mapping in northeast and eastcentral Alaska. [winter and summer moose range

    NASA Technical Reports Server (NTRS)

    Lent, P. C. (Principal Investigator)

    1976-01-01

    The author has identified the following significant results. Winter and summer moose range maps of three selected areas were produced (1:63,360 scale). The analytic approach is very similar to modified clustering. Preliminary results indicate that this method is not only more accurate but considerably less expensive than supervised classification techniques.

  6. Empirically Supported Psychotherapy in Social Work Training Programs: Does the Definition of Evidence Matter?

    ERIC Educational Resources Information Center

    Bledsoe, Sarah E.; Weissman, Myrna M.; Mullen, Edward J.; Ponniah, Kathryn; Gameroff, Marc J.; Verdeli, Helen; Mufson, Laura; Fitterling, Heidi; Wickramaratne, Priya

    2007-01-01

    Objectives: A national survey finds that 62% of social work programs do not require didactic and clinical supervision in any empirically supported psychotherapy (EST). The authors report the results of analysis of national survey data using two alternative classifications of EST to determine if the results are because of the definition of EST used…

  7. Instance annotation for multi-instance multi-label learning

    Treesearch

    F. Briggs; X.Z. Fern; R. Raich; Q. Lou

    2013-01-01

    Multi-instance multi-label learning (MIML) is a framework for supervised classification where the objects to be classified are bags of instances associated with multiple labels. For example, an image can be represented as a bag of segments and associated with a list of objects it contains. Prior work on MIML has focused on predicting label sets for previously unseen...

  8. A sampling system for estimating the cultivation of wheat (Triticum aestivum L) from LANDSAT data. M.S. Thesis - 21 Jul. 1983

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Moreira, M. A.

    1983-01-01

    Using digitally processed MSS/LANDSAT data as auxiliary variable, a methodology to estimate wheat (Triticum aestivum L) area by means of sampling techniques was developed. To perform this research, aerial photographs covering 720 sq km in Cruz Alta test site at the NW of Rio Grande do Sul State, were visually analyzed. LANDSAT digital data were analyzed using non-supervised and supervised classification algorithms; as post-processing the classification was submitted to spatial filtering. To estimate wheat area, the regression estimation method was applied and different sample sizes and various sampling units (10, 20, 30, 40 and 60 sq km) were tested. Based on the four decision criteria established for this research, it was concluded that: (1) as the size of sampling units decreased the percentage of sampled area required to obtain similar estimation performance also decreased; (2) the lowest percentage of the area sampled for wheat estimation with relatively high precision and accuracy through regression estimation was 90% using 10 sq km s the sampling unit; and (3) wheat area estimation by direct expansion (using only aerial photographs) was less precise and accurate when compared to those obtained by means of regression estimation.

  9. Application of diffusion maps to identify human factors of self-reported anomalies in aviation.

    PubMed

    Andrzejczak, Chris; Karwowski, Waldemar; Mikusinski, Piotr

    2012-01-01

    A study investigating what factors are present leading to pilots submitting voluntary anomaly reports regarding their flight performance was conducted. Diffusion Maps (DM) were selected as the method of choice for performing dimensionality reduction on text records for this study. Diffusion Maps have seen successful use in other domains such as image classification and pattern recognition. High-dimensionality data in the form of narrative text reports from the NASA Aviation Safety Reporting System (ASRS) were clustered and categorized by way of dimensionality reduction. Supervised analyses were performed to create a baseline document clustering system. Dimensionality reduction techniques identified concepts or keywords within records, and allowed the creation of a framework for an unsupervised document classification system. Results from the unsupervised clustering algorithm performed similarly to the supervised methods outlined in the study. The dimensionality reduction was performed on 100 of the most commonly occurring words within 126,000 text records describing commercial aviation incidents. This study demonstrates that unsupervised machine clustering and organization of incident reports is possible based on unbiased inputs. Findings from this study reinforced traditional views on what factors contribute to civil aviation anomalies, however, new associations between previously unrelated factors and conditions were also found.

  10. Inline inspection of textured plastics surfaces

    NASA Astrophysics Data System (ADS)

    Michaeli, Walter; Berdel, Klaus

    2011-02-01

    This article focuses on the inspection of plastics web materials exhibiting irregular textures such as imitation wood or leather. They are produced in a continuous process at high speed. In this process, various defects occur sporadically. However, current inspection systems for plastics surfaces are able to inspect unstructured products or products with regular, i.e., highly periodic, textures, only. The proposed inspection algorithm uses the local binary pattern operator for texture feature extraction. For classification, semisupervised as well as supervised approaches are used. A simple concept for semisupervised classification is presented and applied for defect detection. The resulting defect-maps are presented to the operator. He assigns class labels that are used to train the supervised classifier in order to distinguish between different defect types. A concept for parallelization is presented allowing the efficient use of standard multicore processor PC hardware. Experiments with images of a typical product acquired in an industrial setting show a detection rate of 97% while achieving a false alarm rate below 1%. Real-time tests show that defects can be reliably detected even at haul-off speeds of 30 m/min. Further applications of the presented concept can be found in the inspection of other materials.

  11. Classification of ROTSE Variable Stars using Machine Learning

    NASA Astrophysics Data System (ADS)

    Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration

    2001-12-01

    We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high ( ~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy ( ~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods, in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.

  12. Constrained Low-Rank Learning Using Least Squares-Based Regularization.

    PubMed

    Li, Ping; Yu, Jun; Wang, Meng; Zhang, Luming; Cai, Deng; Li, Xuelong

    2017-12-01

    Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting the least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of original data can be paired with some informative structure by imposing an appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.

  13. Snow mapping and land use studies in Switzerland

    NASA Technical Reports Server (NTRS)

    Haefner, H. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. A system was developed for operational snow and land use mapping, based on a supervised classification method using various classification algorithms and representation of the results in maplike form on color film with a photomation system. Land use mapping, under European conditions, was achieved with a stepwise linear discriminant analysis by using additional ratio variables. On fall images, signatures of built-up areas were often not separable from wetlands. Two different methods were tested to correlate the size of settlements and the population with an accuracy for the densely populated Swiss Plateau between +2 or -12%.

  14. myBlackBox: Blackbox Mobile Cloud Systems for Personalized Unusual Event Detection.

    PubMed

    Ahn, Junho; Han, Richard

    2016-05-23

    We demonstrate the feasibility of constructing a novel and practical real-world mobile cloud system, called myBlackBox, that efficiently fuses multimodal smartphone sensor data to identify and log unusual personal events in mobile users' daily lives. The system incorporates a hybrid architectural design that combines unsupervised classification of audio, accelerometer and location data with supervised joint fusion classification to achieve high accuracy, customization, convenience and scalability. We show the feasibility of myBlackBox by implementing and evaluating this end-to-end system that combines Android smartphones with cloud servers, deployed for 15 users over a one-month period.

  15. myBlackBox: Blackbox Mobile Cloud Systems for Personalized Unusual Event Detection

    PubMed Central

    Ahn, Junho; Han, Richard

    2016-01-01

    We demonstrate the feasibility of constructing a novel and practical real-world mobile cloud system, called myBlackBox, that efficiently fuses multimodal smartphone sensor data to identify and log unusual personal events in mobile users’ daily lives. The system incorporates a hybrid architectural design that combines unsupervised classification of audio, accelerometer and location data with supervised joint fusion classification to achieve high accuracy, customization, convenience and scalability. We show the feasibility of myBlackBox by implementing and evaluating this end-to-end system that combines Android smartphones with cloud servers, deployed for 15 users over a one-month period. PMID:27223292

  16. Speaker emotion recognition: from classical classifiers to deep neural networks

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2018-04-01

    Speaker emotion recognition is considered among the most challenging tasks in recent years. In fact, automatic systems for security, medicine or education can be improved when considering the speech affective state. In this paper, a twofold approach for speech emotion classification is proposed. At the first side, a relevant set of features is adopted, and then at the second one, numerous supervised training techniques, involving classic methods as well as deep learning, are experimented. Experimental results indicate that deep architecture can improve classification performance on two affective databases, the Berlin Dataset of Emotional Speech and the SAVEE Dataset Surrey Audio-Visual Expressed Emotion.

  17. Assessment of Gait Characteristics in Total Knee Arthroplasty Patients Using a Hierarchical Partial Least Squares Method.

    PubMed

    Wang, Wei; Ackland, David C; McClelland, Jodie A; Webster, Kate E; Halgamuge, Saman

    2018-01-01

    Quantitative gait analysis is an important tool in objective assessment and management of total knee arthroplasty (TKA) patients. Studies evaluating gait patterns in TKA patients have tended to focus on discrete data such as spatiotemporal information, joint range of motion and peak values of kinematics and kinetics, or consider selected principal components of gait waveforms for analysis. These strategies may not have the capacity to capture small variations in gait patterns associated with each joint across an entire gait cycle, and may ultimately limit the accuracy of gait classification. The aim of this study was to develop an automatic feature extraction method to analyse patterns from high-dimensional autocorrelated gait waveforms. A general linear feature extraction framework was proposed and a hierarchical partial least squares method derived for discriminant analysis of multiple gait waveforms. The effectiveness of this strategy was verified using a dataset of joint angle and ground reaction force waveforms from 43 patients after TKA surgery and 31 healthy control subjects. Compared with principal component analysis and partial least squares methods, the hierarchical partial least squares method achieved generally better classification performance on all possible combinations of waveforms, with the highest classification accuracy . The novel hierarchical partial least squares method proposed is capable of capturing virtually all significant differences between TKA patients and the controls, and provides new insights into data visualization. The proposed framework presents a foundation for more rigorous classification of gait, and may ultimately be used to evaluate the effects of interventions such as surgery and rehabilitation.

  18. Towards a robust framework for catchment classification

    NASA Astrophysics Data System (ADS)

    Deshmukh, A.; Samal, A.; Singh, R.

    2017-12-01

    Classification of catchments based on various measures of similarity has emerged as an important technique to understand regional scale hydrologic behavior. Classification of catchment characteristics and/or streamflow response has been used reveal which characteristics are more likely to explain the observed variability of hydrologic response. However, numerous algorithms for supervised or unsupervised classification are available, making it hard to identify the algorithm most suitable for the dataset at hand. Consequently, existing catchment classification studies vary significantly in the classification algorithms employed with no previous attempt at understanding the degree of uncertainty in classification due to this algorithmic choice. This hinders the generalizability of interpretations related to hydrologic behavior. Our goal is to develop a protocol that can be followed while classifying hydrologic datasets. We focus on a classification framework for unsupervised classification and provide a step-by-step classification procedure. The steps include testing the clusterabiltiy of original dataset prior to classification, feature selection, validation of clustered data, and quantification of similarity of two clusterings. We test several commonly available methods within this framework to understand the level of similarity of classification results across algorithms. We apply the proposed framework on recently developed datasets for India to analyze to what extent catchment properties can explain observed catchment response. Our testing dataset includes watershed characteristics for over 200 watersheds which comprise of both natural (physio-climatic) characteristics and socio-economic characteristics. This framework allows us to understand the controls on observed hydrologic variability across India.

  19. Logging damage

    Treesearch

    Ralph D. Nyland

    1989-01-01

    The best commercial logging will damage at least some residual trees during all forms of partial cutting, no matter how carefully done. Yet recommendations at the end of this Note show there is much that you can do to limit damage by proper road and trail layout, proper training and supervision of crews, appropriate equipment, and diligence.

  20. Intrapartum fetal heart rate classification from trajectory in Sparse SVM feature space.

    PubMed

    Spilka, J; Frecon, J; Leonarduzzi, R; Pustelnik, N; Abry, P; Doret, M

    2015-01-01

    Intrapartum fetal heart rate (FHR) constitutes a prominent source of information for the assessment of fetal reactions to stress events during delivery. Yet, early detection of fetal acidosis remains a challenging signal processing task. The originality of the present contribution are three-fold: multiscale representations and wavelet leader based multifractal analysis are used to quantify FHR variability ; Supervised classification is achieved by means of Sparse-SVM that aim jointly to achieve optimal detection performance and to select relevant features in a multivariate setting ; Trajectories in the feature space accounting for the evolution along time of features while labor progresses are involved in the construction of indices quantifying fetal health. The classification performance permitted by this combination of tools are quantified on a intrapartum FHR large database (≃ 1250 subjects) collected at a French academic public hospital.

  1. An efficient abnormal cervical cell detection system based on multi-instance extreme learning machine

    NASA Astrophysics Data System (ADS)

    Zhao, Lili; Yin, Jianping; Yuan, Lihuan; Liu, Qiang; Li, Kuan; Qiu, Minghui

    2017-07-01

    Automatic detection of abnormal cells from cervical smear images is extremely demanded in annual diagnosis of women's cervical cancer. For this medical cell recognition problem, there are three different feature sections, namely cytology morphology, nuclear chromatin pathology and region intensity. The challenges of this problem come from feature combination s and classification accurately and efficiently. Thus, we propose an efficient abnormal cervical cell detection system based on multi-instance extreme learning machine (MI-ELM) to deal with above two questions in one unified framework. MI-ELM is one of the most promising supervised learning classifiers which can deal with several feature sections and realistic classification problems analytically. Experiment results over Herlev dataset demonstrate that the proposed method outperforms three traditional methods for two-class classification in terms of well accuracy and less time.

  2. Evaluation of Alzheimer's disease by analysis of MR images using Objective Dialectical Classifiers as an alternative to ADC maps.

    PubMed

    Dos Santos, Wellington P; de Assis, Francisco M; de Souza, Ricardo E; Dos Santos Filho, Plinio B

    2008-01-01

    Alzheimer's disease is the most common cause of dementia, yet hard to diagnose precisely without invasive techniques, particularly at the onset of the disease. This work approaches image analysis and classification of synthetic multispectral images composed by diffusion-weighted (DW) magnetic resonance (MR) cerebral images for the evaluation of cerebrospinal fluid area and measuring the advance of Alzheimer's disease. A clinical 1.5 T MR imaging system was used to acquire all images presented. The classification methods are based on Objective Dialectical Classifiers, a new method based on Dialectics as defined in the Philosophy of Praxis. A 2-degree polynomial network with supervised training is used to generate the ground truth image. The classification results are used to improve the usual analysis of the apparent diffusion coefficient map.

  3. Korean coastal water depth/sediment and land cover mapping (1:25,000) by computer analysis of LANDSAT imagery

    NASA Technical Reports Server (NTRS)

    Park, K. Y.; Miller, L. D.

    1978-01-01

    Computer analysis was applied to single date LANDSAT MSS imagery of a sample coastal area near Seoul, Korea equivalent to a 1:50,000 topographic map. Supervised image processing yielded a test classification map from this sample image containing 12 classes: 5 water depth/sediment classes, 2 shoreline/tidal classes, and 5 coastal land cover classes at a scale of 1:25,000 and with a training set accuracy of 76%. Unsupervised image classification was applied to a subportion of the site analyzed and produced classification maps comparable in results in a spatial sense. The results of this test indicated that it is feasible to produce such quantitative maps for detailed study of dynamic coastal processes given a LANDSAT image data base at sufficiently frequent time intervals.

  4. UNMANNED AERIAL VEHICLE (UAV) HYPERSPECTRAL REMOTE SENSING FOR DRYLAND VEGETATION MONITORING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nancy F. Glenn; Jessica J. Mitchell; Matthew O. Anderson

    2012-06-01

    UAV-based hyperspectral remote sensing capabilities developed by the Idaho National Lab and Idaho State University, Boise Center Aerospace Lab, were recently tested via demonstration flights that explored the influence of altitude on geometric error, image mosaicking, and dryland vegetation classification. The test flights successfully acquired usable flightline data capable of supporting classifiable composite images. Unsupervised classification results support vegetation management objectives that rely on mapping shrub cover and distribution patterns. Overall, supervised classifications performed poorly despite spectral separability in the image-derived endmember pixels. Future mapping efforts that leverage ground reference data, ultra-high spatial resolution photos and time series analysis shouldmore » be able to effectively distinguish native grasses such as Sandberg bluegrass (Poa secunda), from invasives such as burr buttercup (Ranunculus testiculatus) and cheatgrass (Bromus tectorum).« less

  5. The effects of neuromuscular exercise on medial knee joint load post-arthroscopic partial medial meniscectomy: 'SCOPEX', a randomised control trial protocol.

    PubMed

    Hall, Michelle; Hinman, Rana S; Wrigley, Tim V; Roos, Ewa M; Hodges, Paul W; Staples, Margaret; Bennell, Kim L

    2012-11-27

    Meniscectomy is a risk factor for knee osteoarthritis, with increased medial joint loading a likely contributor to the development and progression of knee osteoarthritis in this group. Therefore, post-surgical rehabilitation or interventions that reduce medial knee joint loading have the potential to reduce the risk of developing or progressing osteoarthritis. The primary purpose of this randomised, assessor-blind controlled trial is to determine the effects of a home-based, physiotherapist-supervised neuromuscular exercise program on medial knee joint load during functional tasks in people who have recently undergone a partial medial meniscectomy. 62 people aged 30-50 years who have undergone an arthroscopic partial medial meniscectomy within the previous 3 to 12 months will be recruited and randomly assigned to a neuromuscular exercise or control group using concealed allocation. The neuromuscular exercise group will attend 8 supervised exercise sessions with a physiotherapist and will perform 6 exercises at home, at least 3 times per week for 12 weeks. The control group will not receive the neuromuscular training program. Blinded assessment will be performed at baseline and immediately following the 12-week intervention. The primary outcomes are change in the peak external knee adduction moment measured by 3-dimensional analysis during normal paced walking and one-leg rise. Secondary outcomes include the change in peak external knee adduction moment during fast pace walking and one-leg hop and change in the knee adduction moment impulse during walking, one-leg rise and one-leg hop, knee and hip muscle strength, electromyographic muscle activation patterns, objective measures of physical function, as well as self-reported measures of physical function and symptoms and additional biomechanical parameters. The findings from this trial will provide evidence regarding the effect of a home-based, physiotherapist-supervised neuromuscular exercise program on medial knee joint load during various tasks in people with a partial medial meniscectomy. If shown to reduce the knee adduction moment, neuromuscular exercise has the potential to prevent the onset of osteoarthritis or slow its progression in those with early disease. Australian New Zealand Clinical Trials Registry reference: ACTRN12612000542897.

  6. Manifold regularized multitask learning for semi-supervised multilabel image classification.

    PubMed

    Luo, Yong; Tao, Dacheng; Geng, Bo; Xu, Chao; Maybank, Stephen J

    2013-02-01

    It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

  7. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.

  8. Classification of complex networks based on similarity of topological network features

    NASA Astrophysics Data System (ADS)

    Attar, Niousha; Aliakbary, Sadegh

    2017-09-01

    Over the past few decades, networks have been widely used to model real-world phenomena. Real-world networks exhibit nontrivial topological characteristics and therefore, many network models are proposed in the literature for generating graphs that are similar to real networks. Network models reproduce nontrivial properties such as long-tail degree distributions or high clustering coefficients. In this context, we encounter the problem of selecting the network model that best fits a given real-world network. The need for a model selection method reveals the network classification problem, in which a target-network is classified into one of the candidate network models. In this paper, we propose a novel network classification method which is independent of the network size and employs an alignment-free metric of network comparison. The proposed method is based on supervised machine learning algorithms and utilizes the topological similarities of networks for the classification task. The experiments show that the proposed method outperforms state-of-the-art methods with respect to classification accuracy, time efficiency, and robustness to noise.

  9. Classification and data acquisition with incomplete data

    NASA Astrophysics Data System (ADS)

    Williams, David P.

    In remote-sensing applications, incomplete data can result when only a subset of sensors (e.g., radar, infrared, acoustic) are deployed at certain regions. The limitations of single sensor systems have spurred interest in employing multiple sensor modalities simultaneously. For example, in land mine detection tasks, different sensor modalities are better-suited to capture different aspects of the underlying physics of the mines. Synthetic aperture radar sensors may be better at detecting surface mines, while infrared sensors may be better at detecting buried mines. By employing multiple sensor modalities to address the detection task, the strengths of the disparate sensors can be exploited in a synergistic manner to improve performance beyond that which would be achievable with either single sensor alone. When multi-sensor approaches are employed, however, incomplete data can be manifested. If each sensor is located on a separate platform ( e.g., aircraft), each sensor may interrogate---and hence collect data over---only partially overlapping areas of land. As a result, some data points may be characterized by data (i.e., features) from only a subset of the possible sensors employed in the task. Equivalently, this scenario implies that some data points will be missing features. Increasing focus in the future on using---and fusing data from---multiple sensors will make such incomplete-data problems commonplace. In many applications involving incomplete data, it is possible to acquire the missing data at a cost. In multi-sensor remote-sensing applications, data is acquired by deploying sensors to data points. Acquiring data is usually an expensive, time-consuming task, a fact that necessitates an intelligent data acquisition process. Incomplete data is not limited to remote-sensing applications, but rather, can arise in virtually any data set. In this dissertation, we address the general problem of classification when faced with incomplete data. We also address the closely related problem of active data acquisition, which develops a strategy to acquire missing features and labels that will most benefit the classification task. We first address the general problem of classification with incomplete data, maintaining the view that all data (i.e., information) is valuable. We employ a logistic regression framework within which we formulate a supervised classification algorithm for incomplete data. This principled, yet flexible, framework permits several interesting extensions that allow all available data to be utilized. One extension incorporates labeling error, which permits the usage of potentially imperfectly labeled data in learning a classifier. A second major extension converts the proposed algorithm to a semi-supervised approach by utilizing unlabeled data via graph-based regularization. Finally, the classification algorithm is extended to the case in which (image) data---from which features are extracted---are available from multiple resolutions. Taken together, this family of incomplete-data classification algorithms exploits all available data in a principled manner by avoiding explicit imputation. Instead, missing data is integrated out analytically with the aid of an estimated conditional density function (conditioned on the observed features). This feat is accomplished by invoking only mild assumptions. We also address the problem of active data acquisition by determining which missing data should be acquired to most improve performance. Specifically, we examine this data acquisition task when the data to be acquired can be either labels or features. The proposed approach is based on a criterion that accounts for the expected benefit of the acquisition. This approach, which is applicable for any general missing data problem, exploits the incomplete-data classification framework introduced in the first part of this dissertation. This data acquisition approach allows for the acquisition of both labels and features. Moreover, several types of feature acquisition are permitted, including the acquisition of individual or multiple features for individual or multiple data points, which may be either labeled or unlabeled. Furthermore, if different types of data acquisition are feasible for a given application, the algorithm will automatically determine the most beneficial type of data to acquire. Experimental results on both benchmark machine learning data sets and real (i.e., measured) remote-sensing data demonstrate the advantages of the proposed incomplete-data classification and active data acquisition algorithms.

  10. Application of partial least squares near-infrared spectral classification in diabetic identification

    NASA Astrophysics Data System (ADS)

    Yan, Wen-juan; Yang, Ming; He, Guo-quan; Qin, Lin; Li, Gang

    2014-11-01

    In order to identify the diabetic patients by using tongue near-infrared (NIR) spectrum - a spectral classification model of the NIR reflectivity of the tongue tip is proposed, based on the partial least square (PLS) method. 39sample data of tongue tip's NIR spectra are harvested from healthy people and diabetic patients , respectively. After pretreatment of the reflectivity, the spectral data are set as the independent variable matrix, and information of classification as the dependent variables matrix, Samples were divided into two groups - i.e. 53 samples as calibration set and 25 as prediction set - then the PLS is used to build the classification model The constructed modelfrom the 53 samples has the correlation of 0.9614 and the root mean square error of cross-validation (RMSECV) of 0.1387.The predictions for the 25 samples have the correlation of 0.9146 and the RMSECV of 0.2122.The experimental result shows that the PLS method can achieve good classification on features of healthy people and diabetic patients.

  11. Automated classification of cell morphology by coherence-controlled holographic microscopy

    NASA Astrophysics Data System (ADS)

    Strbkova, Lenka; Zicha, Daniel; Vesely, Pavel; Chmelik, Radim

    2017-08-01

    In the last few years, classification of cells by machine learning has become frequently used in biology. However, most of the approaches are based on morphometric (MO) features, which are not quantitative in terms of cell mass. This may result in poor classification accuracy. Here, we study the potential contribution of coherence-controlled holographic microscopy enabling quantitative phase imaging for the classification of cell morphologies. We compare our approach with the commonly used method based on MO features. We tested both classification approaches in an experiment with nutritionally deprived cancer tissue cells, while employing several supervised machine learning algorithms. Most of the classifiers provided higher performance when quantitative phase features were employed. Based on the results, it can be concluded that the quantitative phase features played an important role in improving the performance of the classification. The methodology could be valuable help in refining the monitoring of live cells in an automated fashion. We believe that coherence-controlled holographic microscopy, as a tool for quantitative phase imaging, offers all preconditions for the accurate automated analysis of live cell behavior while enabling noninvasive label-free imaging with sufficient contrast and high-spatiotemporal phase sensitivity.

  12. Classification by Using Multispectral Point Cloud Data

    NASA Astrophysics Data System (ADS)

    Liao, C. T.; Huang, H. H.

    2012-07-01

    Remote sensing images are generally recorded in two-dimensional format containing multispectral information. Also, the semantic information is clearly visualized, which ground features can be better recognized and classified via supervised or unsupervised classification methods easily. Nevertheless, the shortcomings of multispectral images are highly depending on light conditions, and classification results lack of three-dimensional semantic information. On the other hand, LiDAR has become a main technology for acquiring high accuracy point cloud data. The advantages of LiDAR are high data acquisition rate, independent of light conditions and can directly produce three-dimensional coordinates. However, comparing with multispectral images, the disadvantage is multispectral information shortage, which remains a challenge in ground feature classification through massive point cloud data. Consequently, by combining the advantages of both LiDAR and multispectral images, point cloud data with three-dimensional coordinates and multispectral information can produce a integrate solution for point cloud classification. Therefore, this research acquires visible light and near infrared images, via close range photogrammetry, by matching images automatically through free online service for multispectral point cloud generation. Then, one can use three-dimensional affine coordinate transformation to compare the data increment. At last, the given threshold of height and color information is set as threshold in classification.

  13. Automatic Classification of High Resolution Satellite Imagery - a Case Study for Urban Areas in the Kingdom of Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Maas, A.; Alrajhi, M.; Alobeid, A.; Heipke, C.

    2017-05-01

    Updating topographic geospatial databases is often performed based on current remotely sensed images. To automatically extract the object information (labels) from the images, supervised classifiers are being employed. Decisions to be taken in this process concern the definition of the classes which should be recognised, the features to describe each class and the training data necessary in the learning part of classification. With a view to large scale topographic databases for fast developing urban areas in the Kingdom of Saudi Arabia we conducted a case study, which investigated the following two questions: (a) which set of features is best suitable for the classification?; (b) what is the added value of height information, e.g. derived from stereo imagery? Using stereoscopic GeoEye and Ikonos satellite data we investigate these two questions based on our research on label tolerant classification using logistic regression and partly incorrect training data. We show that in between five and ten features can be recommended to obtain a stable solution, that height information consistently yields an improved overall classification accuracy of about 5%, and that label noise can be successfully modelled and thus only marginally influences the classification results.

  14. Automated classification of cell morphology by coherence-controlled holographic microscopy.

    PubMed

    Strbkova, Lenka; Zicha, Daniel; Vesely, Pavel; Chmelik, Radim

    2017-08-01

    In the last few years, classification of cells by machine learning has become frequently used in biology. However, most of the approaches are based on morphometric (MO) features, which are not quantitative in terms of cell mass. This may result in poor classification accuracy. Here, we study the potential contribution of coherence-controlled holographic microscopy enabling quantitative phase imaging for the classification of cell morphologies. We compare our approach with the commonly used method based on MO features. We tested both classification approaches in an experiment with nutritionally deprived cancer tissue cells, while employing several supervised machine learning algorithms. Most of the classifiers provided higher performance when quantitative phase features were employed. Based on the results, it can be concluded that the quantitative phase features played an important role in improving the performance of the classification. The methodology could be valuable help in refining the monitoring of live cells in an automated fashion. We believe that coherence-controlled holographic microscopy, as a tool for quantitative phase imaging, offers all preconditions for the accurate automated analysis of live cell behavior while enabling noninvasive label-free imaging with sufficient contrast and high-spatiotemporal phase sensitivity. (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  15. Risks incurred by hydrogen escaping from containers and conduits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swain, M.R.; Grilliot, E.S.

    1998-08-01

    This paper is a discussion of a method for hydrogen leak classification. Leaks are classified as; gas escapes into enclosed spaces, gas escapes into partially enclosed spaces (vented), and gas escapes into unenclosed spaces. Each of the three enclosure classifications is further divided into two subclasses; total volume of hydrogen escaped and flow rate of escaping hydrogen. A method to aid in risk assessment determination in partially enclosed spaces is proposed and verified for several enclosure geometries. Examples are discussed for additional enclosure geometries.

  16. An Extended Spectral-Spatial Classification Approach for Hyperspectral Data

    NASA Astrophysics Data System (ADS)

    Akbari, D.

    2017-11-01

    In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction including decision boundary feature extraction (DBFE), discriminate analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); (3) genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.

  17. Microcomputer-based classification of environmental data in municipal areas

    NASA Astrophysics Data System (ADS)

    Thiergärtner, H.

    1995-10-01

    Multivariate data-processing methods used in mineral resource identification can be used to classify urban regions. Using elements of expert systems, geographical information systems, as well as known classification and prognosis systems, it is possible to outline a single model that consists of resistant and of temporary parts of a knowledge base including graphical input and output treatment and of resistant and temporary elements of a bank of methods and algorithms. Whereas decision rules created by experts will be stored in expert systems directly, powerful classification rules in form of resistant but latent (implicit) decision algorithms may be implemented in the suggested model. The latent functions will be transformed into temporary explicit decision rules by learning processes depending on the actual task(s), parameter set(s), pixels selection(s), and expert control(s). This takes place both at supervised and nonsupervised classification of multivariately described pixel sets representing municipal subareas. The model is outlined briefly and illustrated by results obtained in a target area covering a part of the city of Berlin (Germany).

  18. Application of support vector machine for the separation of mineralised zones in the Takht-e-Gonbad porphyry deposit, SE Iran

    NASA Astrophysics Data System (ADS)

    Mahvash Mohammadi, Neda; Hezarkhani, Ardeshir

    2018-07-01

    Classification of mineralised zones is an important factor for the analysis of economic deposits. In this paper, the support vector machine (SVM), a supervised learning algorithm, based on subsurface data is proposed for classification of mineralised zones in the Takht-e-Gonbad porphyry Cu-deposit (SE Iran). The effects of the input features are evaluated via calculating the accuracy rates on the SVM performance. Ultimately, the SVM model, is developed based on input features namely lithology, alteration, mineralisation, the level and, radial basis function (RBF) as a kernel function. Moreover, the optimal amount of parameters λ and C, using n-fold cross-validation method, are calculated at level 0.001 and 0.01 respectively. The accuracy of this model is 0.931 for classification of mineralised zones in the Takht-e-Gonbad porphyry deposit. The results of the study confirm the efficiency of SVM method for classification the mineralised zones.

  19. Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports

    PubMed Central

    Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225

  20. Automatic identification of critical follow-up recommendation sentences in radiology reports.

    PubMed

    Yetisgen-Yildiz, Meliha; Gunn, Martin L; Xia, Fei; Payne, Thomas H

    2011-01-01

    Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports.

  1. Classification of JET Neutron and Gamma Emissivity Profiles

    NASA Astrophysics Data System (ADS)

    Craciunescu, T.; Murari, A.; Kiptily, V.; Vega, J.; Contributors, JET

    2016-05-01

    In thermonuclear plasmas, emission tomography uses integrated measurements along lines of sight (LOS) to determine the two-dimensional (2-D) spatial distribution of the volume emission intensity. Due to the availability of only a limited number views and to the coarse sampling of the LOS, the tomographic inversion is a limited data set problem. Several techniques have been developed for tomographic reconstruction of the 2-D gamma and neutron emissivity on JET. In specific experimental conditions the availability of LOSs is restricted to a single view. In this case an explicit reconstruction of the emissivity profile is no longer possible. However, machine learning classification methods can be used in order to derive the type of the distribution. In the present approach the classification is developed using the theory of belief functions which provide the support to fuse the results of independent clustering and supervised classification. The method allows to represent the uncertainty of the results provided by different independent techniques, to combine them and to manage possible conflicts.

  2. Performance analysis of distributed applications using automatic classification of communication inefficiencies

    DOEpatents

    Vetter, Jeffrey S.

    2005-02-01

    The method and system described herein presents a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system described herein may automatically classifies individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system described herein trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system described herein adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system described herein may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.

  3. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    NASA Astrophysics Data System (ADS)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using a SVM classifier on data sets of different sizes for different cluster configurations demonstrates the potential of the tool, as well as aspects that affect its performance.

  4. Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification.

    PubMed

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-05-03

    Linear synthesis model based dictionary learning framework has achieved remarkable performances in image classification in the last decade. Behaved as a generative feature model, it however suffers from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector will be much more efficiently extracted. Additionally, we derive a deep insight to demonstrate that NACM is capable of simultaneously learning the task adapted feature transformation and regularization to encode our preferences, domain prior knowledge and task oriented supervised information into the features. The proposed NACM is devoted to the classification task as a discriminative feature model and yield a novel discriminative nonlinear analysis operator learning framework (DNAOL). The theoretical analysis and experimental performances clearly demonstrate that DNAOL will not only achieve the better or at least competitive classification accuracies than the state-of-the-art algorithms but it can also dramatically reduce the time complexities in both training and testing phases.

  5. Model Personnel Policy for Ohio Academic Libraries and Public Libraries; Personnel Guidelines for Governmental Libraries, School Library Media Centers, Special Libraries.

    ERIC Educational Resources Information Center

    Ohio Library Foundation, Columbus.

    A guide which any library may use to achieve its own statement of personnel policy presents policy models which suggest rules and regulations to be used to supervise the staffs of public and academic libraries. These policies cover: (1) appointments; (2) classification of positions; (3) faculty and staff development; (4) performance evaluations;…

  6. Human Supervision of Time Critical Control Systems. Addendum

    DTIC Science & Technology

    2010-02-26

    signals such as electroencephalogram (EEG) and electrooculography ( EOG ). Current research has demonstrated these signals ’ ability to respond to changing...relationships often present in EEG/ EOG data; they routinely achieve classification accuracy greater than 80%. However, the discrete output of these...present data there were seven EEG and EOG signals recorded, thus, ICA assumes each were a mixture of seven independent components (Stone, 2002). Some

  7. A Dirichlet process model for classifying and forecasting epidemic curves.

    PubMed

    Nsoesie, Elaine O; Leman, Scotland C; Marathe, Madhav V

    2014-01-09

    A forecast can be defined as an endeavor to quantitatively estimate a future event or probabilities assigned to a future occurrence. Forecasting stochastic processes such as epidemics is challenging since there are several biological, behavioral, and environmental factors that influence the number of cases observed at each point during an epidemic. However, accurate forecasts of epidemics would impact timely and effective implementation of public health interventions. In this study, we introduce a Dirichlet process (DP) model for classifying and forecasting influenza epidemic curves. The DP model is a nonparametric Bayesian approach that enables the matching of current influenza activity to simulated and historical patterns, identifies epidemic curves different from those observed in the past and enables prediction of the expected epidemic peak time. The method was validated using simulated influenza epidemics from an individual-based model and the accuracy was compared to that of the tree-based classification technique, Random Forest (RF), which has been shown to achieve high accuracy in the early prediction of epidemic curves using a classification approach. We also applied the method to forecasting influenza outbreaks in the United States from 1997-2013 using influenza-like illness (ILI) data from the Centers for Disease Control and Prevention (CDC). We made the following observations. First, the DP model performed as well as RF in identifying several of the simulated epidemics. Second, the DP model correctly forecasted the peak time several days in advance for most of the simulated epidemics. Third, the accuracy of identifying epidemics different from those already observed improved with additional data, as expected. Fourth, both methods correctly classified epidemics with higher reproduction numbers (R) with a higher accuracy compared to epidemics with lower R values. Lastly, in the classification of seasonal influenza epidemics based on ILI data from the CDC, the methods' performance was comparable. Although RF requires less computational time compared to the DP model, the algorithm is fully supervised implying that epidemic curves different from those previously observed will always be misclassified. In contrast, the DP model can be unsupervised, semi-supervised or fully supervised. Since both methods have their relative merits, an approach that uses both RF and the DP model could be beneficial.

  8. Wavelet-based multicomponent denoising on GPU to improve the classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco; Mouriño, J. C.

    2017-10-01

    Supervised classification allows handling a wide range of remote sensing hyperspectral applications. Enhancing the spatial organization of the pixels over the image has proven to be beneficial for the interpretation of the image content, thus increasing the classification accuracy. Denoising in the spatial domain of the image has been shown as a technique that enhances the structures in the image. This paper proposes a multi-component denoising approach in order to increase the classification accuracy when a classification method is applied. It is computed on multicore CPUs and NVIDIA GPUs. The method combines feature extraction based on a 1Ddiscrete wavelet transform (DWT) applied in the spectral dimension followed by an Extended Morphological Profile (EMP) and a classifier (SVM or ELM). The multi-component noise reduction is applied to the EMP just before the classification. The denoising recursively applies a separable 2D DWT after which the number of wavelet coefficients is reduced by using a threshold. Finally, inverse 2D-DWT filters are applied to reconstruct the noise free original component. The computational cost of the classifiers as well as the cost of the whole classification chain is high but it is reduced achieving real-time behavior for some applications through their computation on NVIDIA multi-GPU platforms.

  9. A Comparison of Computer-Based Classification Testing Approaches Using Mixed-Format Tests with the Generalized Partial Credit Model

    ERIC Educational Resources Information Center

    Kim, Jiseon

    2010-01-01

    Classification testing has been widely used to make categorical decisions by determining whether an examinee has a certain degree of ability required by established standards. As computer technologies have developed, classification testing has become more computerized. Several approaches have been proposed and investigated in the context of…

  10. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    NASA Astrophysics Data System (ADS)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.

  11. Hostile climate, abusive supervision, and employee coping: does conscientiousness matter?

    PubMed

    Mawritz, Mary B; Dust, Scott B; Resick, Christian J

    2014-07-01

    The current study draws on the transactional theory of stress to propose that employees cope with hostile work environments by engaging in emotion-based coping in the forms of organization-directed deviance and psychological withdrawal. Specifically, we propose that supervisors' hostile organizational climate perceptions act as distal environmental stressors that are partially transmitted through supervisors' abusive actions and that conscientiousness moderates the proposed effects. First, we hypothesize that supervisor conscientiousness has a buffering effect by decreasing the likelihood of abusive supervision. Second, we hypothesize that highly conscientious employees cope differently from less conscientious employees. Among a sample of employees and their immediate supervisors, results indicated that while hostile climate perceptions provide a breeding ground for destructive behaviors, conscientious individuals are less likely to respond to perceived hostility with hostile acts. As supervisor conscientious levels increased, supervisors were less likely to engage in abusive supervision, which buffered employees from the negative effects of hostile climate perceptions. However, when working for less conscientious supervisors, employees experienced the effects of perceived hostile climates indirectly through abusive supervision. In turn, less conscientious employees tended to cope with the stress of hostile environments transmitted through abusive supervision by engaging in acts of organization-directed deviance. At the same time, all employees, regardless of their levels of conscientiousness, tended to cope with their hostile environments by psychologically withdrawing. Theoretical and practical implications are discussed.

  12. Hybrid generative-discriminative human action recognition by combining spatiotemporal words with supervised topic models

    NASA Astrophysics Data System (ADS)

    Sun, Hao; Wang, Cheng; Wang, Boliang

    2011-02-01

    We present a hybrid generative-discriminative learning method for human action recognition from video sequences. Our model combines a bag-of-words component with supervised latent topic models. A video sequence is represented as a collection of spatiotemporal words by extracting space-time interest points and describing these points using both shape and motion cues. The supervised latent Dirichlet allocation (sLDA) topic model, which employs discriminative learning using labeled data under a generative framework, is introduced to discover the latent topic structure that is most relevant to action categorization. The proposed algorithm retains most of the desirable properties of generative learning while increasing the classification performance though a discriminative setting. It has also been extended to exploit both labeled data and unlabeled data to learn human actions under a unified framework. We test our algorithm on three challenging data sets: the KTH human motion data set, the Weizmann human action data set, and a ballet data set. Our results are either comparable to or significantly better than previously published results on these data sets and reflect the promise of hybrid generative-discriminative learning approaches.

  13. Arrangement and Applying of Movement Patterns in the Cerebellum Based on Semi-supervised Learning.

    PubMed

    Solouki, Saeed; Pooyan, Mohammad

    2016-06-01

    Biological control systems have long been studied as a possible inspiration for the construction of robotic controllers. The cerebellum is known to be involved in the production and learning of smooth, coordinated movements. Therefore, highly regular structure of the cerebellum has been in the core of attention in theoretical and computational modeling. However, most of these models reflect some special features of the cerebellum without regarding the whole motor command computational process. In this paper, we try to make a logical relation between the most significant models of the cerebellum and introduce a new learning strategy to arrange the movement patterns: cerebellar modular arrangement and applying of movement patterns based on semi-supervised learning (CMAPS). We assume here the cerebellum like a big archive of patterns that has an efficient organization to classify and recall them. The main idea is to achieve an optimal use of memory locations by more than just a supervised learning and classification algorithm. Surely, more experimental and physiological researches are needed to confirm our hypothesis.

  14. Exploring the impact of wavelet-based denoising in the classification of remote sensing hyperspectral images

    NASA Astrophysics Data System (ADS)

    Quesada-Barriuso, Pablo; Heras, Dora B.; Argüello, Francisco

    2016-10-01

    The classification of remote sensing hyperspectral images for land cover applications is a very intensive topic. In the case of supervised classification, Support Vector Machines (SVMs) play a dominant role. Recently, the Extreme Learning Machine algorithm (ELM) has been extensively used. The classification scheme previously published by the authors, and called WT-EMP, introduces spatial information in the classification process by means of an Extended Morphological Profile (EMP) that is created from features extracted by wavelets. In addition, the hyperspectral image is denoised in the 2-D spatial domain, also using wavelets and it is joined to the EMP via a stacked vector. In this paper, the scheme is improved achieving two goals. The first one is to reduce the classification time while preserving the accuracy of the classification by using ELM instead of SVM. The second one is to improve the accuracy results by performing not only a 2-D denoising for every spectral band, but also a previous additional 1-D spectral signature denoising applied to each pixel vector of the image. For each denoising the image is transformed by applying a 1-D or 2-D wavelet transform, and then a NeighShrink thresholding is applied. Improvements in terms of classification accuracy are obtained, especially for images with close regions in the classification reference map, because in these cases the accuracy of the classification in the edges between classes is more relevant.

  15. Network-based high level data classification.

    PubMed

    Silva, Thiago Christiano; Zhao, Liang

    2012-06-01

    Traditional supervised data classification considers only physical features (e.g., distance or similarity) of the input data. Here, this type of learning is called low level classification. On the other hand, the human (animal) brain performs both low and high orders of learning and it has facility in identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation is, here, referred to as high level classification. In this paper, we propose a hybrid classification technique that combines both types of learning. The low level term can be implemented by any classification technique, while the high level term is realized by the extraction of features of the underlying network constructed from the input data. Thus, the former classifies the test instances by their physical features or class topologies, while the latter measures the compliance of the test instances to the pattern formation of the data. Our study shows that the proposed technique not only can realize classification according to the pattern formation, but also is able to improve the performance of traditional classification techniques. Furthermore, as the class configuration's complexity increases, such as the mixture among different classes, a larger portion of the high level term is required to get correct classification. This feature confirms that the high level classification has a special importance in complex situations of classification. Finally, we show how the proposed technique can be employed in a real-world application, where it is capable of identifying variations and distortions of handwritten digit images. As a result, it supplies an improvement in the overall pattern recognition rate.

  16. Data-driven advice for applying machine learning to bioinformatics problems

    PubMed Central

    Olson, Randal S.; La Cava, William; Mustahsan, Zairah; Varik, Akshay; Moore, Jason H.

    2017-01-01

    As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems. PMID:29218881

  17. The mapping of marsh vegetation using aircraft multispectral scanner data. [in Louisiana

    NASA Technical Reports Server (NTRS)

    Butera, M. K.

    1975-01-01

    A test was conducted to determine if salinity regimes in coastal marshland could be mapped and monitored by the identification and classification of marsh vegetative species from aircraft multispectral scanner data. The data was acquired at 6.1 km (20,000 ft.) on October 2, 1974, over a test area in the coastal marshland of southern Louisiana including fresh, intermediate, brackish, and saline zones. The data was classified by vegetational species using a supervised, spectral pattern recognition procedure. Accuracies of training sites ranged from 67% to 96%. Marsh zones based on free soil water salinity were determined from the species classification to demonstrate a practical use for mapping marsh vegetation.

  18. Satellite monitoring of vegetation and geology in semi-arid environments. [Tanzania

    NASA Technical Reports Server (NTRS)

    Kihlblom, U.; Johansson, D. (Principal Investigator)

    1980-01-01

    The possibility of mapping various characteristics of the natural environment of Tanzania by various LANDSAT techniques was assessed. Interpretation and mapping were carried out using black and white as well as color infrared images on the scale of 1:250,000. The advantages of several computer techniques were also assessed, including contrast-stretched rationing, differential edge enhancement; supervised classification; multitemporal classification; and change detection. Results Show the most useful image for interpretation comes from band 5, with additional information being obtained from either band 6 or band 7. The advantages of using color infrared images for interpreting vegetation and geology are so great that black and white should be used only to supplement the colored images.

  19. Classification of multiple sclerosis lesions using adaptive dictionary learning.

    PubMed

    Deshpande, Hrishikesh; Maurel, Pierre; Barillot, Christian

    2015-12-01

    This paper presents a sparse representation and an adaptive dictionary learning based method for automated classification of multiple sclerosis (MS) lesions in magnetic resonance (MR) images. Manual delineation of MS lesions is a time-consuming task, requiring neuroradiology experts to analyze huge volume of MR data. This, in addition to the high intra- and inter-observer variability necessitates the requirement of automated MS lesion classification methods. Among many image representation models and classification methods that can be used for such purpose, we investigate the use of sparse modeling. In the recent years, sparse representation has evolved as a tool in modeling data using a few basis elements of an over-complete dictionary and has found applications in many image processing tasks including classification. We propose a supervised classification approach by learning dictionaries specific to the lesions and individual healthy brain tissues, which include white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). The size of the dictionaries learned for each class plays a major role in data representation but it is an even more crucial element in the case of competitive classification. Our approach adapts the size of the dictionary for each class, depending on the complexity of the underlying data. The algorithm is validated using 52 multi-sequence MR images acquired from 13 MS patients. The results demonstrate the effectiveness of our approach in MS lesion classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Gene expression-based molecular diagnostic system for malignant gliomas is superior to histological diagnosis.

    PubMed

    Shirahata, Mitsuaki; Iwao-Koizumi, Kyoko; Saito, Sakae; Ueno, Noriko; Oda, Masashi; Hashimoto, Nobuo; Takahashi, Jun A; Kato, Kikuya

    2007-12-15

    Current morphology-based glioma classification methods do not adequately reflect the complex biology of gliomas, thus limiting their prognostic ability. In this study, we focused on anaplastic oligodendroglioma and glioblastoma, which typically follow distinct clinical courses. Our goal was to construct a clinically useful molecular diagnostic system based on gene expression profiling. The expression of 3,456 genes in 32 patients, 12 and 20 of whom had prognostically distinct anaplastic oligodendroglioma and glioblastoma, respectively, was measured by PCR array. Next to unsupervised methods, we did supervised analysis using a weighted voting algorithm to construct a diagnostic system discriminating anaplastic oligodendroglioma from glioblastoma. The diagnostic accuracy of this system was evaluated by leave-one-out cross-validation. The clinical utility was tested on a microarray-based data set of 50 malignant gliomas from a previous study. Unsupervised analysis showed divergent global gene expression patterns between the two tumor classes. A supervised binary classification model showed 100% (95% confidence interval, 89.4-100%) diagnostic accuracy by leave-one-out cross-validation using 168 diagnostic genes. Applied to a gene expression data set from a previous study, our model correlated better with outcome than histologic diagnosis, and also displayed 96.6% (28 of 29) consistency with the molecular classification scheme used for these histologically controversial gliomas in the original article. Furthermore, we observed that histologically diagnosed glioblastoma samples that shared anaplastic oligodendroglioma molecular characteristics tended to be associated with longer survival. Our molecular diagnostic system showed reproducible clinical utility and prognostic ability superior to traditional histopathologic diagnosis for malignant glioma.

  1. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    NASA Astrophysics Data System (ADS)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.

  2. Evaluating suitability of Pol-SAR (TerraSAR-X, Radarsat-2) for automated sea ice classification

    NASA Astrophysics Data System (ADS)

    Ressel, Rudolf; Singha, Suman; Lehner, Susanne

    2016-05-01

    Satellite borne SAR imagery has become an invaluable tool in the field of sea ice monitoring. Previously, single polarimetric imagery were employed in supervised and unsupervised classification schemes for sea ice investigation, which was preceded by image processing techniques such as segmentation and textural features. Recently, through the advent of polarimetric SAR sensors, investigation of polarimetric features in sea ice has attracted increased attention. While dual-polarimetric data has already been investigated in a number of works, full-polarimetric data has so far not been a major scientific focus. To explore the possibilities of full-polarimetric data and compare the differences in C- and X-bands, we endeavor to analyze in detail an array of datasets, simultaneously acquired, in C-band (RADARSAT-2) and X-band (TerraSAR-X) over ice infested areas. First, we propose an array of polarimetric features (Pauli and lexicographic based). Ancillary data from national ice services, SMOS data and expert judgement were utilized to identify the governing ice regimes. Based on these observations, we then extracted mentioned features. The subsequent supervised classification approach was based on an Artificial Neural Network (ANN). To gain quantitative insight into the quality of the features themselves (and reduce a possible impact of the Hughes phenomenon), we employed mutual information to unearth the relevance and redundancy of features. The results of this information theoretic analysis guided a pruning process regarding the optimal subset of features. In the last step we compared the classified results of all sensors and images, stated respective accuracies and discussed output discrepancies in the cases of simultaneous acquisitions.

  3. Heterogeneous activation of the TGFβ pathway in glioblastomas identified by gene expression-based classification using TGFβ-responsive genes

    PubMed Central

    Xu, Xie L; Kapoun, Ann M

    2009-01-01

    Background TGFβ has emerged as an attractive target for the therapeutic intervention of glioblastomas. Aberrant TGFβ overproduction in glioblastoma and other high-grade gliomas has been reported, however, to date, none of these reports has systematically examined the components of TGFβ signaling to gain a comprehensive view of TGFβ activation in large cohorts of human glioma patients. Methods TGFβ activation in mammalian cells leads to a transcriptional program that typically affects 5–10% of the genes in the genome. To systematically examine the status of TGFβ activation in high-grade glial tumors, we compiled a gene set of transcriptional response to TGFβ stimulation from tissue culture and in vivo animal studies. These genes were used to examine the status of TGFβ activation in high-grade gliomas including a large cohort of glioblastomas. Unsupervised and supervised classification analysis was performed in two independent, publicly available glioma microarray datasets. Results Unsupervised and supervised classification using the TGFβ-responsive gene list in two independent glial tumor gene expression data sets revealed various levels of TGFβ activation in these tumors. Among glioblastomas, one of the most devastating human cancers, two subgroups were identified that showed distinct TGFβ activation patterns as measured from transcriptional responses. Approximately 62% of glioblastoma samples analyzed showed strong TGFβ activation, while the rest showed a weak TGFβ transcriptional response. Conclusion Our findings suggest heterogeneous TGFβ activation in glioblastomas, which may cause potential differences in responses to anti-TGFβ therapies in these two distinct subgroups of glioblastomas patients. PMID:19192267

  4. Regional estimates of reef carbonate dynamics and productivity Using Landsat 7 ETM+, and potential impacts from ocean acidification

    USGS Publications Warehouse

    Moses, C.S.; Andrefouet, S.; Kranenburg, C.; Muller-Karger, F. E.

    2009-01-01

    Using imagery at 30 m spatial resolution from the most recent Landsat satellite, the Landsat 7 Enhanced Thematic Mapper Plus (ETM+), we scale up reef metabolic productivity and calcification from local habitat-scale (10 -1 to 100 km2) measurements to regional scales (103 to 104 km2). Distribution and spatial extent of the North Florida Reef Tract (NFRT) habitats come from supervised classification of the Landsat imagery within independent Landsat-derived Millennium Coral Reef Map geomorphologic classes. This system minimizes the depth range and variability of benthic habitat characteristics found in the area of supervised classification and limits misclassification. Classification of Landsat imagery into 5 biotopes (sand, dense live cover, sparse live cover, seagrass, and sparse seagrass) by geomorphologic class is >73% accurate at regional scales. Based on recently published habitat-scale in situ metabolic measurements, gross production (P = 3.01 ?? 109 kg C yr -1), excess production (E = -5.70 ?? 108 kg C yr -1), and calcification (G = -1.68 ?? 106 kg CaCO 3 yr-1) are estimated over 2711 km2 of the NFRT. Simple models suggest sensitivity of these values to ocean acidification, which will increase local dissolution of carbonate sediments. Similar approaches could be applied over large areas with poorly constrained bathymetry or water column properties and minimal metabolic sampling. This tool has potential applications for modeling and monitoring large-scale environmental impacts on reef productivity, such as the influence of ocean acidification on coral reef environments. ?? Inter-Research 2009.

  5. Comparing supervised learning methods for classifying sex, age, context and individual Mudi dogs from barking.

    PubMed

    Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro

    2015-03-01

    Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, [Formula: see text]-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of [Formula: see text]-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was [Formula: see text]-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.

  6. Colorectal cancer detection by hyperspectral imaging using fluorescence excitation scanning

    NASA Astrophysics Data System (ADS)

    Leavesley, Silas J.; Deal, Joshua; Hill, Shante; Martin, Will A.; Lall, Malvika; Lopez, Carmen; Rider, Paul F.; Rich, Thomas C.; Boudreaux, Carole W.

    2018-02-01

    Hyperspectral imaging technologies have shown great promise for biomedical applications. These techniques have been especially useful for detection of molecular events and characterization of cell, tissue, and biomaterial composition. Unfortunately, hyperspectral imaging technologies have been slow to translate to clinical devices - likely due to increased cost and complexity of the technology as well as long acquisition times often required to sample a spectral image. We have demonstrated that hyperspectral imaging approaches which scan the fluorescence excitation spectrum can provide increased signal strength and faster imaging, compared to traditional emission-scanning approaches. We have also demonstrated that excitation-scanning approaches may be able to detect spectral differences between colonic adenomas and adenocarcinomas and normal mucosa in flash-frozen tissues. Here, we report feasibility results from using excitation-scanning hyperspectral imaging to screen pairs of fresh tumoral and nontumoral colorectal tissues. Tissues were imaged using a novel hyperspectral imaging fluorescence excitation scanning microscope, sampling a wavelength range of 360-550 nm, at 5 nm increments. Image data were corrected to achieve a NIST-traceable flat spectral response. Image data were then analyzed using a range of supervised and unsupervised classification approaches within ENVI software (Harris Geospatial Solutions). Supervised classification resulted in >99% accuracy for single-patient image data, but only 64% accuracy for multi-patient classification (n=9 to date), with the drop in accuracy due to increased false-positive detection rates. Hence, initial data indicate that this approach may be a viable detection approach, but that larger patient sample sizes need to be evaluated and the effects of inter-patient variability studied.

  7. Supervised pixel classification for segmenting geographic atrophy in fundus autofluorescene images

    NASA Astrophysics Data System (ADS)

    Hu, Zhihong; Medioni, Gerard G.; Hernandez, Matthias; Sadda, SriniVas R.

    2014-03-01

    Age-related macular degeneration (AMD) is the leading cause of blindness in people over the age of 65. Geographic atrophy (GA) is a manifestation of the advanced or late-stage of the AMD, which may result in severe vision loss and blindness. Techniques to rapidly and precisely detect and quantify GA lesions would appear to be of important value in advancing the understanding of the pathogenesis of GA and the management of GA progression. The purpose of this study is to develop an automated supervised pixel classification approach for segmenting GA including uni-focal and multi-focal patches in fundus autofluorescene (FAF) images. The image features include region wise intensity (mean and variance) measures, gray level co-occurrence matrix measures (angular second moment, entropy, and inverse difference moment), and Gaussian filter banks. A k-nearest-neighbor (k-NN) pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. A voting binary iterative hole filling filter is then applied to fill in the small holes. Sixteen randomly chosen FAF images were obtained from sixteen subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by certified graders. Two-fold cross-validation is applied for the evaluation of the classification performance. The mean Dice similarity coefficients (DSC) between the algorithm- and manually-defined GA regions are 0.84 +/- 0.06 for one test and 0.83 +/- 0.07 for the other test and the area correlations between them are 0.99 (p < 0.05) and 0.94 (p < 0.05) respectively.

  8. Effective Diagnosis of Alzheimer's Disease by Means of Association Rules

    NASA Astrophysics Data System (ADS)

    Chaves, R.; Ramírez, J.; Górriz, J. M.; López, M.; Salas-Gonzalez, D.; Illán, I.; Segovia, F.; Padilla, P.

    In this paper we present a novel classification method of SPECT images for the early diagnosis of the Alzheimer's disease (AD). The proposed method is based on Association Rules (ARs) aiming to discover interesting associations between attributes contained in the database. The system uses firstly voxel-as-features (VAF) and Activation Estimation (AE) to find tridimensional activated brain regions of interest (ROIs) for each patient. These ROIs act as inputs to secondly mining ARs between activated blocks for controls, with a specified minimum support and minimum confidence. ARs are mined in supervised mode, using information previously extracted from the most discriminant rules for centering interest in the relevant brain areas, reducing the computational requirement of the system. Finally classification process is performed depending on the number of previously mined rules verified by each subject, yielding an up to 95.87% classification accuracy, thus outperforming recent developed methods for AD diagnosis.

  9. Classification of Traffic Related Short Texts to Analyse Road Problems in Urban Areas

    NASA Astrophysics Data System (ADS)

    Saldana-Perez, A. M. M.; Moreno-Ibarra, M.; Tores-Ruiz, M.

    2017-09-01

    The Volunteer Geographic Information (VGI) can be used to understand the urban dynamics. In the classification of traffic related short texts to analyze road problems in urban areas, a VGI data analysis is done over a social media's publications, in order to classify traffic events at big cities that modify the movement of vehicles and people through the roads, such as car accidents, traffic and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In the approach users are considered as sensors which describe their surroundings and provide their geographic position at the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped in a data corpus and geo-visualized in the study area, to detect the places with more vehicular problems.

  10. Comparative Analysis of Document level Text Classification Algorithms using R

    NASA Astrophysics Data System (ADS)

    Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.

    2017-08-01

    From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.

  11. A Classification method for eye movements direction during REM sleep trained on wake electro-oculographic recordings.

    PubMed

    Betta, M; Laurino, M; Gemignani, A; Landi, A; Menicucci, D

    2015-01-01

    Rapid eye movements (REMs) are a peculiar and intriguing aspect of REM sleep, even if their physiological function still remains unclear. During this work, a new automatic tool was developed, aimed at a complete description of REMs activity during the night, both in terms of their timing of occurrence that in term of their directional properties. A classification stage of each singular movement detected during the night according to its main direction, was in fact added to our procedure of REMs detection and ocular artifact removal. A supervised classifier was constructed, using as training and validation set EOG data recorded during voluntary saccades of five healthy volunteers. Different classification methods were tested and compared. The further information about REMs directional characteristic provided by the procedure would represent a valuable tool for a deeper investigation into REMs physiological origin and functional meaning.

  12. Convex formulation of multiple instance learning from positive and unlabeled bags.

    PubMed

    Bao, Han; Sakai, Tomoya; Sato, Issei; Sugiyama, Masashi

    2018-05-24

    Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available. MIL has a variety of applications such as content-based image retrieval, text categorization, and medical diagnosis. Most of the previous work for MIL assume that training bags are fully labeled. However, it is often difficult to obtain an enough number of labeled bags in practical situations, while many unlabeled bags are available. A learning framework called PU classification (positive and unlabeled classification) can address this problem. In this paper, we propose a convex PU classification method to solve an MIL problem. We experimentally show that the proposed method achieves better performance with significantly lower computation costs than an existing method for PU-MIL. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Clonal Selection Based Artificial Immune System for Generalized Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Huntsberger, Terry

    2011-01-01

    The last two decades has seen a rapid increase in the application of AIS (Artificial Immune Systems) modeled after the human immune system to a wide range of areas including network intrusion detection, job shop scheduling, classification, pattern recognition, and robot control. JPL (Jet Propulsion Laboratory) has developed an integrated pattern recognition/classification system called AISLE (Artificial Immune System for Learning and Exploration) based on biologically inspired models of B-cell dynamics in the immune system. When used for unsupervised or supervised classification, the method scales linearly with the number of dimensions, has performance that is relatively independent of the total size of the dataset, and has been shown to perform as well as traditional clustering methods. When used for pattern recognition, the method efficiently isolates the appropriate matches in the data set. The paper presents the underlying structure of AISLE and the results from a number of experimental studies.

  14. Integrated Change Detection and Classification in Urban Areas Based on Airborne Laser Scanning Point Clouds.

    PubMed

    Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert

    2018-02-03

    This paper suggests a new approach for change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged for computing features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific for the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify the change. All these features are merged in the points and then training samples are acquired to create the model for supervised classification, which is then applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs of eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.

  15. Transient classification in LIGO data using difference boosting neural network

    NASA Astrophysics Data System (ADS)

    Mukund, N.; Abraham, S.; Kandhasamy, S.; Mitra, S.; Philip, N. S.

    2017-05-01

    Detection and classification of transients in data from gravitational wave detectors are crucial for efficient searches for true astrophysical events and identification of noise sources. We present a hybrid method for classification of short duration transients seen in gravitational wave data using both supervised and unsupervised machine learning techniques. To train the classifiers, we use the relative wavelet energy and the corresponding entropy obtained by applying one-dimensional wavelet decomposition on the data. The prediction accuracy of the trained classifier on nine simulated classes of gravitational wave transients and also LIGO's sixth science run hardware injections are reported. Targeted searches for a couple of known classes of nonastrophysical signals in the first observational run of Advanced LIGO data are also presented. The ability to accurately identify transient classes using minimal training samples makes the proposed method a useful tool for LIGO detector characterization as well as searches for short duration gravitational wave signals.

  16. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  17. Topic Identification and Categorization of Public Information in Community-Based Social Media

    NASA Astrophysics Data System (ADS)

    Kusumawardani, RP; Basri, MH

    2017-01-01

    This paper presents a work on a semi-supervised method for topic identification and classification of short texts in the social media, and its application on tweets containing dialogues in a large community of dwellers in a city, written mostly in Indonesian. These dialogues comprise a wealth of information about the city, shared in real-time. We found that despite the high irregularity of the language used, and the scarcity of suitable linguistic resources, a meaningful identification of topics could be performed by clustering the tweets using the K-Means algorithm. The resulting clusters are found to be robust enough to be the basis of a classification. On three grouping schemes derived from the clusters, we get accuracy of 95.52%, 95.51%, and 96.7 using linear SVMs, reflecting the applicability of applying this method for generating topic identification and classification on such data.

  18. Using partially labeled data for normal mixture identification with application to class definition

    NASA Technical Reports Server (NTRS)

    Shahshahani, Behzad M.; Landgrebe, David A.

    1992-01-01

    The problem of estimating the parameters of a normal mixture density when, in addition to the unlabeled samples, sets of partially labeled samples are available is addressed. The density of the multidimensional feature space is modeled with a normal mixture. It is assumed that the set of components of the mixture can be partitioned into several classes and that training samples are available from each class. Since for any training sample the class of origin is known but the exact component of origin within the corresponding class is unknown, the training samples as considered to be partially labeled. The EM iterative equations are derived for estimating the parameters of the normal mixture in the presence of partially labeled samples. These equations can be used to combine the supervised and nonsupervised learning processes.

  19. Validation of automated supervised segmentation of multibeam backscatter data from the Chatham Rise, New Zealand

    NASA Astrophysics Data System (ADS)

    Hillman, Jess I. T.; Lamarche, Geoffroy; Pallentin, Arne; Pecher, Ingo A.; Gorman, Andrew R.; Schneider von Deimling, Jens

    2018-06-01

    Using automated supervised segmentation of multibeam backscatter data to delineate seafloor substrates is a relatively novel technique. Low-frequency multibeam echosounders (MBES), such as the 12-kHz EM120, present particular difficulties since the signal can penetrate several metres into the seafloor, depending on substrate type. We present a case study illustrating how a non-targeted dataset may be used to derive information from multibeam backscatter data regarding distribution of substrate types. The results allow us to assess limitations associated with low frequency MBES where sub-bottom layering is present, and test the accuracy of automated supervised segmentation performed using SonarScope® software. This is done through comparison of predicted and observed substrate from backscatter facies-derived classes and substrate data, reinforced using quantitative statistical analysis based on a confusion matrix. We use sediment samples, video transects and sub-bottom profiles acquired on the Chatham Rise, east of New Zealand. Inferences on the substrate types are made using the Generic Seafloor Acoustic Backscatter (GSAB) model, and the extents of the backscatter classes are delineated by automated supervised segmentation. Correlating substrate data to backscatter classes revealed that backscatter amplitude may correspond to lithologies up to 4 m below the seafloor. Our results emphasise several issues related to substrate characterisation using backscatter classification, primarily because the GSAB model does not only relate to grain size and roughness properties of substrate, but also accounts for other parameters that influence backscatter. Better understanding these limitations allows us to derive first-order interpretations of sediment properties from automated supervised segmentation.

  20. The Past, Present and Future of Army Dietetics

    DTIC Science & Technology

    1989-03-17

    THIS PAGE ("olin Data En ~tered) REPOT DCUMNTATON ~dEREAD INSTRUCTIONSREPOT DCUMNTATON AGEBEFORE COMPLETING FORM I. REPORT NUMBER 2. GOVT ACCESSION No...JAN , 1473 EoTlomorI bOV GSIS OSOLETE UNCLASSIFIED SECURITY CLASSIFICATIOpt OF THIS PAsE (W?? en 0ate Entered) UNCLASSIFIED SECURITY CLASSIFICATION...began to perform the following duties: *instruction of patients with diabetes in measuring and weighing food *supervision of mess personnel assigned to

Top