SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
A linear-RBF multikernel SVM to classify big text corpora.
Romero, R; Iglesias, E L; Borrajo, L
2015-01-01
Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie
2014-11-01
User-generated medical messages on Internet contain extensive information related to adverse drug reactions (ADRs) and are known as valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to identify messages related to ADRs automatically from online user reviews. We conducted experiments on online user reviews using different feature set and different classification technique. Firstly, the messages from three communities, allergy community, schizophrenia community and pain management community, were collected, the 3000 messages were annotated. Secondly, the N-gram-based features set and medical domain-specific features set were generated. Thirdly, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform classification tasks separately. Finally, we evaluated the performance of different method using different feature set and different classification technique by comparing the metrics including accuracy and F-measure. In terms of accuracy, the accuracy of SVM classifier was higher than 0.8, the accuracy of C4.5 classifier or Naïve Bayes classifier was lower than 0.8; meanwhile, the combination feature sets including n-gram-based feature set and domain-specific feature set consistently outperformed single feature set. In terms of F-measure, the highest F-measure is 0.895 which was achieved by using combination feature sets and a SVM classifier. In all, we can get the best classification performance by using combination feature sets and SVM classifier. By using combination feature sets and SVM classifier, we can get an effective method to identify messages related to ADRs automatically from online user reviews.
Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong
2015-09-01
Recently, a time-adaptive support vector machine (TA-SVM) is proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of matrix inversion, thus resulting to suffer from high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets as well as inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.
NASA Astrophysics Data System (ADS)
Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi
2018-03-01
This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.
Predicting breast cancer using an expression values weighted clinical classifier.
Thomas, Minta; De Brabanter, Kris; Suykens, Johan A K; De Moor, Bart
2014-12-31
Clinical data, such as patient history, laboratory analysis, ultrasound parameters-which are the basis of day-to-day clinical decision support-are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. While bringing up the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier to integrate two data sources: microarray and clinical parameters. We compared and evaluated the proposed methods on five breast cancer case studies. Compared to LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD) and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under ROC Curve (AUC), on all breast cancer case studies. Thus a clinical classifier weighted with microarray data set results in significantly improved diagnosis, prognosis and prediction responses to therapy. The proposed model has been shown as a promising mathematical framework in both data fusion and non-linear classification problems.
Tuning support vector machines for minimax and Neyman-Pearson classification.
Davenport, Mark A; Baraniuk, Richard G; Scott, Clayton D
2010-10-01
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
An expert support system for breast cancer diagnosis using color wavelet features.
Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J
2012-10-01
Breast cancer diagnosis can be done through the pathologic assessments of breast tissue samples such as core needle biopsy technique. The result of analysis on this sample by pathologist is crucial for breast cancer patient. In this paper, nucleus of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet on HSV color domain and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis using Support Vector Machine (SVM) classifier algorithm. The ability of properly trained SVM is to correctly classify patterns and make them particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as Naïves Bayes classifier and Artificial Neural Network. The overall accuracy of the proposed method using SVM classifier will be further useful for automation in cancer diagnosis.
NASA Astrophysics Data System (ADS)
Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao
2014-09-01
This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real-time geological hazards/phenomenon, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related with hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining between "aurora-event" or "no-aurora-event" categories, we compared two state-of-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by Scikit-Learn library using our training dataset to build the SVM classifier. The results shown that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework.Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer. The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used TensorFlow framework for applying CNN classifier to the same collection of tweets.In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real-time.
NASA Astrophysics Data System (ADS)
Salehi, Hassan S.; Li, Hai; Merkulov, Alex; Kumavor, Patrick D.; Vavadi, Hamed; Sanders, Melinda; Kueck, Angela; Brewer, Molly A.; Zhu, Quing
2016-04-01
Most ovarian cancers are diagnosed at advanced stages due to the lack of efficacious screening techniques. Photoacoustic tomography (PAT) has a potential to image tumor angiogenesis and detect early neovascular changes of the ovary. We have developed a coregistered PAT and ultrasound (US) prototype system for real-time assessment of ovarian masses. Features extracted from PAT and US angular beams, envelopes, and images were input to a logistic classifier and a support vector machine (SVM) classifier to diagnose ovaries as benign or malignant. A total of 25 excised ovaries of 15 patients were studied and the logistic and SVM classifiers achieved sensitivities of 70.4 and 87.7%, and specificities of 95.6 and 97.9%, respectively. Furthermore, the ovaries of two patients were noninvasively imaged using the PAT/US system before surgical excision. By using five significant features and the logistic classifier, 12 out of 14 images (86% sensitivity) from a malignant ovarian mass and all 17 images (100% specificity) from a benign mass were accurately classified; the SVM correctly classified 10 out of 14 malignant images (71% sensitivity) and all 17 benign images (100% specificity). These initial results demonstrate the clinical potential of the PAT/US technique for ovarian cancer diagnosis.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.
NASA Astrophysics Data System (ADS)
Leena, N.; Saju, K. K.
2018-04-01
Nutritional deficiencies in plants are a major concern for farmers as it affects productivity and thus profit. The work aims to classify nutritional deficiencies in maize plant in a non-destructive mannerusing image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies like nitrogen, phosphorous and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
NASA Astrophysics Data System (ADS)
Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao
2013-02-01
The ability of combining serum surface-enhanced Raman spectroscopy (SERS) with support vector machine (SVM) for improving classification esophageal cancer patients from normal volunteers is investigated. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one group from patients with pathologically confirmed esophageal cancer (n=30) and the other group from healthy volunteers (n=31). Principal components analysis (PCA), conventional SVM (C-SVM) and conventional SVM combination with PCA (PCA-SVM) methods are implemented to classify the same spectral dataset. Results show that a diagnostic accuracy of 77.0% is acquired for PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained for C-SVM and PCA-SVM methods based on radial basis functions (RBF) models. The results prove that RBF SVM models are superior to PCA algorithm in classification serum SERS spectra. The study demonstrates that serum SERS in combination with SVM technique has great potential to provide an effective and accurate diagnostic schema for noninvasive detection of esophageal cancer.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777
NASA Astrophysics Data System (ADS)
Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.
2017-02-01
Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and large availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameters selection are reported, making it impossible to reproduce study analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.
Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad
2018-01-01
Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.
Karan, Shivesh Kishore; Samadder, Sukha Ranjan
2016-08-01
One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.
2014-01-01
Background Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. Results The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database. PMID:24970564
Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian
2014-06-27
Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.
A SVM-based method for sentiment analysis in Persian language
NASA Astrophysics Data System (ADS)
Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana
2013-03-01
Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often represent their opinions and experiences on the web with written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews make it difficult to give an unbiased evaluation to a product. In this paper, standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian Movie reviews to automatically classify user reviews as positive or negative and performance of these two classifiers is compared with each other in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by interaction between the classification models and the feature options. The SVM classifier achieves as well as or better accuracy than naive Bayes in Persian movie. Unigrams are proved better features than bigrams and trigrams in capturing Persian sentiment orientation.
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
Wahba, Maram A; Ashour, Amira S; Napoleon, Sameh A; Abd Elnaby, Mustafa M; Guo, Yanhui
2017-12-01
Basal cell carcinoma is one of the most common malignant skin lesions. Automated lesion identification and classification using image processing techniques is highly required to reduce the diagnosis errors. In this study, a novel technique is applied to classify skin lesion images into two classes, namely the malignant Basal cell carcinoma and the benign nevus. A hybrid combination of bi-dimensional empirical mode decomposition and gray-level difference method features is proposed after hair removal. The combined features are further classified using quadratic support vector machine (Q-SVM). The proposed system has achieved outstanding performance of 100% accuracy, sensitivity and specificity compared to other support vector machine procedures as well as with different extracted features. Basal Cell Carcinoma is effectively classified using Q-SVM with the proposed combined features.
Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.
Caixinha, Miguel; Santos, Mário; Santos, Jaime
2016-04-01
To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the higher performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
CNN-SVM for Microvascular Morphological Type Recognition with Data Augmentation.
Xue, Di-Xiu; Zhang, Rong; Feng, Hui; Wang, Ya-Lei
2016-01-01
This paper focuses on the problem of feature extraction and the classification of microvascular morphological types to aid esophageal cancer detection. We present a patch-based system with a hybrid SVM model with data augmentation for intraepithelial papillary capillary loop recognition. A greedy patch-generating algorithm and a specialized CNN named NBI-Net are designed to extract hierarchical features from patches. We investigate a series of data augmentation techniques to progressively improve the prediction invariance of image scaling and rotation. For classifier boosting, SVM is used as an alternative to softmax to enhance generalization ability. The effectiveness of CNN feature representation ability is discussed for a set of widely used CNN models, including AlexNet, VGG-16, and GoogLeNet. Experiments are conducted on the NBI-ME dataset. The recognition rate is up to 92.74% on the patch level with data augmentation and classifier boosting. The results show that the combined CNN-SVM model beats models of traditional features with SVM as well as the original CNN with softmax. The synthesis results indicate that our system is able to assist clinical diagnosis to a certain extent.
Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N
2017-06-21
Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k -Nearest Neighbors ( k -NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k -NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k -NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance of SVM and Random Forest classifier performance was observed.
Pirooznia, Mehdi; Deng, Youping
2006-12-12
Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.
Effect of finite sample size on feature selection and classification: a simulation study.
Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping
2010-02-01
The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-01-01
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282
Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L
2013-12-01
The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-07-18
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization.
NASA Astrophysics Data System (ADS)
Sehad, Mounir; Lazri, Mourad; Ameur, Soltane
2017-03-01
In this work, a new rainfall estimation technique based on the high spatial and temporal resolution of the Spinning Enhanced Visible and Infra Red Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) is presented. This work proposes efficient scheme rainfall estimation based on two multiclass support vector machine (SVM) algorithms: SVM_D for daytime and SVM_N for night time rainfall estimations. Both SVM models are trained using relevant rainfall parameters based on optical, microphysical and textural cloud proprieties. The cloud parameters are derived from the Spectral channels of the SEVIRI MSG radiometer. The 3-hourly and daily accumulated rainfall are derived from the 15 min-rainfall estimation given by the SVM classifiers for each MSG observation image pixel. The SVMs were trained with ground meteorological radar precipitation scenes recorded from November 2006 to March 2007 over the north of Algeria located in the Mediterranean region. Further, the SVM_D and SVM_N models were used to estimate 3-hourly and daily rainfall using data set gathered from November 2010 to March 2011 over north Algeria. The results were validated against collocated rainfall observed by rain gauge network. Indeed, the statistical scores given by correlation coefficient, bias, root mean square error and mean absolute error, showed good accuracy of rainfall estimates by the present technique. Moreover, rainfall estimates of our technique were compared with two high accuracy rainfall estimates methods based on MSG SEVIRI imagery namely: random forests (RF) based approach and an artificial neural network (ANN) based technique. The findings of the present technique indicate higher correlation coefficient (3-hourly: 0.78; daily: 0.94), and lower mean absolute error and root mean square error values. The results show that the new technique assign 3-hourly and daily rainfall with good and better accuracy than ANN technique and (RF) model.
Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection
NASA Astrophysics Data System (ADS)
Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu
Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.
Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.
Li, Qiang; Gu, Yu; Jia, Jing
2017-01-30
Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.
Multi-view L2-SVM and its multi-view core vector machine.
Huang, Chengquan; Chung, Fu-lai; Wang, Shitong
2016-03-01
In this paper, a novel L2-SVM based classifier Multi-view L2-SVM is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has the flexibility like μ-SVC in the sense that the number of the yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views through imposing the consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine GCVM, the proposed Multi-view L2-SVM classifier is extended into its GCVM version MvCVM which can realize its fast training on large scale multi-view datasets, with its asymptotic linear time complexity with the sample size and its space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
An implementation of support vector machine on sentiment classification of movie reviews
NASA Astrophysics Data System (ADS)
Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.
2018-03-01
With technological advances, all information about movie is available on the internet. If the information is processed properly, it will get the quality of the information. This research proposes to the classify sentiments on movie review documents. This research uses Support Vector Machine (SVM) method because it can classify high dimensional data in accordance with the data used in this research in the form of text. Support Vector Machine is a popular machine learning technique for text classification because it can classify by learning from a collection of documents that have been classified previously and can provide good result. Based on number of datasets, the 90-10 composition has the best result that is 85.6%. Based on SVM kernel, kernel linear with constant 1 has the best result that is 84.9%
NASA Astrophysics Data System (ADS)
Du, Peijun; Tan, Kun; Xing, Xiaoshi
2010-12-01
Combining Support Vector Machine (SVM) with wavelet analysis, we constructed wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM is faced with the bottleneck of kernel parameter selection which further results in time-consuming and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to experiment the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier can obtain the highest accuracy when using the Coiflet Kernel function in wavelet transform. In contrast with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and SVM classifier using Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space is capable of improving classification accuracy obviously.
Automatic system for radar echoes filtering based on textural features and artificial intelligence
NASA Astrophysics Data System (ADS)
Hedir, Mehdia; Haddad, Boualem
2017-10-01
Among the very popular Artificial Intelligence (AI) techniques, Artificial Neural Network (ANN) and Support Vector Machine (SVM) have been retained to process Ground Echoes (GE) on meteorological radar images taken from Setif (Algeria) and Bordeaux (France) with different climates and topologies. To achieve this task, AI techniques were associated with textural approaches. We used Gray Level Co-occurrence Matrix (GLCM) and Completed Local Binary Pattern (CLBP); both methods were largely used in image analysis. The obtained results show the efficiency of texture to preserve precipitations forecast on both sites with the accuracy of 98% on Bordeaux and 95% on Setif despite the AI technique used. 98% of GE are suppressed with SVM, this rate is outperforming ANN skills. CLBP approach associated to SVM eliminates 98% of GE and preserves precipitations forecast on Bordeaux site better than on Setif's, while it exhibits lower accuracy with ANN. SVM classifier is well adapted to the proposed application since the average filtering rate is 95-98% with texture and 92-93% with CLBP. These approaches allow removing Anomalous Propagations (APs) too with a better accuracy of 97.15% with texture and SVM. In fact, textural features associated to AI techniques are an efficient tool for incoherent radars to surpass spurious echoes.
Combined data mining/NIR spectroscopy for purity assessment of lime juice
NASA Astrophysics Data System (ADS)
Shafiee, Sahameh; Minaei, Saeid
2018-06-01
This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.
NASA Astrophysics Data System (ADS)
Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan
2016-03-01
Interstitial lung disease (ILD) is complicated group of pulmonary disorders. High Resolution Computed Tomography (HRCT) considered to be best imaging technique for analysis of different pulmonary disorders. HRCT findings can be categorised in several patterns viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal etc. based on their texture like appearance. Clinician often find it difficult to diagnosis these pattern because of their complex nature. In such scenario computer-aided diagnosis system could help clinician to identify patterns. Several approaches had been proposed for classification of ILD patterns. This includes computation of textural feature and training /testing of classifier such as artificial neural network (ANN), support vector machine (SVM) etc. In this paper, wavelet features are calculated from two different ILD database, publically available MedGIFT ILD database and private ILD database, followed by performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that average classification accuracy by SVM is greater than ANN where trained and tested on same database. Investigation continued further to test variation in accuracy of classifier when training and testing is performed with alternate database and training and testing of classifier with database formed by merging samples from same class from two individual databases. The average classification accuracy drops when two independent databases used for training and testing respectively. There is significant improvement in average accuracy when classifiers are trained and tested with merged database. It infers dependency of classification accuracy on training data. It is observed that SVM outperforms ANN when same database is used for training and testing.
Tripathy, Rajesh Kumar; Dandapat, Samarendra
2017-04-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel
2011-05-09
Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689
NASA Astrophysics Data System (ADS)
Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.
2017-10-01
Crop maps are essential inputs for the agricultural planning done at various governmental and agribusinesses agencies. Remote sensing offers timely and costs efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, a RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are of 83, 82 and 83% respectively. Adding vegetation indices to the analysis result in the classification accuracy of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Ma, Xiaoqi
2015-01-01
A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-01-01
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides. PMID:27187430
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-05-11
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.
NASA Astrophysics Data System (ADS)
Cubillas, J. E.; Japitana, M.
2016-06-01
This study demonstrates the application of CIELAB, Color intensity, and One Dimensional Scalar Constancy as features for image recognition and classifying benthic habitats in an image with the coastal areas of Hinatuan, Surigao Del Sur, Philippines as the study area. The study area is composed of four datasets, namely: (a) Blk66L005, (b) Blk66L021, (c) Blk66L024, and (d) Blk66L0114. SVM optimization was performed in Matlab® software with the help of Parallel Computing Toolbox to hasten the SVM computing speed. The image used for collecting samples for SVM procedure was Blk66L0114 in which a total of 134,516 sample objects of mangrove, possible coral existence with rocks, sand, sea, fish pens and sea grasses were collected and processed. The collected samples were then used as training sets for the supervised learning algorithm and for the creation of class definitions. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as a super feature which will then be used in developing the C (classifier) rule set in eCognition® software. The classification results of the sampling site yielded an accuracy of 98.85% which confirms the reliability of remote sensing techniques and analysis employed to orthophotos like the CIELAB, Color Intensity and One dimensional scalar constancy and the use of SVM classification algorithm in classifying benthic habitats.
An improved PSO-SVM model for online recognition defects in eddy current testing
NASA Astrophysics Data System (ADS)
Liu, Baoling; Hou, Dibo; Huang, Pingjie; Liu, Banteng; Tang, Huayi; Zhang, Wubo; Chen, Peihua; Zhang, Guangxin
2013-12-01
Accurate and rapid recognition of defects is essential for structural integrity and health monitoring of in-service device using eddy current (EC) non-destructive testing. This paper introduces a novel model-free method that includes three main modules: a signal pre-processing module, a classifier module and an optimisation module. In the signal pre-processing module, a kind of two-stage differential structure is proposed to suppress the lift-off fluctuation that could contaminate the EC signal. In the classifier module, multi-class support vector machine (SVM) based on one-against-one strategy is utilised for its good accuracy. In the optimisation module, the optimal parameters of classifier are obtained by an improved particle swarm optimisation (IPSO) algorithm. The proposed IPSO technique can improve convergence performance of the primary PSO through the following strategies: nonlinear processing of inertia weight, introductions of the black hole and simulated annealing model with extremum disturbance. The good generalisation ability of the IPSO-SVM model has been validated through adding additional specimen into the testing set. Experiments show that the proposed algorithm can achieve higher recognition accuracy and efficiency than other well-known classifiers and the superiorities are more obvious with less training set, which contributes to online application.
Wissel, Tobias; Pfeiffer, Tim; Frysch, Robert; Knight, Robert T.; Chang, Edward F.; Hinrichs, Hermann; Rieger, Jochem W.; Rose, Georg
2013-01-01
Objective Support Vector Machines (SVM) have developed into a gold standard for accurate classification in Brain-Computer-Interfaces (BCI). The choice of the most appropriate classifier for a particular application depends on several characteristics in addition to decoding accuracy. Here we investigate the implementation of Hidden Markov Models (HMM)for online BCIs and discuss strategies to improve their performance. Approach We compare the SVM, serving as a reference, and HMMs for classifying discrete finger movements obtained from the Electrocorticograms of four subjects doing a finger tapping experiment. The classifier decisions are based on a subset of low-frequency time domain and high gamma oscillation features. Main results We show that decoding optimization between the two approaches is due to the way features are extracted and selected and less dependent on the classifier. An additional gain in HMM performance of up to 6% was obtained by introducing model constraints. Comparable accuracies of up to 90% were achieved with both SVM and HMM with the high gamma cortical response providing the most important decoding information for both techniques. Significance We discuss technical HMM characteristics and adaptations in the context of the presented data as well as for general BCI applications. Our findings suggest that HMMs and their characteristics are promising for efficient online brain-computer interfaces. PMID:24045504
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a few number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds single optimum hyperplane, the main objective Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71,4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.
Hummel, A D; Maciel, R F; Sousa, F S; Cohrs, F M; Falcão, A E J; Teixeira, F; Baptista, R; Mancini, F; da Costa, T M; Alves, D; Rodrigues, R G D S; Miranda, R; Pisa, I T
2011-05-01
The gold standard for nephrotoxicity and acute cellular rejection (ACR) is a biopsy, an invasive and expensive procedure. More efficient strategies to screen patients for biopsy are important from the clinical and financial points of view. The aim of this study was to evaluate various artificial intelligence techniques to screen for the need for a biopsy among patients suspected of nephrotoxicity or ACR during the first year after renal transplantation. We used classifiers like artificial neural networks (ANN), support vector machines (SVM), and Bayesian inference (BI) to indicate if the clinical course of the event suggestive of the need for a biopsy. Each classifier was evaluated by values of sensitivity and area under the ROC curve (AUC) for each of the classifiers. The technique that showed the best sensitivity value as an indicator for biopsy was SVM with an AUC of 0.79 and an accuracy rate of 79.86%. The results were better than those described in previous works. The accuracy for an indication of biopsy screening was efficient enough to become useful in clinical practice. Copyright © 2011 Elsevier Inc. All rights reserved.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
A Support Vector Machine-Based Gender Identification Using Speech Signal
NASA Astrophysics Data System (ADS)
Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk
We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.
Dandapat, Samarendra
2017-01-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques. PMID:28894589
Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael
2011-01-01
Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However these approaches focus on finding group differences and are not applicable to situations where one is interested in accessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features the OC-SVM classified 52% of patients as outliers. However among the patients classified as outliers 70% did not respond to treatment and among those classified as non-outliers 89% responded to treatment. In addition 89% of the healthy controls were classified as non-outliers. PMID:21723950
Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui
2016-06-01
Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
Kianmehr, Keivan; Alhajj, Reda
2008-09-01
In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework: instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, are presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-qualified source of discrimination knowledge that can impact substantially the prediction power of SVM and associative classification techniques. They provide users with more conveniences in terms of understandability and interpretability as well. We have used four datasets from UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as real world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.
NASA Astrophysics Data System (ADS)
Jing, Ya-Bing; Liu, Chang-Wen; Bi, Feng-Rong; Bi, Xiao-Yang; Wang, Xia; Shao, Kang
2017-07-01
Numerous vibration-based techniques are rarely used in diesel engines fault diagnosis in a direct way, due to the surface vibration signals of diesel engines with the complex non-stationary and nonlinear time-varying features. To investigate the fault diagnosis of diesel engines, fractal correlation dimension, wavelet energy and entropy as features reflecting the diesel engine fault fractal and energy characteristics are extracted from the decomposed signals through analyzing vibration acceleration signals derived from the cylinder head in seven different states of valve train. An intelligent fault detector FastICA-SVM is applied for diesel engine fault diagnosis and classification. The results demonstrate that FastICA-SVM achieves higher classification accuracy and makes better generalization performance in small samples recognition. Besides, the fractal correlation dimension and wavelet energy and entropy as the special features of diesel engine vibration signal are considered as input vectors of classifier FastICA-SVM and could produce the excellent classification results. The proposed methodology improves the accuracy of feature extraction and the fault diagnosis of diesel engines.
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (eg. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
SVM classifier on chip for melanoma detection.
Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak
2017-07-01
Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data in to species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SMV as a method for classification. From trial to trial, SVM produces consistent results
NASA Astrophysics Data System (ADS)
Guo, Yiqing; Jia, Xiuping; Paull, David
2018-06-01
The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
Zhang, Ming-Huan; Ma, Jun-Shan; Shen, Ying; Chen, Ying
2016-09-01
This study aimed to investigate the optimal support vector machines (SVM)-based classifier of duchenne muscular dystrophy (DMD) magnetic resonance imaging (MRI) images. T1-weighted (T1W) and T2-weighted (T2W) images of the 15 boys with DMD and 15 normal controls were obtained. Textural features of the images were extracted and wavelet decomposed, and then, principal features were selected. Scale transform was then performed for MRI images. Afterward, SVM-based classifiers of MRI images were analyzed based on the radical basis function and decomposition levels. The cost (C) parameter and kernel parameter [Formula: see text] were used for classification. Then, the optimal SVM-based classifier, expressed as [Formula: see text]), was identified by performance evaluation (sensitivity, specificity and accuracy). Eight of 12 textural features were selected as principal features (eigenvalues [Formula: see text]). The 16 SVM-based classifiers were obtained using combination of (C, [Formula: see text]), and those with lower C and [Formula: see text] values showed higher performances, especially classifier of [Formula: see text]). The SVM-based classifiers of T1W images showed higher performance than T1W images at the same decomposition level. The T1W images in classifier of [Formula: see text]) at level 2 decomposition showed the highest performance of all, and its overall correct sensitivity, specificity, and accuracy reached 96.9, 97.3, and 97.1 %, respectively. The T1W images in SVM-based classifier [Formula: see text] at level 2 decomposition showed the highest performance of all, demonstrating that it was the optimal classification for the diagnosis of DMD.
A mechatronics platform to study prosthetic hand control using EMG signals.
Geethanjali, P
2016-09-01
In this paper, a low-cost mechatronics platform for the design and development of robotic hands as well as a surface electromyogram (EMG) pattern recognition system is proposed. This paper also explores various EMG classification techniques using a low-cost electronics system in prosthetic hand applications. The proposed platform involves the development of a four channel EMG signal acquisition system; pattern recognition of acquired EMG signals; and development of a digital controller for a robotic hand. Four-channel surface EMG signals, acquired from ten healthy subjects for six different movements of the hand, were used to analyse pattern recognition in prosthetic hand control. Various time domain features were extracted and grouped into five ensembles to compare the influence of features in feature-selective classifiers (SLR) with widely considered non-feature-selective classifiers, such as neural networks (NN), linear discriminant analysis (LDA) and support vector machines (SVM) applied with different kernels. The results divulged that the average classification accuracy of the SVM, with a linear kernel function, outperforms other classifiers with feature ensembles, Hudgin's feature set and auto regression (AR) coefficients. However, the slight improvement in classification accuracy of SVM incurs more processing time and memory space in the low-level controller. The Kruskal-Wallis (KW) test also shows that there is no significant difference in the classification performance of SLR with Hudgin's feature set to that of SVM with Hudgin's features along with AR coefficients. In addition, the KW test shows that SLR was found to be better in respect to computation time and memory space, which is vital in a low-level controller. Similar to SVM, with a linear kernel function, other non-feature selective LDA and NN classifiers also show a slight improvement in performance using twice the features but with the drawback of increased memory space requirement and time. This prototype facilitated the study of various issues of pattern recognition and identified an efficient classifier, along with a feature ensemble, in the implementation of EMG controlled prosthetic hands in a laboratory setting at low-cost. This platform may help to motivate and facilitate prosthetic hand research in developing countries.
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-01-01
Background Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Methodology/Principal Findings Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. Conclusions/Significance The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice. PMID:21359184
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-02-16
Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice.
Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei
2018-04-01
To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
A case study on Discrete Wavelet Transform based Hurst exponent for epilepsy detection.
Madan, Saiby; Srivastava, Kajri; Sharmila, A; Mahalakshmi, P
2018-01-01
Epileptic seizures are manifestations of epilepsy. Careful analysis of EEG records can provide valuable insight and improved understanding of the mechanism causing epileptic disorders. The detection of epileptic form discharges in EEG is an important component in the diagnosis of epilepsy. As EEG signals are non-stationary, the conventional frequency and time domain analysis does not provide better accuracy. So, in this work an attempt has been made to provide an overview of the determination of epilepsy by implementation of Hurst exponent (HE)-based discrete wavelet transform techniques for feature extraction from EEG data sets obtained during ictal and pre ictal stages of affected person and finally classifying EEG signals using SVM and KNN Classifiers. The The highest accuracy of 99% is obtained using SVM.
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, author proposes the application of Support Vector Machine (SVM) for the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than the other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic nature as a binary classifier where it classifies two classes namely, object and background. The algorithm aims at effectively detecting an object from its background with the minimum training data. The synthetic image containing noises is used for algorithm testing. Furthermore, it is implemented to perform remote sensing image analysis such as identification of Island vegetation, water body, and oil spill from the satellite imagery. It is indicated that SVM provides the fast and accurate analysis with the acceptable result.
The construction of support vector machine classifier using the firefly algorithm.
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy.
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy. PMID:25802511
A tri-fold hybrid classification approach for diagnostics with unexampled faulty states
NASA Astrophysics Data System (ADS)
Tamilselvan, Prasanna; Wang, Pingfeng
2015-01-01
System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires the knowledge of system failures. However, with an increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at product testing stage. Thus, real time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which comprises of preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes the advantages of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.
Anam, Khairul; Al-Jumaily, Adel
2017-01-01
The success of myoelectric pattern recognition (M-PR) mostly relies on the features extracted and classifier employed. This paper proposes and evaluates a fast classifier, extreme learning machine (ELM), to classify individual and combined finger movements on amputees and non-amputees. ELM is a single hidden layer feed-forward network (SLFN) that avoids iterative learning by determining input weights randomly and output weights analytically. Therefore, it can accelerate the training time of SLFNs. In addition to the classifier evaluation, this paper evaluates various feature combinations to improve the performance of M-PR and investigate some feature projections to improve the class separability of the features. Different from other studies on the implementation of ELM in the myoelectric controller, this paper presents a complete and thorough investigation of various types of ELMs including the node-based and kernel-based ELM. Furthermore, this paper provides comparisons of ELMs and other well-known classifiers such as linear discriminant analysis (LDA), k-nearest neighbour (kNN), support vector machine (SVM) and least-square SVM (LS-SVM). The experimental results show the most accurate ELM classifier is radial basis function ELM (RBF-ELM). The comparison of RBF-ELM and other well-known classifiers shows that RBF-ELM is as accurate as SVM and LS-SVM but faster than the SVM family; it is superior to LDA and kNN. The experimental results also indicate that the accuracy gap of the M-PR on the amputees and non-amputees is not too much with the accuracy of 98.55% on amputees and 99.5% on the non-amputees using six electromyography (EMG) channels. Copyright © 2016 Elsevier Ltd. All rights reserved.
Structural analysis of online handwritten mathematical symbols based on support vector machines
NASA Astrophysics Data System (ADS)
Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George
2013-01-01
Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs): one based on the "one-against-one" technique and one based on the "one-against-all", in identifying the relation between a pair of symbols (i.e. subscript, numerator, etc). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique and 12.31% error rate for the ILSP-1 classifier.
Al-Qazzaz, Noor Kamal; Ali, Sawal Hamid Bin Mohd; Ahmad, Siti Anom; Islam, Mohd Shabiul; Escudero, Javier
2018-01-01
Stroke survivors are more prone to developing cognitive impairment and dementia. Dementia detection is a challenge for supporting personalized healthcare. This study analyzes the electroencephalogram (EEG) background activity of 5 vascular dementia (VaD) patients, 15 stroke-related patients with mild cognitive impairment (MCI), and 15 control healthy subjects during a working memory (WM) task. The objective of this study is twofold. First, it aims to enhance the discrimination of VaD, stroke-related MCI patients, and control subjects using fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR); second, it aims to extract and investigate the spectral features that characterize the post-stroke dementia patients compared to the control subjects. Nineteen channels were recorded and analyzed using the independent component analysis and wavelet analysis (ICA-WT) denoising technique. Using ANOVA, linear spectral power including relative powers (RP) and power ratio were calculated to test whether the EEG dominant frequencies were slowed down in VaD and stroke-related MCI patients. Non-linear features including permutation entropy (PerEn) and fractal dimension (FD) were used to test the degree of irregularity and complexity, which was significantly lower in patients with VaD and stroke-related MCI than that in control subjects (ANOVA; p ˂ 0.05). This study is the first to use fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) dimensionality reduction technique with EEG background activity of dementia patients. The impairment of post-stroke patients was detected using support vector machine (SVM) and k-nearest neighbors (kNN) classifiers. A comparative study has been performed to check the effectiveness of using FNPAQR dimensionality reduction technique with the SVM and kNN classifiers. FNPAQR with SVM and kNN obtained 91.48 and 89.63% accuracy, respectively, whereas without using the FNPAQR exhibited 70 and 67.78% accuracy for SVM and kNN, respectively, in classifying VaD, stroke-related MCI, and control patients, respectively. Therefore, EEG could be a reliable index for inspecting concise markers that are sensitive to VaD and stroke-related MCI patients compared to control healthy subjects.
Onboard Classifiers for Science Event Detection on a Remote Sensing Spacecraft
NASA Technical Reports Server (NTRS)
Castano, Rebecca; Mazzoni, Dominic; Tang, Nghia; Greeley, Ron; Doggett, Thomas; Cichy, Ben; Chien, Steve; Davies, Ashley
2006-01-01
Typically, data collected by a spacecraft is downlinked to Earth and pre-processed before any analysis is performed. We have developed classifiers that can be used onboard a spacecraft to identify high priority data for downlink to Earth, providing a method for maximizing the use of a potentially bandwidth limited downlink channel. Onboard analysis can also enable rapid reaction to dynamic events, such as flooding, volcanic eruptions or sea ice break-up. Four classifiers were developed to identify cryosphere events using hyperspectral images. These classifiers include a manually constructed classifier, a Support Vector Machine (SVM), a Decision Tree and a classifier derived by searching over combinations of thresholded band ratios. Each of the classifiers was designed to run in the computationally constrained operating environment of the spacecraft. A set of scenes was hand-labeled to provide training and testing data. Performance results on the test data indicate that the SVM and manual classifiers outperformed the Decision Tree and band-ratio classifiers with the SVM yielding slightly better classifications than the manual classifier.
NASA Astrophysics Data System (ADS)
Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning
2017-03-01
Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.
Sriraam, N; Raghu, S
2017-09-02
Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from Bern Barcelona database. From the dataset, five different classification tasks were formed. Total 26 features were extracted from focal and non focal EEG. Significant features were selected using Wilcoxon rank sum test by setting p-value (p < 0.05) and z-score (-1.96 > z > 1.96) at 95% significance interval. Hypothesis was made that the effect of removing outliers improves the classification accuracy. Turkey's range test was adopted for pruning outliers from feature set. Finally, 21 features were classified using optimized support vector machine (SVM) classifier with 10-fold cross validation. Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15% achieved respectively and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensures the suitability of the proposed multi-features with optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2012-09-30
floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in
[Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].
Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong
2013-03-01
Model selection for support vector machine (SVM) involving kernel and the margin parameter values selection is usually time-consuming, impacts training efficiency of SVM model and final classification accuracies of SVM hyperspectral remote sensing image classifier greatly. Firstly, based on combinatorial optimization theory and cross-validation method, artificial immune clonal selection algorithm is introduced to the optimal selection of SVM (CSSVM) kernel parameter a and margin parameter C to improve the training efficiency of SVM model. Then an experiment of classifying AVIRIS in India Pine site of USA was performed for testing the novel CSSVM, as well as a traditional SVM classifier with general Grid Searching cross-validation method (GSSVM) for comparison. And then, evaluation indexes including SVM model training time, classification overall accuracy (OA) and Kappa index of both CSSVM and GSSVM were all analyzed quantitatively. It is demonstrated that OA of CSSVM on test samples and whole image are 85.1% and 81.58, the differences from that of GSSVM are both within 0.08% respectively; And Kappa indexes reach 0.8213 and 0.7728, the differences from that of GSSVM are both within 0.001; While the ratio of model training time of CSSVM and GSSVM is between 1/6 and 1/10. Therefore, CSSVM is fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.
Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.
2010-01-01
Purpose To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation =−0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation =−5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set and areas under the receiver operating characteristic (AUROC) curve for the classification of eyes in the test set were generated. AUROC curve results from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. Also reported was the AUROC curve for the GDx VCC software-generated nerve fiber indicator (NFI). Results The AUROC curves for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced dimensional data sets). AUROC curves for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC curve for the NFI was 0.87. Conclusions Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes. RVM may be preferable to SVM, because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898
Machine learning approach to automatic exudate detection in retinal images from diabetic patients
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin
2010-01-01
Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2013-09-30
N0001411WX21394 Steve W. Martin SPAWAR Systems Center Pacific 53366 Front St. San Diego, CA 92152-6551 phone: (619) 553-9882 email: Steve.W.Martin...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, particularly S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
SVM based colon polyps classifier in a wireless active stereo endoscope.
Ayoub, J; Granado, B; Mhanna, Y; Romain, O
2010-01-01
This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of SVM classifier trained on robust feature descriptors. The study is related to Cyclope, this prototype sensor allows real time 3D object reconstruction and continues to be optimized technically to improve its classification task by differentiation between hyperplastic and adenomatous polyps. Experimental results were encouraging and show correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by intensity histogram, the work shows a new approach that extracts a set of features based on depth histogram and combines stereo measurement with SVM classifiers to correctly classify benign and malignant polyps.
Celik, Turgay; Lee, Hwee Kuan; Petznick, Andrea; Tong, Louis
2013-01-01
Background Infrared (IR) meibography is an imaging technique to capture the Meibomian glands in the eyelids. These ocular surface structures are responsible for producing the lipid layer of the tear film which helps to reduce tear evaporation. In a normal healthy eye, the glands have similar morphological features in terms of spatial width, in-plane elongation, length. On the other hand, eyes with Meibomian gland dysfunction show visible structural irregularities that help in the diagnosis and prognosis of the disease. However, currently there is no universally accepted algorithm for detection of these image features which will be clinically useful. We aim to develop a method of automated gland segmentation which allows images to be classified. Methods A set of 131 meibography images were acquired from patients from the Singapore National Eye Center. We used a method of automated gland segmentation using Gabor wavelets. Features of the imaged glands including orientation, width, length and curvature were extracted and the IR images enhanced. The images were classified as ‘healthy’, ‘intermediate’ or ‘unhealthy’, through the use of a support vector machine classifier (SVM). Half the images were used for training the SVM and the other half for validation. Independently of this procedure, the meibographs were classified by an expert clinician into the same 3 grades. Results The algorithm correctly detected 94% and 98% of mid-line pixels of gland and inter-gland regions, respectively, on healthy images. On intermediate images, correct detection rates of 92% and 97% of mid-line pixels of gland and inter-gland regions were achieved respectively. The true positive rate of detecting healthy images was 86%, and for intermediate images, 74%. The corresponding false positive rates were 15% and 31% respectively. Using the SVM, the proposed method has 88% accuracy in classifying images into the 3 classes. The classification of images into healthy and unhealthy classes achieved a 100% accuracy, but 7/38 intermediate images were incorrectly classified. Conclusions This technique of image analysis in meibography can help clinicians to interpret the degree of gland destruction in patients with dry eye and meibomian gland dysfunction.
Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H
2017-12-19
Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied on medical records of cardiorespiratory fitness and how the various techniques differ in terms of capabilities of predicting medical outcomes (e.g. mortality). We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems Between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). In order to handle the imbalanced dataset used, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two set of experiments have been conducted with and without the SMOTE sampling technique. On average over different evaluation metrics, SVM Classifier has shown the lowest performance while other models like BN, BC and DT performed better. The RF classifier has shown the best performance (AUC = 0.97) among all models trained using the SMOTE sampling. The results show that various ML techniques can significantly vary in terms of its performance for the different evaluation metrics. It is also not necessarily that the more complex the ML model, the more prediction accuracy can be achieved. The prediction performance of all models trained with SMOTE is much better than the performance of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klement, Rainer J., E-mail: rainer_klement@gmx.de; Department of Radiotherapy and Radiation Oncology, Leopoldina Hospital, Schweinfurt; Allgäuer, Michael
2014-03-01
Background: Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. Methods and Materials: We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performancemore » by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Results: Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED{sub ISO}) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED{sub ISO} and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED{sub ISO}, age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. Conclusions: These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.« less
Klement, Rainer J; Allgäuer, Michael; Appold, Steffen; Dieckmann, Karin; Ernst, Iris; Ganswindt, Ute; Holy, Richard; Nestle, Ursula; Nevinny-Stickel, Meinhard; Semrau, Sabine; Sterzing, Florian; Wittig, Andrea; Andratschke, Nicolaus; Guckenberger, Matthias
2014-03-01
Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning. Copyright © 2014 Elsevier Inc. All rights reserved.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.
Automated classification of neurological disorders of gait using spatio-temporal gait parameters.
Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman
2015-04-01
Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of the treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. 8 measurements with 136 variables using a GAITRite(®) sensor carpet were obtained from each subject. Subjects were randomly divided into two groups (training cases and validation cases). Sensitivity and specificity of k-nearest neighbor (KNN), naive-bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%); while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be a reliable diagnostic and management tool for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach
NASA Astrophysics Data System (ADS)
Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios
A software package Unresolved Galaxy Classifier (UGC) is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide an automated taxonomic classification and specific parameters estimation analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM-models training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of galaxy models synthetic spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.
Using deep learning for detecting gender in adult chest radiographs
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2018-03-01
In this paper, we present a method for automatically identifying the gender of an imaged person using their frontal chest x-ray images. Our work is motivated by the need to determine missing gender information in some datasets. The proposed method employs the technique of convolutional neural network (CNN) based deep learning and transfer learning to overcome the challenge of developing handcrafted features in limited data. Specifically, the method consists of four main steps: pre-processing, CNN feature extractor, feature selection, and classifier. The method is tested on a combined dataset obtained from several sources with varying acquisition quality resulting in different pre-processing steps that are applied for each. For feature extraction, we tested and compared four CNN architectures, viz., AlexNet, VggNet, GoogLeNet, and ResNet. We applied a feature selection technique, since the feature length is larger than the number of images. Two popular classifiers: SVM and Random Forest, are used and compared. We evaluated the classification performance by cross-validation and used seven performance measures. The best performer is the VggNet-16 feature extractor with the SVM classifier, with accuracy of 86.6% and ROC Area being 0.932 for 5-fold cross validation. We also discuss several misclassified cases and describe future work for performance improvement.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.
Subasi, Abdulhamit
2013-06-01
Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
Generative Models for Similarity-based Classification
2007-01-01
NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G
2017-06-30
Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n =25) and healthy controls (n =25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy values (FA). The study revealed three main findings: 1) SVM applied to DTI derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method.
Klöppel, Stefan; Stonnington, Cynthia M; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D; Mader, Irina; Mitchell, L Anne; Patel, Ameet C; Roberts, Catherine C; Fox, Nick C; Jack, Clifford R; Ashburner, John; Frackowiak, Richard S J
2008-11-01
There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65-95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice.
Accuracy of dementia diagnosis—a direct comparison between radiologists and a computerized method
Stonnington, Cynthia M.; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D.; Mader, Irina; Mitchell, L. Anne; Patel, Ameet C.; Roberts, Catherine C.; Fox, Nick C.; Jack, Clifford R.; Ashburner, John; Frackowiak, Richard S. J.
2008-01-01
There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65–95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice. PMID:18835868
Chesnokov, Yuriy V
2008-06-01
Paroxysmal atrial fibrillation (PAF) is a serious arrhythmia associated with morbidity and mortality. We explore the possibility of distant prediction of PAF by analyzing changes in heart rate variability (HRV) dynamics of non-PAF rhythms immediately before PAF event. We use that model for distant prognosis of PAF onset with artificial intelligence methods. We analyzed 30-min non-PAF HRV records from 51 subjects immediately before PAF onset and at least 45min distant from any PAF event. We used spectral and complexity analysis with sample (SmEn) and approximate (ApEn) entropies and their multiscale versions on extracted HRV data. We used that features to train the artificial neural networks (ANNs) and support vector machine (SVM) classifiers to differentiate the subjects. The trained classifiers were further tested for distant PAF event prognosis on 16 subjects from independent database on non-PAF rhythm lasting from 60 to 320 min before PAF onset classifying the 30-min segments as distant or leading to PAF. We found statistically significant increase in 30-min non-PAF HRV recordings from 51 subjects in the VLF, LF, HF bands and total power (p<0.0001) before PAF event compared to PAF distant ones. The SmEn and ApEn analysis provided significant decrease in complexity (p<0.0001 and p<0.001) before PAF onset. For training ANN and SVM classifiers the data from 51 subjects were randomly split to training, validation and testing. ANN provided better results in terms of sensitivity (Se), specificity (Sp) and positive predictivity (Pp) compared to SVM which became biased towards positive case. The validation results of the ANN classifier we achieved: Se 76%, Sp 93%, Pp 94%. Testing ANN and SVM classifiers on 16 subjects with non-PAF HRV data preceding PAF events we obtained distant prediction of PAF onset with SVM classifier in 10 subjects (58+/-18 min in advance). ANN classifier provided distant prediction of PAF event in 13 subjects (62+/-21 min in advance). From the results of distant PAF prediction we conclude that ANN and SVM classifiers learned the changes in the HRV dynamics immediately before PAF event and successfully identified them during distant PAF prognosis on independent database. This confirms the reported in the literature results that corresponding changes in the HRV data occur about 60 min before PAF onset and proves the possibility of distant PAF prediction with ANN and SVM methods.
Automated diagnosis of dry eye using infrared thermography images
NASA Astrophysics Data System (ADS)
Acharya, U. Rajendra; Tan, Jen Hong; Koh, Joel E. W.; Sudarshan, Vidya K.; Yeo, Sharon; Too, Cheah Loon; Chua, Chua Kuang; Ng, E. Y. K.; Tong, Louis
2015-07-01
Dry Eye (DE) is a condition of either decreased tear production or increased tear film evaporation. Prolonged DE damages the cornea causing the corneal scarring, thinning and perforation. There is no single uniform diagnosis test available to date; combinations of diagnostic tests are to be performed to diagnose DE. The current diagnostic methods available are subjective, uncomfortable and invasive. Hence in this paper, we have developed an efficient, fast and non-invasive technique for the automated identification of normal and DE classes using infrared thermography images. The features are extracted from nonlinear method called Higher Order Spectra (HOS). Features are ranked using t-test ranking strategy. These ranked features are fed to various classifiers namely, K-Nearest Neighbor (KNN), Nave Bayesian Classifier (NBC), Decision Tree (DT), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) to select the best classifier using minimum number of features. Our proposed system is able to identify the DE and normal classes automatically with classification accuracy of 99.8%, sensitivity of 99.8%, and specificity if 99.8% for left eye using PNN and KNN classifiers. And we have reported classification accuracy of 99.8%, sensitivity of 99.9%, and specificity if 99.4% for right eye using SVM classifier with polynomial order 2 kernel.
Classification of Regional Ionospheric Disturbances Based on Support Vector Machines
NASA Astrophysics Data System (ADS)
Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil
2016-07-01
Ionosphere is an anisotropic, inhomogeneous, time varying and spatio-temporally dispersive medium whose parameters can be estimated almost always by using indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to be identified due to extensive scales in period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, identification and classification of regional disturbances poses a challenge. The main aim of this study is to classify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of ionosphere, will be used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. In this work, for the automated classification of the regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. SVM is a supervised learning model used for classification with associated learning algorithm that analyze the data and recognize patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher dimensional feature spaces. Performance of the developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing the developed classification technique to the Global Ionospheric Map (GIM) TEC data which is provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method to detect the anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK 114E541 project as a part of the Scientific and Technological Research Projects Funding Program (1001).
2012-01-01
Background Myocardial ischemia can be developed into more serious diseases. Early Detection of the ischemic syndrome in electrocardiogram (ECG) more accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values for successive five beats to reduce effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features were sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter. PMID:22703641
NASA Astrophysics Data System (ADS)
Wang, Danshi; Zhang, Min; Cai, Zhongle; Cui, Yue; Li, Ze; Han, Huanhuan; Fu, Meixia; Luo, Bin
2016-06-01
An effective machine learning algorithm, the support vector machine (SVM), is presented in the context of a coherent optical transmission system. As a classifier, the SVM can create nonlinear decision boundaries to mitigate the distortions caused by nonlinear phase noise (NLPN). Without any prior information or heuristic assumptions, the SVM can learn and capture the link properties from only a few training data. Compared with the maximum likelihood estimation (MLE) algorithm, a lower bit-error rate (BER) is achieved by the SVM for a given launch power; moreover, the launch power dynamic range (LPDR) is increased by 3.3 dBm for 8 phase-shift keying (8 PSK), 1.2 dBm for QPSK, and 0.3 dBm for BPSK. The maximum transmission distance corresponding to a BER of 1 ×10-3 is increased by 480 km for the case of 8 PSK. The larger launch power range and longer transmission distance improve the tolerance to amplitude and phase noise, which demonstrates the feasibility of the SVM in digital signal processing for M-PSK formats. Meanwhile, in order to apply the SVM method to 16 quadratic amplitude modulation (16 QAM) detection, we propose a parameter optimization scheme. By utilizing a cross-validation and grid-search techniques, the optimal parameters of SVM can be selected, thus leading to the LPDR improvement by 2.8 dBm. Additionally, we demonstrate that the SVM is also effective in combating the laser phase noise combined with the inphase and quadrature (I/Q) modulator imperfections, but the improvement is insignificant for the linear noise and separate I/Q imbalance. The computational complexity of SVM is also discussed. The relatively low complexity makes it possible for SVM to implement the real-time processing.
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Base on the data of blood pressure, plasma lipid, Glu and UA by physical test, Support Vector Machine (SVM) was applied to identify coronary heart disease (CHD) in patients and non-CHD individuals in south China population for guide of further prevention and treatment of the disease. Firstly, the SVM classifier was built using radial basis kernel function, liner kernel function and polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict the CHD. By comparison with those from artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression method and non-optimized SVM, the overall results of our calculation demonstrated that the classification performance of optimized RBF-SVM model could be superior to other classifier algorithm with higher accuracy rate, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. So, it is well concluded that SVM could be used as a valid method for assisting diagnosis of CHD.
DCS-SVM: a novel semi-automated method for human brain MR image segmentation.
Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi
2017-11-27
In this paper, a novel method is proposed which appropriately segments magnetic resonance (MR) brain images into three main tissues. This paper proposes an extension of our previous work in which we suggested a combination of multiple classifiers (CMC)-based methods named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. This idea is used here and a novel method is developed that tries to use more complex and accurate classifiers like support vector machine (SVM) in the ensemble. This work is challenging because the CMC-based methods are time consuming, especially on huge datasets like three-dimensional (3D) brain MR images. Moreover, SVM is a powerful method that is used for modeling datasets with complex feature space, but it also has huge computational cost for big datasets, especially those with strong interclass variability problems and with more than two classes such as 3D brain images; therefore, we cannot use SVM in DCS-DLTLTI. Therefore, we propose a novel approach named "DCS-SVM" to use SVM in DCS-DLTLTI to improve the accuracy of segmentation results. The proposed method is applied on well-known datasets of the Internet Brain Segmentation Repository (IBSR) and promising results are obtained.
Solution Path for Pin-SVM Classifiers With Positive and Negative $\\tau $ Values.
Huang, Xiaolin; Shi, Lei; Suykens, Johan A K
2017-07-01
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ . Its value is related to the quantile level and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ . We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ . In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
Improving imbalanced scientific text classification using sampling strategies and dictionaries.
Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M
2011-09-15
Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. One of the systems affected are those related to the recovery and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in tackling the problem of class imbalance. In this work, we study their effects on three types of classifiers (Knn, SVM and Naive-Bayes) when they are applied to search on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. Best results were obtained with NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
Classification of hadith into positive suggestion, negative suggestion, and information
NASA Astrophysics Data System (ADS)
Faraby, Said Al; Riviera Rachmawati Jasin, Eliza; Kusumaningrum, Andina; Adiwijaya
2018-03-01
As one of the Muslim life guidelines, based on the meaning of its sentence(s), a hadith can be viewed as a suggestion for doing something, or a suggestion for not doing something, or just information without any suggestion. In this paper, we tried to classify the Bahasa translation of hadith into the three categories using machine learning approach. We tried stemming and stopword removal in preprocessing, and TF-IDF of unigram, bigram, and trigram as the extracted features. As the classifier, we compared between SVM and Neural Network. Since the categories are new, so in order to compare the results of the previous pipelines, we created a baseline classifier using simple rule-based string matching technique. The rule-based algorithm conditions on the occurrence of words such as “janganlah, sholatlah, and so on” to determine the category. The baseline method achieved F1-Score of 0.69, while the best F1-Score from the machine learning approach was 0.88, and it was produced by SVM model with the linear kernel.
Automated diagnosis of epilepsy using CWT, HOS and texture parameters.
Acharya, U Rajendra; Yanti, Ratna; Zheng, Jia Wei; Krishnan, M Muthu Rama; Tan, Jen Hong; Martis, Roshan Joy; Lim, Choo Min
2013-06-01
Epilepsy is a chronic brain disorder which manifests as recurrent seizures. Electroencephalogram (EEG) signals are generally analyzed to study the characteristics of epileptic seizures. In this work, we propose a method for the automated classification of EEG signals into normal, interictal and ictal classes using Continuous Wavelet Transform (CWT), Higher Order Spectra (HOS) and textures. First the CWT plot was obtained for the EEG signals and then the HOS and texture features were extracted from these plots. Then the statistically significant features were fed to four classifiers namely Decision Tree (DT), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM) to select the best classifier. We observed that the SVM classifier with Radial Basis Function (RBF) kernel function yielded the best results with an average accuracy of 96%, average sensitivity of 96.9% and average specificity of 97% for 23.6 s duration of EEG data. Our proposed technique can be used as an automatic seizure monitoring software. It can also assist the doctors to cross check the efficacy of their prescribed drugs.
Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.
Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing
2015-01-01
Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of a SVM classifier depends strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset- the Wisconsin Diagnostic Breast Cancer (WDBC), revealed the different accuracy of a SVM classification in terms of the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called as SVM-RFE(C) with correlation filter was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first demonstrated outperforming SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experiment results show that the classifier constructed with SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
Jiang, Rou; You, Rui; Pei, Xiao-Qing; Zou, Xiong; Zhang, Meng-Xia; Wang, Tong-Min; Sun, Rui; Luo, Dong-Hua; Huang, Pei-Yu; Chen, Qiu-Yan; Hua, Yi-Jun; Tang, Lin-Quan; Guo, Ling; Mo, Hao-Yuan; Qian, Chao-Nan; Mai, Hai-Qiang; Hong, Ming-Huang; Cai, Hong-Min; Chen, Ming-Yuan
2016-01-19
The aim of this study was to develop a prognostic classifier and subdivided the M1 stage for nasopharyngeal carcinoma patients with synchronous metastases (mNPC). A retrospective cohort of 347 mNPC patients was recruited between January 2000 and December 2010. Thirty hematological markers and 11 clinical characteristics were collected, and the association of these factors with overall survival (OS) was evaluated. Advanced machine learning schemes of a support vector machine (SVM) were used to select a subset of highly informative factors and to construct a prognostic model (mNPC-SVM). The mNPC-SVM classifier identified ten informative variables, including three clinical indexes and seven hematological markers. The median survival time for low-risk patients (M1a) as identified by the mNPC-SVM classifier was 38.0 months, and survival time was dramatically reduced to 13.8 months for high-risk patients (M1b) (P < 0.001). Multivariate adjustment using prognostic factors revealed that the mNPC-SVM classifier remained a powerful predictor of OS (M1a vs. M1b, hazard ratio, 3.45; 95% CI, 2.59 to 4.60, P < 0.001). Moreover, combination treatment of systemic chemotherapy and loco-regional radiotherapy was associated with significantly better survival outcomes than chemotherapy alone (the 5-year OS, 47.0% vs. 10.0%, P < 0.001) in the M1a subgroup but not in the M1b subgroup (12.0% vs. 3.0%, P = 0.101). These findings were validated by a separate cohort. In conclusion, the newly developed mNPC-SVM classifier led to more precise risk definitions that offer a promising subdivision of the M1 stage and individualized selection for future therapeutic regimens in mNPC patients.
Kazemi, Fatemeh; Najafabadi, Tooraj Abbasian; Araabi, Babak Nadjar
2016-01-01
Acute myelogenous leukemia (AML) is a subtype of acute leukemia, which is characterized by the accumulation of myeloid blasts in the bone marrow. Careful microscopic examination of stained blood smear or bone marrow aspirate is still the most significant diagnostic methodology for initial AML screening and considered as the first step toward diagnosis. It is time-consuming and due to the elusive nature of the signs and symptoms of AML; wrong diagnosis may occur by pathologists. Therefore, the need for automation of leukemia detection has arisen. In this paper, an automatic technique for identification and detection of AML and its prevalent subtypes, i.e., M2-M5 is presented. At first, microscopic images are acquired from blood smears of patients with AML and normal cases. After applying image preprocessing, color segmentation strategy is applied for segmenting white blood cells from other blood components and then discriminative features, i.e., irregularity, nucleus-cytoplasm ratio, Hausdorff dimension, shape, color, and texture features are extracted from the entire nucleus in the whole images containing multiple nuclei. Images are classified to cancerous and noncancerous images by binary support vector machine (SVM) classifier with 10-fold cross validation technique. Classifier performance is evaluated by three parameters, i.e., sensitivity, specificity, and accuracy. Cancerous images are also classified into their prevalent subtypes by multi-SVM classifier. The results show that the proposed algorithm has achieved an acceptable performance for diagnosis of AML and its common subtypes. Therefore, it can be used as an assistant diagnostic tool for pathologists.
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease
NASA Astrophysics Data System (ADS)
Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas
2017-08-01
The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps expert to diagnose precisely, accurately, and inexpensively. In this research, we use data mining technique to developed a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages from filter and wrapper methods to select the optimal feature subset from original feature. Chi square used as filter method to remove redundant features and GA as wrapper method to select the ideal feature subset with SVM used as classifier. Experiment performed with 10 fold cross validation on erythemato-squamous diseases dataset taken from University of California Irvine (UCI) machine learning database. The experimental result shows that the proposed model based multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
Computer-aided diagnosis of malignant mammograms using Zernike moments and SVM.
Sharma, Shubhi; Khanna, Pritee
2015-02-01
This work is directed toward the development of a computer-aided diagnosis (CAD) system to detect abnormalities or suspicious areas in digital mammograms and classify them as malignant or nonmalignant. Original mammogram is preprocessed to separate the breast region from its background. To work on the suspicious area of the breast, region of interest (ROI) patches of a fixed size of 128×128 are extracted from the original large-sized digital mammograms. For training, patches are extracted manually from a preprocessed mammogram. For testing, patches are extracted from a highly dense area identified by clustering technique. For all extracted patches corresponding to a mammogram, Zernike moments of different orders are computed and stored as a feature vector. A support vector machine (SVM) is used to classify extracted ROI patches. The experimental study shows that the use of Zernike moments with order 20 and SVM classifier gives better results among other studies. The proposed system is tested on Image Retrieval In Medical Application (IRMA) reference dataset and Digital Database for Screening Mammography (DDSM) mammogram database. On IRMA reference dataset, it attains 99% sensitivity and 99% specificity, and on DDSM mammogram database, it obtained 97% sensitivity and 96% specificity. To verify the applicability of Zernike moments as a fitting texture descriptor, the performance of the proposed CAD system is compared with the other well-known texture descriptors namely gray-level co-occurrence matrix (GLCM) and discrete cosine transform (DCT).
Lamb wave based damage detection using Matching Pursuit and Support Vector Machine classifier
NASA Astrophysics Data System (ADS)
Agarwal, Sushant; Mitra, Mira
2014-03-01
In this paper, the suitability of using Matching Pursuit (MP) and Support Vector Machine (SVM) for damage detection using Lamb wave response of thin aluminium plate is explored. Lamb wave response of thin aluminium plate with or without damage is simulated using finite element. Simulations are carried out at different frequencies for various kinds of damage. The procedure is divided into two parts - signal processing and machine learning. Firstly, MP is used for denoising and to maintain the sparsity of the dataset. In this study, MP is extended by using a combination of time-frequency functions as the dictionary and is deployed in two stages. Selection of a particular type of atoms lead to extraction of important features while maintaining the sparsity of the waveform. The resultant waveform is then passed as input data for SVM classifier. SVM is used to detect the location of the potential damage from the reduced data. The study demonstrates that SVM is a robust classifier in presence of noise and more efficient as compared to Artificial Neural Network (ANN). Out-of-sample data is used for the validation of the trained and tested classifier. Trained classifiers are found successful in detection of the damage with more than 95% detection rate.
Classification of human carcinoma cells using multispectral imagery
NASA Astrophysics Data System (ADS)
Ćinar, Umut; Y. Ćetin, Yasemin; Ćetin-Atalay, Rengul; Ćetin, Enis
2016-03-01
In this paper, we present a technique for automatically classifying human carcinoma cell images using textural features. An image dataset containing microscopy biopsy images from different patients for 14 distinct cancer cell line type is studied. The images are captured using a RGB camera attached to an inverted microscopy device. Texture based Gabor features are extracted from multispectral input images. SVM classifier is used to generate a descriptive model for the purpose of cell line classification. The experimental results depict satisfactory performance, and the proposed method is versatile for various microscopy magnification options.
Multivariate pattern analysis of obsessive-compulsive disorder using structural neuroanatomy.
Hu, Xinyu; Liu, Qi; Li, Bin; Tang, Wanjie; Sun, Huaiqiang; Li, Fei; Yang, Yanchun; Gong, Qiyong; Huang, Xiaoqi
2016-02-01
Magnetic resonance imaging (MRI) studies have revealed brain structural abnormalities in obsessive-compulsive disorder (OCD) patients, involving both gray matter (GM) and white matter (WM). However, the results of previous publications were based on average differences between groups, which limited their usages in clinical practice. Therefore, the aim of this study was to examine whether the application of multivariate pattern analysis (MVPA) to high-dimensional structural images would allow accurate discrimination between OCD patients and healthy control subjects (HCS). High-resolution T1-weighted images were acquired from 33 OCD patients and 33 demographically matched HCS in a 3.0 T scanner. Differences in GM and WM volume between OCD and HCS were examined using two types of well-established MVPA techniques: support vector machine (SVM) and Gaussian process classifier (GPC). We also drew a receiver operating characteristic (ROC) curve to evaluate the performance of each classifier. The classification accuracies for both classifiers using GM and WM anatomy were all above 75%. The highest classification accuracy (81.82%, P<0.001) was achieved with the SVM classifier using WM information. Regional brain anomalies with high discriminative power were based on three distributed networks including the fronto-striatal circuit, the temporo-parieto-occipital junction and the cerebellum. Our study illustrated that both GM and WM anatomical features may be useful in differentiating OCD patients from HCS. WM volume using the SVM approach showed the highest accuracy in our population for revealing group differences, which suggested its potential diagnostic role in detecting highly enriched OCD patients at the level of the individual. Copyright © 2015 Elsevier B.V. and ECNP. All rights reserved.
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk
2017-04-18
The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.
Age and gender estimation using Region-SIFT and multi-layered SVM
NASA Astrophysics Data System (ADS)
Kim, Hyunduk; Lee, Sang-Heon; Sohn, Myoung-Kyu; Hwang, Byunghun
2018-04-01
In this paper, we propose an age and gender estimation framework using the region-SIFT feature and multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark based face alignment. The second step is the feature extraction step. In this step, we introduce the region-SIFT feature extraction method based on facial landmarks. First, we define sub-regions of the face. We then extract SIFT features from each sub-region. In order to reduce the dimensions of features we employ a Principal Component Analysis (PCA) and a Linear Discriminant Analysis (LDA). Finally, we classify age and gender using a multi-layered Support Vector Machines (SVM) for efficient classification. Rather than performing gender estimation and age estimation independently, the use of the multi-layered SVM can improve the classification rate by constructing a classifier that estimate the age according to gender. Moreover, we collect a dataset of face images, called by DGIST_C, from the internet. A performance evaluation of proposed method was performed with the FERET database, CACD database, and DGIST_C database. The experimental results demonstrate that the proposed approach classifies age and performs gender estimation very efficiently and accurately.
Extraction and classification of 3D objects from volumetric CT data
NASA Astrophysics Data System (ADS)
Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.
2016-05-01
We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection System (EDS) using our multistage Segmentation Carving (SC) followed by Support Vector Machine (SVM) classifier. The multi-stage Segmentation and Carving (SC) step extracts all suspect 3-D objects. The feature vector is then constructed for all extracted objects and the feature vector is classified by the Support Vector Machine (SVM) previously learned using a set of ground truth threat and benign objects. The learned SVM classifier has shown to be effective in classification of different types of threat materials. The proposed ATD algorithm robustly deals with CT data that are prone to artifacts due to scatter, beam hardening as well as other systematic idiosyncrasies of the CT data. Furthermore, the proposed ATD algorithm is amenable for including newly emerging threat materials as well as for accommodating data from newly developing sensor technologies. Efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristics (ROC) curve that relates Probability of Detection (PD) as a function of Probability of False Alarm (PFA). The tests performed using CT data of passenger bags shows excellent performance characteristics.
Dates fruits classification using SVM
NASA Astrophysics Data System (ADS)
Alzu'bi, Reem; Anushya, A.; Hamed, Ebtisam; Al Sha'ar, Eng. Abdelnour; Vincy, B. S. Angela
2018-04-01
In this paper, we used SVM in classifying various types of dates using their images. Dates have interesting different characteristics that can be valuable to distinguish and determine a particular date type. These characteristics include shape, texture, and color. A system that achieves 100% accuracy was built to classify the dates which can be eatable and cannot be eatable. The built system helps the food industry and customer in classifying dates depending on specific quality measures giving best performance with specific type of dates.
Epileptic seizure detection in EEG signal using machine learning techniques.
Jaiswal, Abeg Kumar; Banka, Haider
2018-03-01
Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods for analyzing an EEG signal for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning technique have been proposed to replace these traditional methods. The two basic steps involved in machine learning are feature extraction and classification. Feature extraction reduces the input pattern space by keeping informative features and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches involving subpattern based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA. Both techniques explore the subpattern correlation of EEG signals, which helps in decision-making process. SVM is used for classification of seizure and non-seizure EEG signals. The SVM was trained with radial basis kernel. All the experiments have been carried out on the benchmark epilepsy EEG dataset. The entire dataset consists of 500 EEG signals recorded under different scenarios. Seven different experimental cases for classification have been conducted. The classification accuracy was evaluated using tenfold cross validation. The classification results of the proposed approaches have been compared with the results of some of existing techniques proposed in the literature to establish the claim.
NASA Astrophysics Data System (ADS)
Teodoro, Ana C.; Araujo, Ricardo
2016-01-01
The use of unmanned aerial vehicles (UAVs) for remote sensing applications is becoming more frequent. However, this type of information can result in several software problems related to the huge amount of data available. Object-based image analysis (OBIA) has proven to be superior to pixel-based analysis for very high-resolution images. The main objective of this work was to explore the potentialities of the OBIA methods available in two different open source software applications, Spring and OTB/Monteverdi, in order to generate an urban land cover map. An orthomosaic derived from UAVs was considered, 10 different regions of interest were selected, and two different approaches were followed. The first one (Spring) uses the region growing segmentation algorithm followed by the Bhattacharya classifier. The second approach (OTB/Monteverdi) uses the mean shift segmentation algorithm followed by the support vector machine (SVM) classifier. Two strategies were followed: four classes were considered using Spring and thereafter seven classes were considered for OTB/Monteverdi. The SVM classifier produces slightly better results and presents a shorter processing time. However, the poor spectral resolution of the data (only RGB bands) is an important factor that limits the performance of the classifiers applied.
Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology.
Heinson, Ashley I; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen C Denman; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher H
2017-02-01
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.
Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.
Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel
2017-01-01
Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of a cerebral glucose hypometabolism involving occipito-temporal cortex and cerebellum have been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and with varying degrees of severity depending on the cognitive profile of patients. Aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients between healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole-population was divided into two groups; a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM) and a SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including a SVM to classify patients between healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.
gkmSVM: an R package for gapped-kmer SVM
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A.
2016-01-01
Summary: We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. Availability and Implementation: gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C ++ implementation is available at www.beerlab.org/gkmsvm Contact: mghandi@gmail.com or mbeer@jhu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153639
Discrimination of transgenic soybean seeds by terahertz spectroscopy
NASA Astrophysics Data System (ADS)
Liu, Wei; Liu, Changhong; Chen, Feng; Yang, Jianbo; Zheng, Lei
2016-10-01
Discrimination of genetically modified organisms is increasingly demanded by legislation and consumers worldwide. The feasibility of a non-destructive discrimination of glyphosate-resistant and conventional soybean seeds and their hybrid descendants was examined by terahertz time-domain spectroscopy system combined with chemometrics. Principal component analysis (PCA), least squares-support vector machines (LS-SVM) and PCA-back propagation neural network (PCA-BPNN) models with the first and second derivative and standard normal variate (SNV) transformation pre-treatments were applied to classify soybean seeds based on genotype. Results demonstrated clear differences among glyphosate-resistant, hybrid descendants and conventional non-transformed soybean seeds could easily be visualized with an excellent classification (accuracy was 88.33% in validation set) using the LS-SVM and the spectra with SNV pre-treatment. The results indicated that THz spectroscopy techniques together with chemometrics would be a promising technique to distinguish transgenic soybean seeds from non-transformed seeds with high efficiency and without any major sample preparation.
Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu
2017-09-01
The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area the best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training images dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-word representations of local dense scale invariant feature transformation features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for a smartphone-based image analysis.
Guo, Shuxiang; Pang, Muye; Gao, Baofeng; Hirata, Hideyuki; Ishihara, Hidenori
2015-01-01
The surface electromyography (sEMG) technique is proposed for muscle activation detection and intuitive control of prostheses or robot arms. Motion recognition is widely used to map sEMG signals to the target motions. One of the main factors preventing the implementation of this kind of method for real-time applications is the unsatisfactory motion recognition rate and time consumption. The purpose of this paper is to compare eight combinations of four feature extraction methods (Root Mean Square (RMS), Detrended Fluctuation Analysis (DFA), Weight Peaks (WP), and Muscular Model (MM)) and two classifiers (Neural Networks (NN) and Support Vector Machine (SVM)), for the task of mapping sEMG signals to eight upper-limb motions, to find out the relation between these methods and propose a proper combination to solve this issue. Seven subjects participated in the experiment and six muscles of the upper-limb were selected to record sEMG signals. The experimental results showed that NN classifier obtained the highest recognition accuracy rate (88.7%) during the training process while SVM performed better in real-time experiments (85.9%). For time consumption, SVM took less time than NN during the training process but needed more time for real-time computation. Among the four feature extraction methods, WP had the highest recognition rate for the training process (97.7%) while MM performed the best during real-time tests (94.3%). The combination of MM and NN is recommended for strict real-time applications while a combination of MM and SVM will be more suitable when time consumption is not a key requirement. PMID:25894941
Acharya, U Rajendra; Sree, S Vinitha; Chattopadhyay, Subhagata; Yu, Wenwei; Ang, Peng Chuan Alvin
2011-06-01
Epilepsy is a common neurological disorder that is characterized by the recurrence of seizures. Electroencephalogram (EEG) signals are widely used to diagnose seizures. Because of the non-linear and dynamic nature of the EEG signals, it is difficult to effectively decipher the subtle changes in these signals by visual inspection and by using linear techniques. Therefore, non-linear methods are being researched to analyze the EEG signals. In this work, we use the recorded EEG signals in Recurrence Plots (RP), and extract Recurrence Quantification Analysis (RQA) parameters from the RP in order to classify the EEG signals into normal, ictal, and interictal classes. Recurrence Plot (RP) is a graph that shows all the times at which a state of the dynamical system recurs. Studies have reported significantly different RQA parameters for the three classes. However, more studies are needed to develop classifiers that use these promising features and present good classification accuracy in differentiating the three types of EEG segments. Therefore, in this work, we have used ten RQA parameters to quantify the important features in the EEG signals.These features were fed to seven different classifiers: Support vector machine (SVM), Gaussian Mixture Model (GMM), Fuzzy Sugeno Classifier, K-Nearest Neighbor (KNN), Naive Bayes Classifier (NBC), Decision Tree (DT), and Radial Basis Probabilistic Neural Network (RBPNN). Our results show that the SVM classifier was able to identify the EEG class with an average efficiency of 95.6%, sensitivity and specificity of 98.9% and 97.8%, respectively.
NASA Astrophysics Data System (ADS)
Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk
2016-07-01
Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.
Land use/land cover mapping using multi-scale texture processing of high resolution data
NASA Astrophysics Data System (ADS)
Wong, S. N.; Sarker, M. L. R.
2014-02-01
Land use/land cover (LULC) maps are useful for many purposes, and for a long time remote sensing techniques have been used for LULC mapping using different types of data and image processing techniques. In this research, high resolution satellite data from IKONOS was used to perform land use/land cover mapping in Johor Bahru city and adjacent areas (Malaysia). Spatial image processing was carried out using the six texture algorithms (mean, variance, contrast, homogeneity, entropy, and GLDV angular second moment) with five difference window sizes (from 3×3 to 11×11). Three different classifiers i.e. Maximum Likelihood Classifier (MLC), Artificial Neural Network (ANN) and Supported Vector Machine (SVM) were used to classify the texture parameters of different spectral bands individually and all bands together using the same training and validation samples. Results indicated that texture parameters of all bands together generally showed a better performance (overall accuracy = 90.10%) for land LULC mapping, however, single spectral band could only achieve an overall accuracy of 72.67%. This research also found an improvement of the overall accuracy (OA) using single-texture multi-scales approach (OA = 89.10%) and single-scale multi-textures approach (OA = 90.10%) compared with all original bands (OA = 84.02%) because of the complementary information from different bands and different texture algorithms. On the other hand, all of the three different classifiers have showed high accuracy when using different texture approaches, but SVM generally showed higher accuracy (90.10%) compared to MLC (89.10%) and ANN (89.67%) especially for the complex classes such as urban and road.
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar
2016-01-01
Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
Support Vector Machine Model for Automatic Detection and Classification of Seismic Events
NASA Astrophysics Data System (ADS)
Barros, Vesna; Barros, Lucas
2016-04-01
The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false detections and missed detections caused by station noise and incorrect classification of arrivals are still an issue and the events are often unclassified or poorly classified. Thus, machine learning techniques can be used in automatic processing for classifying the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS) - a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) - we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al., 2015, the advantages of using SVM are handleability of large number of features and effectiveness in high dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. It is expected to create a flexible and easily adjustable SVM method that can be applied in different regions and datasets. Taken a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions such as infrasound and hydroacoustic waveforms. As an authorized user, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquake, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support-vector network to various classical learning algorithms used before in seismic detection and classification is an essential final step to analyze the advantages and disadvantages of the model.
Hybrid Disease Diagnosis Using Multiobjective Optimization with Evolutionary Parameter Optimization
Nalluri, MadhuSudana Rao; K., Kannan; M., Manisha
2017-01-01
With the widespread adoption of e-Healthcare and telemedicine applications, accurate, intelligent disease diagnosis systems have been profoundly coveted. In recent years, numerous individual machine learning-based classifiers have been proposed and tested, and the fact that a single classifier cannot effectively classify and diagnose all diseases has been almost accorded with. This has seen a number of recent research attempts to arrive at a consensus using ensemble classification techniques. In this paper, a hybrid system is proposed to diagnose ailments using optimizing individual classifier parameters for two classifier techniques, namely, support vector machine (SVM) and multilayer perceptron (MLP) technique. We employ three recent evolutionary algorithms to optimize the parameters of the classifiers above, leading to six alternative hybrid disease diagnosis systems, also referred to as hybrid intelligent systems (HISs). Multiple objectives, namely, prediction accuracy, sensitivity, and specificity, have been considered to assess the efficacy of the proposed hybrid systems with existing ones. The proposed model is evaluated on 11 benchmark datasets, and the obtained results demonstrate that our proposed hybrid diagnosis systems perform better in terms of disease prediction accuracy, sensitivity, and specificity. Pertinent statistical tests were carried out to substantiate the efficacy of the obtained results. PMID:29065626
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.
"When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.
Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit
2016-10-24
To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers
NASA Astrophysics Data System (ADS)
Sanal Kumar, K. P.; Bhavani, R., Dr.
2017-08-01
Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.
gkmSVM: an R package for gapped-kmer SVM.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A
2016-07-15
We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C ++ implementation is available at www.beerlab.org/gkmsvm mghandi@gmail.com or mbeer@jhu.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2011-09-30
Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR Systems Center Pacific...APPROACH Odontocete click detection and classification. A multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier
NASA Astrophysics Data System (ADS)
Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing
2012-12-01
Unique features and the underlining hypotheses of how these features may relate to the tumor physiology in coregistered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced. The images were first compressed with wavelet transform. The mean Radon transform of photoacoustic images was then computed and fitted with a Gaussian function to find the centroid of a suspicious area for shift-invariant recognition process. Twenty-four features were extracted from a training set by several methods, including Fourier transform, image statistics, and different composite filters. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train three classifiers, including generalized linear model, neural network, and support vector machine (SVM). The SVM achieved the best training performance and was able to exclusively separate cancerous from non-cancerous cases with 100% sensitivity and specificity. At the end, the classifiers were used to test 95 new images obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancer is sufficient to consider an ovary as malignant, the SVM classifier achieves 100% sensitivity and 87.88% specificity.
NASA Astrophysics Data System (ADS)
Rama Krishna, K.; Ramachandran, K. I.
2018-02-01
Crack propagation is a major cause of failure in rotating machines. It adversely affects the productivity, safety, and the machining quality. Hence, detecting the crack’s severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established concept in identifying the faults, for observing the non-linear behaviour of the vibration signals at various operating conditions. In this work, we find the classification efficiencies for both original and the reconstructed vibrational signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. All the statistical features from the original signals and reconstructed signals are found out in feature extraction process individually. A few statistical parameters are selected in feature selection process and are classified using the SVM classifier. The obtained results show the best parameters and appropriate kernel in SVM classifier for detecting the faults in bearings. Hence, we conclude that better results were obtained by VMD and SVM process over normal process using SVM. This is owing to denoising and filtering the raw vibrational signals.
The formation method of the feature space for the identification of fatigued bills
NASA Astrophysics Data System (ADS)
Kang, Dongshik; Oshiro, Ayumu; Ozawa, Kenji; Mitsui, Ikugo
2014-10-01
Fatigued bills make a trouble such as the paper jam in a bill handling machine. In the discrimination of fatigued bills using an acoustic signal, the variation of an observed bill sound is considered to be one of causes in misclassification. Therefore a technique has demanded in order to make the classification of fatigued bills more efficient. In this paper, we proposed the algorithm that extracted feature quantity of bill sound from acoustic signal using the frequency difference, and carried out discrimination experiment of fatigued bill money by Support Vector Machine(SVM). The feature quantity of frequency difference can represent the frequency components of an acoustic signal is varied by the fatigued degree of bill money. The generalization performance of SVM does not depend on the size of dimensions of the feature space, even in a high dimensional feature space such as bill-acoustic signals. Furthermore, SVM can induce an optimal classifier which considers the combination of features by the virtue of polynomial kernel functions.
Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil
2014-09-07
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
Nonlinear Demodulation and Channel Coding in EBPSK Scheme
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding. PMID:23213281
Nonlinear demodulation and channel coding in EBPSK scheme.
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug
2013-05-15
Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 Multiplication-Sign 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs-normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessedmore » using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI data obtained from both scanners, the classification accuracies with the SVM and Bayesian classifiers were 92% and 77%, respectively. The selected features resulting from the classification process differed by scanner, with more features included for the classification of the integrated HRCT data than for the classification of the HRCT data from each scanner. For the integrated data, consisting of HRCT images of both scanners, the classification accuracy based on the SVM was statistically similar to the accuracy of the data obtained from each scanner. However, the classification accuracy of the integrated data using the Bayesian classifier was significantly lower than the classification accuracy of the ROI data of each scanner. Conclusions: The use of an integrated dataset along with a SVM classifier rather than a Bayesian classifier has benefits in terms of the classification accuracy of HRCT images acquired with more than one scanner. This finding is of relevance in studies involving large number of images, as is the case in a multicenter trial with different scanners.« less
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-04-22
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Recently, gene expression profiling using microarray techniques has been shown as a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain high level of noise and the overwhelming number of genes relative to the number of available samples. It brings out a great challenge for machine learning and statistic techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameters learning problem. And a shrinkage approach: 1-norm based linear programming is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets: leukemia dataset and colon tumor dataset are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both two datasets. Moreover, very simple rules with linguist labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
A Classification Method for Seed Viability Assessment with Infrared Thermography.
Men, Sen; Yan, Lei; Liu, Jiaxin; Qian, Hua; Luo, Qinjuan
2017-04-12
This paper presents a viability assessment method for Pisum sativum L. seeds based on the infrared thermography technique. In this work, different artificial treatments were conducted to prepare seeds samples with different viability. Thermal images and visible images were recorded every five minutes during the standard five day germination test. After the test, the root length of each sample was measured, which can be used as the viability index of that seed. Each individual seed area in the visible images was segmented with an edge detection method, and the average temperature of the corresponding area in the infrared images was calculated as the representative temperature for this seed at that time. The temperature curve of each seed during germination was plotted. Thirteen characteristic parameters extracted from the temperature curve were analyzed to show the difference of the temperature fluctuations between the seeds samples with different viability. With above parameters, support vector machine (SVM) was used to classify the seed samples into three categories: viable, aged and dead according to the root length, the classification accuracy rate was 95%. On this basis, with the temperature data of only the first three hours during the germination, another SVM model was proposed to classify the seed samples, and the accuracy rate was about 91.67%. From these experimental results, it can be seen that infrared thermography can be applied for the prediction of seed viability, based on the SVM algorithm.
Data mining for the analysis of hippocampal zones in Alzheimer's disease
NASA Astrophysics Data System (ADS)
Ovando Vázquez, Cesaré M.
2012-02-01
In this work, a methodology to classify people with Alzheimer's Disease (AD), Healthy Controls (HC) and people with Mild Cognitive Impairment (MCI) is presented. This methodology consists of an ensemble of Support Vector Machines (SVM) with the hippocampal boxes (HB) as input data, these hippocampal zones are taken from Magnetic Resonance (MRI) and Positron Emission Tomography (PET) images. Two ways of constructing this ensemble are presented, the first consists of linear SVM models and the second of non-linear SVM models. Results demonstrate that the linear models classify HBs more accurately than the non-linear models between HC and MCI and that there are no differences between HC and AD.
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J; Ibarra-Manzano, Mario A
2016-03-05
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R.; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J.; Ibarra-Manzano, Mario A.
2016-01-01
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029
Mapping membrane activity in undiscovered peptide sequence space using machine learning
Fulan, Benjamin M.; Wong, Gerard C. L.
2016-01-01
There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate ⍺-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its “antimicrobialness”) and its ⍺-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide’s minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find a unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences. PMID:27849600
Generalized SMO algorithm for SVM-based multitask learning.
Cai, Feng; Cherkassky, Vladimir
2012-06-01
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique, which can assist oncopathologists in correct screening of oral precancerous conditions specially oral submucous fibrosis (OSF) with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibres segmentation, its textural feature extraction and selection, screening perfomance enhancement under Gaussian transformation and finally classification. In this study, collagen fibres are segmented on R,G,B color channels using back-probagation neural network from 60 normal and 59 OSF histological images followed by histogram specification for reducing the stain intensity variation. Henceforth, textural features of collgen area are extracted using fractal approaches viz., differential box counting and brownian motion curve . Feature selection is done using Kullback-Leibler (KL) divergence criterion and the screening performance is evaluated based on various statistical tests to conform Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers viz., Bayesian classification and support vector machines (SVM) to classify normal and OSF. It is observed that SVM with linear kernel function provides better classification accuracy (91.64%) as compared to Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves Bayesian classifier's performance from 80.69% to 90.75%. Results are here studied and discussed.
NASA Astrophysics Data System (ADS)
Hu, Yan-Yan; Li, Dong-Sheng
2016-01-01
The hyperspectral images(HSI) consist of many closely spaced bands carrying the most object information. While due to its high dimensionality and high volume nature, it is hard to get satisfactory classification performance. In order to reduce HSI data dimensionality preparation for high classification accuracy, it is proposed to combine a band selection method of artificial immune systems (AIS) with a hybrid kernels support vector machine (SVM-HK) algorithm. In fact, after comparing different kernels for hyperspectral analysis, the approach mixed radial basis function kernel (RBF-K) with sigmoid kernel (Sig-K) and applied the optimized hybrid kernels in SVM classifiers. Then the SVM-HK algorithm used to induce the bands selection of an improved version of AIS. The AIS was composed of clonal selection and elite antibody mutation, including evaluation process with optional index factor (OIF). Experimental classification performance was on a San Diego Naval Base acquired by AVIRIS, the HRS dataset shows that the method is able to efficiently achieve bands redundancy removal while outperforming the traditional SVM classifier.
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
STAR-GALAXY CLASSIFICATION IN MULTI-BAND OPTICAL IMAGING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fadely, Ross; Willman, Beth; Hogg, David W.
2012-11-20
Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r {approx}> 24. Star-galaxy separation poses a major challenge to such surveys because galaxies-even very compact galaxies-outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM <0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training datamore » to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVM{sub best}) where the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics and (2) a more realistic scenario where training is done on higher signal-to-noise data (SVM{sub real}) at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering {approx}80% completeness, with purity of {approx}60%-90% for both stars and galaxies. We find that no algorithm delivers perfect performance and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the Receiver Operating Characteristic curve, we find a best-to-worst ranking of SVM{sub best}, HB, ML, and SVM{sub real}. We conclude, therefore, that a well-trained SVM will outperform template-fitting methods. However, a normally trained SVM performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.« less
Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella
2017-04-01
Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate the drug toxicity are expensive and time-consuming, indicating that they are not suitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand to develop computational models that can predict the drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that biotransformed in liver. The toxic effects were calculated for the current data, namely, mutagenic, tumorigenic, irritant and reproductive effect. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE and Safe Level SMOTE are used to solve the problem of imbalanced dataset. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug into toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, Whale Optimization Algorithm (WOA) has been proposed to optimize the parameters of SVM, so that the classification error can be reduced. The experimental results proved that the proposed model achieved high sensitivity to all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.
Efficient HIK SVM learning for image classification.
Wu, Jianxin
2012-10-01
Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.
A novel and efficient technique for identification and classification of GPCRs.
Gupta, Ravi; Mittal, Ankush; Singh, Kuldip
2008-07-01
G-protein coupled receptors (GPCRs) play a vital role in different biological processes, such as regulation of growth, death, and metabolism of cells. GPCRs are the focus of significant amount of current pharmaceutical research since they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate technique to identify and classify the GPCRs. However, this approach has two major disadvantages. First, the dimension of dipeptide-based feature vector is equal to 400. The large dimension makes the classification task computationally and memory wise inefficient. Second, it does not consider the biological properties of protein sequence for identification and classification of GPCRs. In this paper, we present a novel-feature-based SVM classification technique. The novel features are derived by applying wavelet-based time series analysis approach on protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of amino acids in a protein sequence. In addition, the dimension of the feature vector for proposed technique is equal to 35. Experiments were performed on GPCRs protein sequences available at GPCRs Database. Our approach achieves an accuracy of 99.9%, 98.06%, 97.78%, and 94.08% for GPCR superfamily, families, subfamilies, and subsubfamilies (amine group), respectively, when evaluated using fivefold cross-validation. Further, an accuracy of 99.8%, 97.26%, and 97.84% was obtained when evaluated on unseen or recall datasets of GPCR superfamily, families, and subfamilies, respectively. Comparison with dipeptide-based SVM technique shows the effectiveness of our approach.
Understanding user intents in online health forums.
Zhang, Thomas; Cho, Jason H D; Zhai, Chengxiang
2015-07-01
Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that a SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that a SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Cho, Yongwon; Lee, Areum; Park, Jongha; Ko, Bemseok; Kim, Namkug
2018-07-01
Contactless operating room (OR) interfaces are important for computer-aided surgery, and have been developed to decrease the risk of contamination during surgical procedures. In this study, we used Leap Motion™, with a personalized automated classifier, to enhance the accuracy of gesture recognition for contactless interfaces. This software was trained and tested on a personal basis that means the training of gesture per a user. We used 30 features including finger and hand data, which were computed, selected, and fed into a multiclass support vector machine (SVM), and Naïve Bayes classifiers and to predict and train five types of gestures including hover, grab, click, one peak, and two peaks. Overall accuracy of the five gestures was 99.58% ± 0.06, and 98.74% ± 3.64 on a personal basis using SVM and Naïve Bayes classifiers, respectively. We compared gesture accuracy across the entire dataset and used SVM and Naïve Bayes classifiers to examine the strength of personal basis training. We developed and enhanced non-contact interfaces with gesture recognition to enhance OR control systems. Copyright © 2018 Elsevier B.V. All rights reserved.
A method of distributed avionics data processing based on SVM classifier
NASA Astrophysics Data System (ADS)
Guo, Hangyu; Wang, Jinyan; Kang, Minyang; Xu, Guojing
2018-03-01
Under the environment of system combat, in order to solve the problem on management and analysis of the massive heterogeneous data on multi-platform avionics system, this paper proposes a management solution which called avionics "resource cloud" based on big data technology, and designs an aided decision classifier based on SVM algorithm. We design an experiment with STK simulation, the result shows that this method has a high accuracy and a broad application prospect.
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.
Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan
2017-01-01
Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition.
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition
Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan
2017-01-01
Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987
Recognizing ovarian cancer from co-registered ultrasound and photoacoustic images
NASA Astrophysics Data System (ADS)
Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing
2013-03-01
Unique features in co-registered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced, along with the hypotheses of how these features may relate to the physiology of tumors. The images are compressed with wavelet transform, after which the mean Radon transform of the photoacoustic image is computed and fitted with a Gaussian function to find the centroid of the suspicious area for shift-invariant recognition process. In the next step, 24 features are extracted from a training set of images by several methods; including features from the Fourier domain, image statistics, and the outputs of different composite filters constructed from the joint frequency response of different cancerous images. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train a support vector machine (SVM) structure. The SVM classifier was able to exclusively separate the cancerous from the non-cancerous cases with 100% sensitivity and specificity. At the end, the classifier was used to test 95 new images, obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancerous case is sufficient to consider the ovary as malignant, then the SVM classifier achieves 100% sensitivity and 87.88% specificity.
An Automated and Intelligent Medical Decision Support System for Brain MRI Scans Classification.
Siddiqui, Muhammad Faisal; Reza, Ahmed Wasif; Kanesan, Jeevan
2015-01-01
A wide interest has been observed in the medical health care applications that interpret neuroimaging scans by machine learning systems. This research proposes an intelligent, automatic, accurate, and robust classification technique to classify the human brain magnetic resonance image (MRI) as normal or abnormal, to cater down the human error during identifying the diseases in brain MRIs. In this study, fast discrete wavelet transform (DWT), principal component analysis (PCA), and least squares support vector machine (LS-SVM) are used as basic components. Firstly, fast DWT is employed to extract the salient features of brain MRI, followed by PCA, which reduces the dimensions of the features. These reduced feature vectors also shrink the memory storage consumption by 99.5%. At last, an advanced classification technique based on LS-SVM is applied to brain MR image classification using reduced features. For improving the efficiency, LS-SVM is used with non-linear radial basis function (RBF) kernel. The proposed algorithm intelligently determines the optimized values of the hyper-parameters of the RBF kernel and also applied k-fold stratified cross validation to enhance the generalization of the system. The method was tested by 340 patients' benchmark datasets of T1-weighted and T2-weighted scans. From the analysis of experimental results and performance comparisons, it is observed that the proposed medical decision support system outperformed all other modern classifiers and achieves 100% accuracy rate (specificity/sensitivity 100%/100%). Furthermore, in terms of computation time, the proposed technique is significantly faster than the recent well-known methods, and it improves the efficiency by 71%, 3%, and 4% on feature extraction stage, feature reduction stage, and classification stage, respectively. These results indicate that the proposed well-trained machine learning system has the potential to make accurate predictions about brain abnormalities from the individual subjects, therefore, it can be used as a significant tool in clinical practice.
Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios
2017-09-01
In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.
Geographical classification of apple based on hyperspectral imaging
NASA Astrophysics Data System (ADS)
Guo, Zhiming; Huang, Wenqian; Chen, Liping; Zhao, Chunjiang; Peng, Yankun
2013-05-01
Attribute of apple according to geographical origin is often recognized and appreciated by the consumers. It is usually an important factor to determine the price of a commercial product. Hyperspectral imaging technology and supervised pattern recognition was attempted to discriminate apple according to geographical origins in this work. Hyperspectral images of 207 Fuji apple samples were collected by hyperspectral camera (400-1000nm). Principal component analysis (PCA) was performed on hyperspectral imaging data to determine main efficient wavelength images, and then characteristic variables were extracted by texture analysis based on gray level co-occurrence matrix (GLCM) from dominant waveband image. All characteristic variables were obtained by fusing the data of images in efficient spectra. Support vector machine (SVM) was used to construct the classification model, and showed excellent performance in classification results. The total classification rate had the high classify accuracy of 92.75% in the training set and 89.86% in the prediction sets, respectively. The overall results demonstrated that the hyperspectral imaging technique coupled with SVM classifier can be efficiently utilized to discriminate Fuji apple according to geographical origins.
LBP and SIFT based facial expression recognition
NASA Astrophysics Data System (ADS)
Sumer, Omer; Gunes, Ece O.
2015-02-01
This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem and seven classes; happiness, anger, sadness, disgust, surprise, fear and comtempt are classified. Using SIFT feature vectors and linear SVM, 93.1% mean accuracy is acquired on CK+ database. On the other hand, the performance of LBP-based classifier with linear SVM is reported on SFEW using strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.
Power line identification of millimeter wave radar based on PCA-GS-SVM
NASA Astrophysics Data System (ADS)
Fang, Fang; Zhang, Guifeng; Cheng, Yansheng
2017-12-01
Aiming at the problem that the existing detection method can not effectively solve the security of UAV's ultra low altitude flight caused by power line, a power line recognition method based on grid search (GS) and the principal component analysis and support vector machine (PCA-SVM) is proposed. Firstly, the candidate line of Hough transform is reduced by PCA, and the main feature of candidate line is extracted. Then, upport vector machine (SVM is) optimized by grid search method (GS). Finally, using support vector machine classifier optimized parameters to classify the candidate line. MATLAB simulation results show that this method can effectively identify the power line and noise, and has high recognition accuracy and algorithm efficiency.
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren’s Syndrome
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J.; Gillespie, Colin S.; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T.; Emery, Paul; Lanyon, Peter; Hunter, John A.; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E.; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Background Fatigue is a debilitating condition with a significant impact on patients’ quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren’s Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Methods Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren’s Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Results Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Conclusions Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS. PMID:26694930
E-Nose Vapor Identification Based on Dempster-Shafer Fusion of Multiple Classifiers
NASA Technical Reports Server (NTRS)
Li, Winston; Leung, Henry; Kwan, Chiman; Linnell, Bruce R.
2005-01-01
Electronic nose (e-nose) vapor identification is an efficient approach to monitor air contaminants in space stations and shuttles in order to ensure the health and safety of astronauts. Data preprocessing (measurement denoising and feature extraction) and pattern classification are important components of an e-nose system. In this paper, a wavelet-based denoising method is applied to filter the noisy sensor measurements. Transient-state features are then extracted from the denoised sensor measurements, and are used to train multiple classifiers such as multi-layer perceptions (MLP), support vector machines (SVM), k nearest neighbor (KNN), and Parzen classifier. The Dempster-Shafer (DS) technique is used at the end to fuse the results of the multiple classifiers to get the final classification. Experimental analysis based on real vapor data shows that the wavelet denoising method can remove both random noise and outliers successfully, and the classification rate can be improved by using classifier fusion.
Thanh Noi, Phan; Kappas, Martin
2017-01-01
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Thanh Noi, Phan; Kappas, Martin
2017-12-22
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Frick, Andreas; Gingnell, Malin; Marquand, Andre F.; Howner, Katarina; Fischer, Håkan; Kristiansson, Marianne; Williams, Steven C.R.; Fredrikson, Mats; Furmark, Tomas
2014-01-01
Functional neuroimaging of social anxiety disorder (SAD) support altered neural activation to threat-provoking stimuli focally in the fear network, while structural differences are distributed over the temporal and frontal cortices as well as limbic structures. Previous neuroimaging studies have investigated the brain at the voxel level using mass-univariate methods which do not enable detection of more complex patterns of activity and structural alterations that may separate SAD from healthy individuals. Support vector machine (SVM) is a supervised machine learning method that capitalizes on brain activation and structural patterns to classify individuals. The aim of this study was to investigate if it is possible to discriminate SAD patients (n = 14) from healthy controls (n = 12) using SVM based on (1) functional magnetic resonance imaging during fearful face processing and (2) regional gray matter volume. Whole brain and region of interest (fear network) SVM analyses were performed for both modalities. For functional scans, significant classifications were obtained both at whole brain level and when restricting the analysis to the fear network while gray matter SVM analyses correctly classified participants only when using the whole brain search volume. These results support that SAD is characterized by aberrant neural activation to affective stimuli in the fear network, while disorder-related alterations in regional gray matter volume are more diffusely distributed over the whole brain. SVM may thus be useful for identifying imaging biomarkers of SAD. PMID:24239689
NASA Astrophysics Data System (ADS)
Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.
2014-06-01
Mapping is essential for the analysis of the land use and land cover, which influence many environmental processes and properties. For the purpose of the creation of land cover maps, it is important to minimize error. These errors will propagate into later analyses based on these land cover maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we have analyzed multispectral data using two different classifiers including Maximum Likelihood Classifier (MLC) and Support Vector Machine (SVM). To pursue this aim, Landsat Thematic Mapper data and identical field-based training sample datasets in Johor Malaysia used for each classification method, which results indicate in five land cover classes forest, oil palm, urban area, water, rubber. Classification results indicate that SVM was more accurate than MLC. With demonstrated capability to produce reliable cover results, the SVM methods should be especially useful for land cover classification.
Semi-supervised SVM for individual tree crown species classification
NASA Astrophysics Data System (ADS)
Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik
2015-12-01
In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
Classification of brain tumours using short echo time 1H MR spectra
NASA Astrophysics Data System (ADS)
Devos, A.; Lukas, L.; Suykens, J. A. K.; Vanhamme, L.; Tate, A. R.; Howe, F. A.; Majós, C.; Moreno-Torres, A.; van der Graaf, M.; Arús, C.; Van Huffel, S.
2004-09-01
The purpose was to objectively compare the application of several techniques and the use of several input features for brain tumour classification using Magnetic Resonance Spectroscopy (MRS). Short echo time 1H MRS signals from patients with glioblastomas ( n = 87), meningiomas ( n = 57), metastases ( n = 39), and astrocytomas grade II ( n = 22) were provided by six centres in the European Union funded INTERPRET project. Linear discriminant analysis, least squares support vector machines (LS-SVM) with a linear kernel and LS-SVM with radial basis function kernel were applied and evaluated over 100 stratified random splittings of the dataset into training and test sets. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of binary classifiers, while the percentage of correct classifications was used to evaluate the multiclass classifiers. The influence of several factors on the classification performance has been tested: L2- vs. water normalization, magnitude vs. real spectra and baseline correction. The effect of input feature reduction was also investigated by using only the selected frequency regions containing the most discriminatory information, and peak integrated values. Using L2-normalized complete spectra the automated binary classifiers reached a mean test AUC of more than 0.95, except for glioblastomas vs. metastases. Similar results were obtained for all classification techniques and input features except for water normalized spectra, where classification performance was lower. This indicates that data acquisition and processing can be simplified for classification purposes, excluding the need for separate water signal acquisition, baseline correction or phasing.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
NASA Astrophysics Data System (ADS)
Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi
2014-09-01
Interaction is one of the key techniques of augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages which are gesture segmentation, gesture characteristic feature modeling and trick recognition. In segmentation stage, for solving the misrecognition of skin-like region, a segmentation algorithm combing background mode and skin color to preclude some skin-like regions is adopted. In gesture characteristic feature modeling of image attributes stage, plenty of characteristic features are analyzed and acquired, such as structure characteristics, Hu invariant moments features and Fourier descriptor. In trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a novel learning method based on statistical learning theory, processing academic foundation and excellent learning ability, having a lot of issues in machine learning area and special advantages in dealing with small samples, non-linear pattern recognition at high dimension. The gesture recognition of augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of AR maintenance guiding system can be greatly enhanced by improved SVM.
Lin, Dongyun; Sun, Lei; Toh, Kar-Ann; Zhang, Jing Bo; Lin, Zhiping
2018-05-01
Automated biomedical image classification could confront the challenges of high level noise, image blur, illumination variation and complicated geometric correspondence among various categorical biomedical patterns in practice. To handle these challenges, we propose a cascade method consisting of two stages for biomedical image classification. At stage 1, we propose a confidence score based classification rule with a reject option for a preliminary decision using the support vector machine (SVM). The testing images going through stage 1 are separated into two groups based on their confidence scores. Those testing images with sufficiently high confidence scores are classified at stage 1 while the others with low confidence scores are rejected and fed to stage 2. At stage 2, the rejected images from stage 1 are first processed by a subspace analysis technique called eigenfeature regularization and extraction (ERE), and then classified by another SVM trained in the transformed subspace learned by ERE. At both stages, images are represented based on two types of local features, i.e., SIFT and SURF, respectively. They are encoded using various bag-of-words (BoW) models to handle biomedical patterns with and without geometric correspondence, respectively. Extensive experiments are implemented to evaluate the proposed method on three benchmark real-world biomedical image datasets. The proposed method significantly outperforms several competing state-of-the-art methods in terms of classification accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Comparison of water extraction methods in Tibet based on GF-1 data
NASA Astrophysics Data System (ADS)
Jia, Lingjun; Shang, Kun; Liu, Jing; Sun, Zhongqing
2018-03-01
In this study, we compared four different water extraction methods with GF-1 data according to different water types in Tibet, including Support Vector Machine (SVM), Principal Component Analysis (PCA), Decision Tree Classifier based on False Normalized Difference Water Index (FNDWI-DTC), and PCA-SVM. The results show that all of the four methods can extract large area water body, but only SVM and PCA-SVM can obtain satisfying extraction results for small size water body. The methods were evaluated by both overall accuracy (OAA) and Kappa coefficient (KC). The OAA of PCA-SVM, SVM, FNDWI-DTC, PCA are 96.68%, 94.23%, 93.99%, 93.01%, and the KCs are 0.9308, 0.8995, 0.8962, 0.8842, respectively, in consistent with visual inspection. In summary, SVM is better for narrow rivers extraction and PCA-SVM is suitable for water extraction of various types. As for dark blue lakes, the methods using PCA can extract more quickly and accurately.
Prediction of Flood Warning in Taiwan Using Nonlinear SVM with Simulated Annealing Algorithm
NASA Astrophysics Data System (ADS)
Lee, C.
2013-12-01
The issue of the floods is important in Taiwan. It is because the narrow and high topography of the island make lots of rivers steep in Taiwan. The tropical depression likes typhoon always causes rivers to flood. Prediction of river flow under the extreme rainfall circumstances is important for government to announce the warning of flood. Every time typhoon passed through Taiwan, there were always floods along some rivers. The warning is classified to three levels according to the warning water levels in Taiwan. The propose of this study is to predict the level of floods warning from the information of precipitation, rainfall duration and slope of riverbed. To classify the level of floods warning by the above-mentioned information and modeling the problems, a machine learning model, nonlinear Support vector machine (SVM), is formulated to classify the level of floods warning. In addition, simulated annealing (SA), a probabilistic heuristic algorithm, is used to determine the optimal parameter of the SVM model. A case study of flooding-trend rivers of different gradients in Taiwan is conducted. The contribution of this SVM model with simulated annealing is capable of making efficient announcement for flood warning and keeping the danger of flood from residents along the rivers.
Rajagopal, Rekha; Ranganathan, Vidhyapriya
2018-06-05
Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
Serum and Plasma Metabolomic Biomarkers for Lung Cancer.
Kumar, Nishith; Shahjaman, Md; Mollah, Md Nurul Haque; Islam, S M Shahinul; Hoque, Md Aminul
2017-01-01
In drug invention and early disease prediction of lung cancer, metabolomic biomarker detection is very important. Mortality rate can be decreased, if cancer is predicted at the earlier stage. Recent diagnostic techniques for lung cancer are not prognosis diagnostic techniques. However, if we know the name of the metabolites, whose intensity levels are considerably changing between cancer subject and control subject, then it will be easy to early diagnosis the disease as well as to discover the drug. Therefore, in this paper we have identified the influential plasma and serum blood sample metabolites for lung cancer and also identified the biomarkers that will be helpful for early disease prediction as well as for drug invention. To identify the influential metabolites, we considered a parametric and a nonparametric test namely student׳s t-test as parametric and Kruskal-Wallis test as non-parametric test. We also categorized the up-regulated and down-regulated metabolites by the heatmap plot and identified the biomarkers by support vector machine (SVM) classifier and pathway analysis. From our analysis, we got 27 influential (p-value<0.05) metabolites from plasma sample and 13 influential (p-value<0.05) metabolites from serum sample. According to the importance plot through SVM classifier, pathway analysis and correlation network analysis, we declared 4 metabolites (taurine, aspertic acid, glutamine and pyruvic acid) as plasma biomarker and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarker.
Application of the Teager-Kaiser energy operator in bearing fault diagnosis.
Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M
2013-03-01
Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them feature the bearing vibration signal with statistical measures and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and feature it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used. The diagnosis is performed with two classifiers: a neural network classifier and a LS-SVM classifier. Experiments show that the Teager domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...
Arshad, Sannia; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes. PMID:25295302
Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes.
Support Vector Machine Based on Adaptive Acceleration Particle Swarm Optimization
Abdulameer, Mohammed Hasan; Othman, Zulaiha Ali
2014-01-01
Existing face recognition methods utilize particle swarm optimizer (PSO) and opposition based particle swarm optimizer (OPSO) to optimize the parameters of SVM. However, the utilization of random values in the velocity calculation decreases the performance of these techniques; that is, during the velocity computation, we normally use random values for the acceleration coefficients and this creates randomness in the solution. To address this problem, an adaptive acceleration particle swarm optimization (AAPSO) technique is proposed. To evaluate our proposed method, we employ both face and iris recognition based on AAPSO with SVM (AAPSO-SVM). In the face and iris recognition systems, performance is evaluated using two human face databases, YALE and CASIA, and the UBiris dataset. In this method, we initially perform feature extraction and then recognition on the extracted features. In the recognition process, the extracted features are used for SVM training and testing. During the training and testing, the SVM parameters are optimized with the AAPSO technique, and in AAPSO, the acceleration coefficients are computed using the particle fitness values. The parameters in SVM, which are optimized by AAPSO, perform efficiently for both face and iris recognition. A comparative analysis between our proposed AAPSO-SVM and the PSO-SVM technique is presented. PMID:24790584
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
Computer aided detection system for lung cancer using computer tomography scans
NASA Astrophysics Data System (ADS)
Mahesh, Shanthi; Rakesh, Spoorthi; Patil, Vidya C.
2018-04-01
Lung Cancer is a disease can be defined as uncontrolled cell growth in tissues of the lung. If we detect the Lung Cancer in its early stage, then that could be the key of its cure. In this work the non-invasive methods are studied for assisting in nodule detection. It supplies a Computer Aided Diagnosis System (CAD) for early detection of lung cancer nodules from the Computer Tomography (CT) images. CAD system is the one which helps to improve the diagnostic performance of radiologists in their image interpretations. The main aim of this technique is to develop a CAD system for finding the lung cancer using the lung CT images and classify the nodule as Benign or Malignant. For classifying cancer cells, SVM classifier is used. Here, image processing techniques have been used to de-noise, to enhance, for segmentation and edge detection of an image is used to extract the area, perimeter and shape of nodule. The core factors of this research are Image quality and accuracy.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-01-01
Background Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. Results The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. Conclusion GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge. PMID:18430222
Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo S.; Yang, Y.; Carbonell, J.
2011-10-24
Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. Inmore » contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.« less
Zhang, Jiang; Wang, James Z; Yuan, Zhen; Sobel, Eric S; Jiang, Huabei
2011-01-01
This study presents a computer-aided classification method to distinguish osteoarthritis finger joints from healthy ones based on the functional images captured by x-ray guided diffuse optical tomography. Three imaging features, joint space width, optical absorption, and scattering coefficients, are employed to train a Least Squares Support Vector Machine (LS-SVM) classifier for osteoarthritis classification. The 10-fold validation results show that all osteoarthritis joints are clearly identified and all healthy joints are ruled out by the LS-SVM classifier. The best sensitivity, specificity, and overall accuracy of the classification by experienced technicians based on manual calculation of optical properties and visual examination of optical images are only 85%, 93%, and 90%, respectively. Therefore, our LS-SVM based computer-aided classification is a considerably improved method for osteoarthritis diagnosis.
Fast and robust segmentation of white blood cell images by self-supervised learning.
Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo
2018-04-01
A fast and accurate white blood cell (WBC) segmentation remains a challenging task, as different WBCs vary significantly in color and shape due to cell type differences, staining technique variations and the adhesion between the WBC and red blood cells. In this paper, a self-supervised learning approach, consisting of unsupervised initial segmentation and supervised segmentation refinement, is presented. The first module extracts the overall foreground region from the cell image by K-means clustering, and then generates a coarse WBC region by touching-cell splitting based on concavity analysis. The second module further uses the coarse segmentation result of the first module as automatic labels to actively train a support vector machine (SVM) classifier. Then, the trained SVM classifier is further used to classify each pixel of the image and achieve a more accurate segmentation result. To improve its segmentation accuracy, median color features representing the topological structure and a new weak edge enhancement operator (WEEO) handling fuzzy boundary are introduced. To further reduce its time cost, an efficient cluster sampling strategy is also proposed. We tested the proposed approach with two blood cell image datasets obtained under various imaging and staining conditions. The experiment results show that our approach has a superior performance of accuracy and time cost on both datasets. Copyright © 2018 Elsevier Ltd. All rights reserved.
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of Principal Component Analysis (PCA) and nonlinear learning manifold algorithms (ISOMAP and Diffusion maps) for classifying brain maps between groups of schizophrenia patients and healthy from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent component analysis (ICA) to reduce (a) noise and (b) spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support-vector-machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.
Computer-Vision-Assisted Palm Rehabilitation With Supervised Learning.
Vamsikrishna, K M; Dogra, Debi Prosad; Desarkar, Maunendra Sankar
2016-05-01
Physical rehabilitation supported by the computer-assisted-interface is gaining popularity among health-care fraternity. In this paper, we have proposed a computer-vision-assisted contactless methodology to facilitate palm and finger rehabilitation. Leap motion controller has been interfaced with a computing device to record parameters describing 3-D movements of the palm of a user undergoing rehabilitation. We have proposed an interface using Unity3D development platform. Our interface is capable of analyzing intermediate steps of rehabilitation without the help of an expert, and it can provide online feedback to the user. Isolated gestures are classified using linear discriminant analysis (DA) and support vector machines (SVM). Finally, a set of discrete hidden Markov models (HMM) have been used to classify gesture sequence performed during rehabilitation. Experimental validation using a large number of samples collected from healthy volunteers reveals that DA and SVM perform similarly while applied on isolated gesture recognition. We have compared the results of HMM-based sequence classification with CRF-based techniques. Our results confirm that both HMM and CRF perform quite similarly when tested on gesture sequences. The proposed system can be used for home-based palm or finger rehabilitation in the absence of experts.
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.
2015-01-01
Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483
Identification and classification of similar looking food grains
NASA Astrophysics Data System (ADS)
Anami, B. S.; Biradar, Sunanda D.; Savakar, D. G.; Kulkarni, P. V.
2013-01-01
This paper describes the comparative study of Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers by taking a case study of identification and classification of four pairs of similar looking food grains namely, Finger Millet, Mustard, Soyabean, Pigeon Pea, Aniseed, Cumin-seeds, Split Greengram and Split Blackgram. Algorithms are developed to acquire and process color images of these grains samples. The developed algorithms are used to extract 18 colors-Hue Saturation Value (HSV), and 42 wavelet based texture features. Back Propagation Neural Network (BPNN)-based classifier is designed using three feature sets namely color - HSV, wavelet-texture and their combined model. SVM model for color- HSV model is designed for the same set of samples. The classification accuracies ranging from 93% to 96% for color-HSV, ranging from 78% to 94% for wavelet texture model and from 92% to 97% for combined model are obtained for ANN based models. The classification accuracy ranging from 80% to 90% is obtained for color-HSV based SVM model. Training time required for the SVM based model is substantially lesser than ANN for the same set of images.
Bias in error estimation when using cross-validation for model selection.
Varma, Sudhir; Simon, Richard
2006-02-23
Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA-seq sequencing technologies presents a shift in the scientific paradigm where scientists, now, are able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, till date, there has not been a suitable computational methodology for the analysis of such intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profiles difference between the different cellular subtypes. In this paper, we describe the novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in web-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach reported for is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles like that of the study of the cancer progression and treatment within a highly heterogeneous tumour.
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Mediterranean Land Use and Land Cover Classification Assessment Using High Spatial Resolution Data
NASA Astrophysics Data System (ADS)
Elhag, Mohamed; Boteva, Silvena
2016-10-01
Landscape fragmentation is noticeably practiced in Mediterranean regions and imposes substantial complications in several satellite image classification methods. To some extent, high spatial resolution data were able to overcome such complications. For better classification performances in Land Use Land Cover (LULC) mapping, the current research adopts different classification methods comparison for LULC mapping using Sentinel-2 satellite as a source of high spatial resolution. Both of pixel-based and an object-based classification algorithms were assessed; the pixel-based approach employs Maximum Likelihood (ML), Artificial Neural Network (ANN) algorithms, Support Vector Machine (SVM), and, the object-based classification uses the Nearest Neighbour (NN) classifier. Stratified Masking Process (SMP) that integrates a ranking process within the classes based on spectral fluctuation of the sum of the training and testing sites was implemented. An analysis of the overall and individual accuracy of the classification results of all four methods reveals that the SVM classifier was the most efficient overall by distinguishing most of the classes with the highest accuracy. NN succeeded to deal with artificial surface classes in general while agriculture area classes, and forest and semi-natural area classes were segregated successfully with SVM. Furthermore, a comparative analysis indicates that the conventional classification method yielded better accuracy results than the SMP method overall with both classifiers used, ML and SVM.
Nanthini, B. Suguna; Santhi, B.
2017-01-01
Background: Epilepsy causes when the repeated seizure occurs in the brain. Electroencephalogram (EEG) test provides valuable information about the brain functions and can be useful to detect brain disorder, especially for epilepsy. In this study, application for an automated seizure detection model has been introduced successfully. Materials and Methods: The EEG signals are decomposed into sub-bands by discrete wavelet transform using db2 (daubechies) wavelet. The eight statistical features, the four gray level co-occurrence matrix and Renyi entropy estimation with four different degrees of order, are extracted from the raw EEG and its sub-bands. Genetic algorithm (GA) is used to select eight relevant features from the 16 dimension features. The model has been trained and tested using support vector machine (SVM) classifier successfully for EEG signals. The performance of the SVM classifier is evaluated for two different databases. Results: The study has been experimented through two different analyses and achieved satisfactory performance for automated seizure detection using relevant features as the input to the SVM classifier. Conclusion: Relevant features using GA give better accuracy performance for seizure detection. PMID:28781480
Cancer survival classification using integrated data sets and intermediate information.
Kim, Shinuk; Park, Taesung; Kon, Mark
2014-09-01
Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with Cox proportional hazard regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers, first, existing ML methods, support vector machine (SVM) and random forest (RF), second, a new median-based classifier using FSCOX (FSCOX_median), and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from miRNAs and mRNAs profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced 22 short-term and 22 long-term survivor data set. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM) 85.71% (FSCOX_median), 85.71% (FSCOX_SVM). These results are higher than the results of using miRNA expression and mRNA expression alone. In addition we predict 16 hsa-miR-23b and hsa-miR-27b target genes in ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles. Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. Copyright © 2014 Elsevier B.V. All rights reserved.
Parkinson's disease detection based on dysphonia measurements
NASA Astrophysics Data System (ADS)
Lahmiri, Salim
2017-04-01
Assessing dysphonic symptoms is a noninvasive and effective approach to detect Parkinson's disease (PD) in patients. The main purpose of this study is to investigate the effect of different dysphonia measurements on PD detection by support vector machine (SVM). Seven categories of dysphonia measurements are considered. Experimental results from ten-fold cross-validation technique demonstrate that vocal fundamental frequency statistics yield the highest accuracy of 88 % ± 0.04. When all dysphonia measurements are employed, the SVM classifier achieves 94 % ± 0.03 accuracy. A refinement of the original patterns space by removing dysphonia measurements with similar variation across healthy and PD subjects allows achieving 97.03 % ± 0.03 accuracy. The latter performance is larger than what is reported in the literature on the same dataset with ten-fold cross-validation technique. Finally, it was found that measures of ratio of noise to tonal components in the voice are the most suitable dysphonic symptoms to detect PD subjects as they achieve 99.64 % ± 0.01 specificity. This finding is highly promising for understanding PD symptoms.
Terahertz spectroscopic investigation of human gastric normal and tumor tissues
NASA Astrophysics Data System (ADS)
Hou, Dibo; Li, Xian; Cai, Jinhui; Ma, Yehao; Kang, Xusheng; Huang, Pingjie; Zhang, Guangxin
2014-09-01
Human dehydrated normal and cancerous gastric tissues were measured using transmission time-domain terahertz spectroscopy. Based on the obtained terahertz absorption spectra, the contrasts between the two kinds of tissue were investigated and techniques for automatic identification of cancerous tissue were studied. Distinctive differences were demonstrated in both the shape and amplitude of the absorption spectra between normal and tumor tissue. Additionally, some spectral features in the range of 0.2~0.5 THz and 1~1.5 THz were revealed for all cancerous gastric tissues. To systematically achieve the identification of gastric cancer, principal component analysis combined with t-test was used to extract valuable information indicating the best distinction between the two types. Two clustering approaches, K-means and support vector machine (SVM), were then performed to classify the processed terahertz data into normal and cancerous groups. SVM presented a satisfactory result with less false classification cases. The results of this study implicate the potential of the terahertz technique to detect gastric cancer. The applied data analysis methodology provides a suggestion for automatic discrimination of terahertz spectra in other applications.
NASA Astrophysics Data System (ADS)
Van Gordon, M.; Van Gordon, S.; Min, A.; Sullivan, J.; Weiner, Z.; Tappan, G. G.
2017-12-01
Using support vector machine (SVM) learning and high-accuracy hand-classified maps, we have developed a publicly available land cover classification tool for the West African Sahel. Our classifier produces high-resolution and regionally calibrated land cover maps for the Sahel, representing a significant contribution to the data available for this region. Global land cover products are unreliable for the Sahel, and accurate land cover data for the region are sparse. To address this gap, the U.S. Geological Survey and the Regional Center for Agriculture, Hydrology and Meteorology (AGRHYMET) in Niger produced high-quality land cover maps for the region via hand-classification of Landsat images. This method produces highly accurate maps, but the time and labor required constrain the spatial and temporal resolution of the data products. By using these hand-classified maps alongside SVM techniques, we successfully increase the resolution of the land cover maps by 1-2 orders of magnitude, from 2km-decadal resolution to 30m-annual resolution. These high-resolution regionally calibrated land cover datasets, along with the classifier we developed to produce them, lay the foundation for major advances in studies of land surface processes in the region. These datasets will provide more accurate inputs for food security modeling, hydrologic modeling, analyses of land cover change and climate change adaptation efforts. The land cover classification tool we have developed will be publicly available for use in creating additional West Africa land cover datasets with future remote sensing data and can be adapted for use in other parts of the world.
Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.
Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck
2018-04-20
Motor oil classification is important for quality control and the identification of oil adulteration. In thiswork, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for the classification of nine different types of motor oils according to their corresponding color histograms. For this, we applied color histogram in different color spaces such as red green blue (RGB), grayscale, and hue saturation intensity (HSI) in order to extract features that can help with the classification procedure. These color histograms and their combinations were used as input for model development and then were statistically evaluated by using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Here, two common solutions for solving a multiclass classification problem were applied: (1) transformation to binary classification problem using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. In the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. In extension from binary case, despite good performances by the SVM classification model, QDA and LDA provided better results up to 92% for RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. In order to reduce the numbers of independent variables for modeling, a principle component analysis algorithm was used. Our results suggest that the proposed method is promising for the identification and classification of different types of motor oils.
NASA Astrophysics Data System (ADS)
Taha, Z.; Razman, M. A. M.; Adnan, F. A.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Sallehudin, M. F.; Mukai, Y.
2018-03-01
Fish Hunger behaviour is one of the important element in determining the fish feeding routine, especially for farmed fishes. Inaccurate feeding routines (under-feeding or over-feeding) lead the fishes to die and thus, reduces the total production of fishes. The excessive food which is not eaten by fish will be dissolved in the water and thus, reduce the water quality (oxygen quantity in the water will be reduced). The reduction of oxygen (water quality) leads the fish to die and in some cases, may lead to fish diseases. This study correlates Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique. The behaviour is clustered with respect to the position of the centre of gravity of the school of fish prior feeding, during feeding and after feeding. The clustered fish behaviour is then classified by means of a machine learning technique namely Support vector machine (SVM). It has been shown from the study that the Fine Gaussian variation of SVM is able to provide a reasonably accurate classification of fish feeding behaviour with a classification accuracy of 79.7%. The proposed integration technique may increase the usefulness of the captured data and thus better differentiates the various behaviour of farmed fishes.
Soft Computing Application in Fault Detection of Induction Motor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konar, P.; Puhan, P. S.; Chattopadhyay, P. Dr.
2010-10-26
The paper investigates the effectiveness of different patter classifier like Feed Forward Back Propagation (FFBPN), Radial Basis Function (RBF) and Support Vector Machine (SVM) for detection of bearing faults in Induction Motor. The steady state motor current with Park's Transformation has been used for discrimination of inner race and outer race bearing defects. The RBF neural network shows very encouraging results for multi-class classification problems and is hoped to set up a base for incipient fault detection of induction motor. SVM is also found to be a very good fault classifier which is highly competitive with RBF.
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-09-02
Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-01-01
Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance. PMID:27598160
Paiva, Joana S; Cardoso, João; Pereira, Tânia
2018-01-01
The main goal of this study was to develop an automatic method based on supervised learning methods, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed by signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and a F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with SVM classifier using a different number of features from the original set available. The comparison between SVM and NN allowed reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.
Realistic Subsurface Anomaly Discrimination Using Electromagnetic Induction and an SVM Classifier
NASA Astrophysics Data System (ADS)
Pablo Fernández, Juan; Shubitidze, Fridon; Shamatava, Irma; Barrowes, Benjamin E.; O'Neill, Kevin
2010-12-01
The environmental research program of the United States military has set up blind tests for detection and discrimination of unexploded ordnance. One such test consists of measurements taken with the EM-63 sensor at Camp Sibert, AL. We review the performance on the test of a procedure that combines a field-potential (HAP) method to locate targets, the normalized surface magnetic source (NSMS) model to characterize them, and a support vector machine (SVM) to classify them. The HAP method infers location from the scattered magnetic field and its associated scalar potential, the latter reconstructed using equivalent sources. NSMS replaces the target with an enclosing spheroid of equivalent radial magnetization whose integral it uses as a discriminator. SVM generalizes from empirical evidence and can be adapted for multiclass discrimination using a voting system. Our method identifies all potentially dangerous targets correctly and has a false-alarm rate of about 5%.
USDA-ARS?s Scientific Manuscript database
It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano
2015-01-01
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation—based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking—to reduce the dimensions of images—and binarization—to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano
2015-06-17
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation-based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking-to reduce the dimensions of images-and binarization-to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.
Texture- and deformability-based surface recognition by tactile image analysis.
Khasnobish, Anwesha; Pal, Monalisa; Tibarewala, D N; Konar, Amit; Pal, Kunal
2016-08-01
Deformability and texture are two unique object characteristics which are essential for appropriate surface recognition by tactile exploration. Tactile sensation is required to be incorporated in artificial arms for rehabilitative and other human-computer interface applications to achieve efficient and human-like manoeuvring. To accomplish the same, surface recognition by tactile data analysis is one of the prerequisites. The aim of this work is to develop effective technique for identification of various surfaces based on deformability and texture by analysing tactile images which are obtained during dynamic exploration of the item by artificial arms whose gripper is fitted with tactile sensors. Tactile data have been acquired, while human beings as well as a robot hand fitted with tactile sensors explored the objects. The tactile images are pre-processed, and relevant features are extracted from the tactile images. These features are provided as input to the variants of support vector machine (SVM), linear discriminant analysis and k-nearest neighbour (kNN) for classification. Based on deformability, six household surfaces are recognized from their corresponding tactile images. Moreover, based on texture five surfaces of daily use are classified. The method adopted in the former two cases has also been applied for deformability- and texture-based recognition of four biomembranes, i.e. membranes prepared from biomaterials which can be used for various applications such as drug delivery and implants. Linear SVM performed best for recognizing surface deformability with an accuracy of 83 % in 82.60 ms, whereas kNN classifier recognizes surfaces of daily use having different textures with an accuracy of 89 % in 54.25 ms and SVM with radial basis function kernel recognizes biomembranes with an accuracy of 78 % in 53.35 ms. The classifiers are observed to generalize well on the unseen test datasets with very high performance to achieve efficient material recognition based on its deformability and texture.
Ali, Safdar; Majid, Abdul; Khan, Asifullah
2014-04-01
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
Machine learning search for variable stars
NASA Astrophysics Data System (ADS)
Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis
2018-04-01
Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.
Feature generation using genetic programming with application to fault classification.
Guo, Hong; Jack, Lindsay B; Nandi, Asoke K
2005-02-01
One of the major challenges in pattern recognition problems is the feature extraction process which derives new features from existing features, or directly from raw data in order to reduce the cost of computation during the classification process, while improving classifier efficiency. Most current feature extraction techniques transform the original pattern vector into a new vector with increased discrimination capability but lower dimensionality. This is conducted within a predefined feature space, and thus, has limited searching power. Genetic programming (GP) can generate new features from the original dataset without prior knowledge of the probabilistic distribution. In this paper, a GP-based approach is developed for feature extraction from raw vibration data recorded from a rotating machine with six different conditions. The created features are then used as the inputs to a neural classifier for the identification of six bearing conditions. Experimental results demonstrate the ability of GP to discover autimatically the different bearing conditions using features expressed in the form of nonlinear functions. Furthermore, four sets of results--using GP extracted features with artificial neural networks (ANN) and support vector machines (SVM), as well as traditional features with ANN and SVM--have been obtained. This GP-based approach is used for bearing fault classification for the first time and exhibits superior searching power over other techniques. Additionaly, it significantly reduces the time for computation compared with genetic algorithm (GA), therefore, makes a more practical realization of the solution.
Moghaddasi, Hanie; Nourian, Saeed
2016-06-01
Heart disease is the major cause of death as well as a leading cause of disability in the developed countries. Mitral Regurgitation (MR) is a common heart disease which does not cause symptoms until its end stage. Therefore, early diagnosis of the disease is of crucial importance in the treatment process. Echocardiography is a common method of diagnosis in the severity of MR. Hence, a method which is based on echocardiography videos, image processing techniques and artificial intelligence could be helpful for clinicians, especially in borderline cases. In this paper, we introduce novel features to detect micro-patterns of echocardiography images in order to determine the severity of MR. Extensive Local Binary Pattern (ELBP) and Extensive Volume Local Binary Pattern (EVLBP) are presented as image descriptors which include details from different viewpoints of the heart in feature vectors. Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Template Matching techniques are used as classifiers to determine the severity of MR based on textural descriptors. The SVM classifier with Extensive Uniform Local Binary Pattern (ELBPU) and Extensive Volume Local Binary Pattern (EVLBP) have the best accuracy with 99.52%, 99.38%, 99.31% and 99.59%, respectively, for the detection of Normal, Mild MR, Moderate MR and Severe MR subjects among echocardiography videos. The proposed method achieves 99.38% sensitivity and 99.63% specificity for the detection of the severity of MR and normal subjects. Copyright © 2016 Elsevier Ltd. All rights reserved.
Breast Cancer Recognition Using a Novel Hybrid Intelligent Method
Addeh, Jalil; Ebrahimzadeh, Ata
2012-01-01
Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. This paper presents a novel hybrid intelligent method for recognition of breast cancer tumors. The proposed method includes three main modules: the feature extraction module, the classifier module, and the optimization module. In the feature extraction module, fuzzy features are proposed as the efficient characteristic of the patterns. In the classifier module, because of the promising generalization capability of support vector machines (SVM), a SVM-based classifier is proposed. In support vector machine training, the hyperparameters have very important roles for its recognition accuracy. Therefore, in the optimization module, the bees algorithm (BA) is proposed for selecting appropriate parameters of the classifier. The proposed system is tested on Wisconsin Breast Cancer database and simulation results show that the recommended system has a high accuracy. PMID:23626945
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2014-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2015-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Abstract Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
Meier, Timothy B.; Desphande, Alok S.; Vergun, Svyatoslav; Nair, Veena A.; Song, Jie; Biswal, Bharat B.; Meyerand, Mary E.; Birn, Rasmus M.; Prabhakaran, Vivek
2012-01-01
Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5 mm3 radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual’s three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. PMID:22227886
Meier, Timothy B; Desphande, Alok S; Vergun, Svyatoslav; Nair, Veena A; Song, Jie; Biswal, Bharat B; Meyerand, Mary E; Birn, Rasmus M; Prabhakaran, Vivek
2012-03-01
Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5mm(3) radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual's three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. Copyright © 2011 Elsevier Inc. All rights reserved.
Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine
NASA Astrophysics Data System (ADS)
Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung
Clinical decision support system (CDSS) provides knowledge and specific information for clinicians to enhance diagnostic efficiency and improving healthcare quality. An appropriate CDSS can highly elevate patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, it is critical to determine suitable combination of SVM parameters regarding classification performance. Genetic algorithm (GA) can find optimal solution within an acceptable time, and is faster than greedy algorithm with exhaustive searching strategy. By taking the advantage of GA in quickly selecting the salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which is different from the traditional method with GA used for feature selection and SVM for classification, was used to design CDSSs for prediction of successful ventilation weaning, diagnosis of patients with severe obstructive sleep apnea, and discrimination of different cell types form Pap smear. The results show that IGS is better than methods using SVM alone or linear discriminator.
Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal
2015-10-01
Protein-protein interaction (PPI) site prediction aids to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performances of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms are evaluated and estimated the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are measured as 79.94% and 80.48% for the Homo sapiens and E. coli organisms respectively. On the independent test datasets, AUC scores are obtained as 76.59% and 80.17% respectively for the two organisms. In almost all cases, the developed F-SVM method improves the performances obtained by the corresponding classical SVM and the other classifiers, available in the literature.
SVM Pixel Classification on Colour Image Segmentation
NASA Astrophysics Data System (ADS)
Barui, Subhrajit; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.
2018-04-01
The aim of image segmentation is to simplify the representation of an image with the help of cluster pixels into something meaningful to analyze. Segmentation is typically used to locate boundaries and curves in an image, precisely to label every pixel in an image to give each pixel an independent identity. SVM pixel classification on colour image segmentation is the topic highlighted in this paper. It holds useful application in the field of concept based image retrieval, machine vision, medical imaging and object detection. The process is accomplished step by step. At first we need to recognize the type of colour and the texture used as an input to the SVM classifier. These inputs are extracted via local spatial similarity measure model and Steerable filter also known as Gabon Filter. It is then trained by using FCM (Fuzzy C-Means). Both the pixel level information of the image and the ability of the SVM Classifier undergoes some sophisticated algorithm to form the final image. The method has a well developed segmented image and efficiency with respect to increased quality and faster processing of the segmented image compared with the other segmentation methods proposed earlier. One of the latest application result is the Light L16 camera.
Diagnosis of periodontal diseases using different classification algorithms: a preliminary study.
Ozden, F O; Özgönenel, O; Özden, B; Aydogdu, A
2015-01-01
The purpose of the proposed study was to develop an identification unit for classifying periodontal diseases using support vector machine (SVM), decision tree (DT), and artificial neural networks (ANNs). A total of 150 patients was divided into two groups such as training (100) and testing (50). The codes created for risk factors, periodontal data, and radiographically bone loss were formed as a matrix structure and regarded as inputs for the classification unit. A total of six periodontal conditions was the outputs of the classification unit. The accuracy of the suggested methods was compared according to their resolution and working time. DT and SVM were best to classify the periodontal diseases with a high accuracy according to the clinical research based on 150 patients. The performances of SVM and DT were found 98% with total computational time of 19.91 and 7.00 s, respectively. ANN had the worst correlation between input and output variable, and its performance was calculated as 46%. SVM and DT appeared to be sufficiently complex to reflect all the factors associated with the periodontal status, simple enough to be understandable and practical as a decision-making aid for prediction of periodontal disease.
Padma, A; Sukanesh, R
2013-01-01
A computer software system is designed for the segmentation and classification of benign from malignant tumour slices in brain computed tomography (CT) images. This paper presents a method to find and select both the dominant run length and co-occurrence texture features of region of interest (ROI) of the tumour region of each slice to be segmented by Fuzzy c means clustering (FCM) and evaluate the performance of support vector machine (SVM)-based classifiers in classifying benign and malignant tumour slices. Two hundred and six tumour confirmed CT slices are considered in this study. A total of 17 texture features are extracted by a feature extraction procedure, and six features are selected using Principal Component Analysis (PCA). This study constructed the SVM-based classifier with the selected features and by comparing the segmentation results with the experienced radiologist labelled ground truth (target). Quantitative analysis between ground truth and segmented tumour is presented in terms of segmentation accuracy, segmentation error and overlap similarity measures such as the Jaccard index. The classification performance of the SVM-based classifier with the same selected features is also evaluated using a 10-fold cross-validation method. The proposed system provides some newly found texture features have an important contribution in classifying benign and malignant tumour slices efficiently and accurately with less computational time. The experimental results showed that the proposed system is able to achieve the highest segmentation and classification accuracy effectiveness as measured by jaccard index and sensitivity and specificity.
Classification of burn wounds using support vector machines
NASA Astrophysics Data System (ADS)
Acha, Begona; Serrano, Carmen; Palencia, Sergio; Murillo, Juan Jose
2004-05-01
The purpose of this work is to improve a previous method developed by the authors for the classification of burn wounds into their depths. The inputs of the system are color and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. Our previous work consisted in segmenting the burn wound from the rest of the image and classifying the burn into its depth. In this paper we focus on the classification problem only. We already proposed to use a Fuzzy-ARTMAP neural network (NN). However, we may take advantage of new powerful classification tools such as Support Vector Machines (SVM). We apply the five-folded cross validation scheme to divide the database into training and validating sets. Then, we apply a feature selection method for each classifier, which will give us the set of features that yields the smallest classification error for each classifier. Features used to classify are first-order statistical parameters extracted from the L*, u* and v* color components of the image. The feature selection algorithms used are the Sequential Forward Selection (SFS) and the Sequential Backward Selection (SBS) methods. As data of the problem faced here are not linearly separable, the SVM was trained using some different kernels. The validating process shows that the SVM method, when using a Gaussian kernel of variance 1, outperforms classification results obtained with the rest of the classifiers, yielding an error classification rate of 0.7% whereas the Fuzzy-ARTMAP NN attained 1.6 %.
NASA Astrophysics Data System (ADS)
Lee, Donghoon; Kim, Ye-seul; Choi, Sunghoon; Lee, Haenghwa; Jo, Byungdu; Choi, Seungyeon; Shin, Jungwook; Kim, Hee-Joung
2017-03-01
The chest digital tomosynthesis(CDT) is recently developed medical device that has several advantage for diagnosing lung disease. For example, CDT provides depth information with relatively low radiation dose compared to computed tomography (CT). However, a major problem with CDT is the image artifacts associated with data incompleteness resulting from limited angle data acquisition in CDT geometry. For this reason, the sensitivity of lung disease was not clear compared to CT. In this study, to improve sensitivity of lung disease detection in CDT, we developed computer aided diagnosis (CAD) systems based on machine learning. For design CAD systems, we used 100 cases of lung nodules cropped images and 100 cases of normal lesion cropped images acquired by lung man phantoms and proto type CDT. We used machine learning techniques based on support vector machine and Gabor filter. The Gabor filter was used for extracting characteristics of lung nodules and we compared performance of feature extraction of Gabor filter with various scale and orientation parameters. We used 3, 4, 5 scales and 4, 6, 8 orientations. After extracting features, support vector machine (SVM) was used for classifying feature of lesions. The linear, polynomial and Gaussian kernels of SVM were compared to decide the best SVM conditions for CDT reconstruction images. The results of CAD system with machine learning showed the capability of automatically lung lesion detection. Furthermore detection performance was the best when Gabor filter with 5 scale and 8 orientation and SVM with Gaussian kernel were used. In conclusion, our suggested CAD system showed improving sensitivity of lung lesion detection in CDT and decide Gabor filter and SVM conditions to achieve higher detection performance of our developed CAD system for CDT.
Lim, Dong Kyu; Long, Nguyen Phuoc; Mo, Changyeun; Dong, Ziyuan; Cui, Lingmei; Kim, Giyoung; Kwon, Sung Won
2017-10-01
The mixing of extraneous ingredients with original products is a common adulteration practice in food and herbal medicines. In particular, authenticity of white rice and its corresponding blended products has become a key issue in food industry. Accordingly, our current study aimed to develop and evaluate a novel discrimination method by combining targeted lipidomics with powerful supervised learning methods, and eventually introduce a platform to verify the authenticity of white rice. A total of 30 cultivars were collected, and 330 representative samples of white rice from Korea and China as well as seven mixing ratios were examined. Random forests (RF), support vector machines (SVM) with a radial basis function kernel, C5.0, model averaged neural network, and k-nearest neighbor classifiers were used for the classification. We achieved desired results, and the classifiers effectively differentiated white rice from Korea to blended samples with high prediction accuracy for the contamination ratio as low as five percent. In addition, RF and SVM classifiers were generally superior to and more robust than the other techniques. Our approach demonstrated that the relative differences in lysoGPLs can be successfully utilized to detect the adulterated mixing of white rice originating from different countries. In conclusion, the present study introduces a novel and high-throughput platform that can be applied to authenticate adulterated admixtures from original white rice samples. Copyright © 2017 Elsevier Ltd. All rights reserved.
Automated separation of merged Langerhans islets
NASA Astrophysics Data System (ADS)
Švihlík, Jan; Kybic, Jan; Habart, David
2016-03-01
This paper deals with separation of merged Langerhans islets in segmentations in order to evaluate correct histogram of islet diameters. A distribution of islet diameters is useful for determining the feasibility of islet transplantation in diabetes. First, the merged islets at training segmentations are manually separated by medical experts. Based on the single islets, the merged islets are identified and the SVM classifier is trained on both classes (merged/single islets). The testing segmentations were over-segmented using watershed transform and the most probable back merging of islets were found using trained SVM classifier. Finally, the optimized segmentation is compared with ground truth segmentation (correctly separated islets).
Jongin Kim; Boreom Lee
2017-07-01
The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals of the neuroscience and clinical fields. In this study, we performed extreme learning machine (ELM) classifier to discriminate the AD, mild cognitive impairment (MCI) from normal control (NC). We compared the performance of ELM with that of a linear kernel support vector machine (SVM) for 718 structural MRI images from Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal control, MCI converter (MCI-C), MCI non-converter (MCI-NC), and AD. We employed SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. In this study, we found that the RFE-SVM feature selection approach in combination with ELM shows the superior classification accuracy to that of linear kernel SVM for structural T1 MRI data.
Human action recognition with group lasso regularized-support vector machine
NASA Astrophysics Data System (ADS)
Luo, Huiwu; Lu, Huanzhang; Wu, Yabei; Zhao, Fei
2016-05-01
The bag-of-visual-words (BOVW) and Fisher kernel are two popular models in human action recognition, and support vector machine (SVM) is the most commonly used classifier for the two models. We show two kinds of group structures in the feature representation constructed by BOVW and Fisher kernel, respectively, since the structural information of feature representation can be seen as a prior for the classifier and can improve the performance of the classifier, which has been verified in several areas. However, the standard SVM employs L2-norm regularization in its learning procedure, which penalizes each variable individually and cannot express the structural information of feature representation. We replace the L2-norm regularization with group lasso regularization in standard SVM, and a group lasso regularized-support vector machine (GLRSVM) is proposed. Then, we embed the group structural information of feature representation into GLRSVM. Finally, we introduce an algorithm to solve the optimization problem of GLRSVM by alternating directions method of multipliers. The experiments evaluated on KTH, YouTube, and Hollywood2 datasets show that our method achieves promising results and improves the state-of-the-art methods on KTH and YouTube datasets.
Cascianelli, Silvia; Scialpi, Michele; Amici, Serena; Forini, Nevio; Minestrini, Matteo; Fravolini, Mario Luca; Sinzinger, Helmut; Schillaci, Orazio; Palumbo, Barbara
2017-01-01
Artificial Intelligence (AI) is a very active Computer Science research field aiming to develop systems that mimic human intelligence and is helpful in many human activities, including Medicine. In this review we presented some examples of the exploiting of AI techniques, in particular automatic classifiers such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification Tree (ClT) and ensemble methods like Random Forest (RF), able to analyze findings obtained by positron emission tomography (PET) or single-photon emission tomography (SPECT) scans of patients with Neurodegenerative Diseases, in particular Alzheimer's Disease. We also focused our attention on techniques applied in order to preprocess data and reduce their dimensionality via feature selection or projection in a more representative domain (Principal Component Analysis - PCA - or Partial Least Squares - PLS - are examples of such methods); this is a crucial step while dealing with medical data, since it is necessary to compress patient information and retain only the most useful in order to discriminate subjects into normal and pathological classes. Main literature papers on the application of these techniques to classify patients with neurodegenerative disease extracting data from molecular imaging modalities are reported, showing that the increasing development of computer aided diagnosis systems is very promising to contribute to the diagnostic process.
Automatic classification of seismic events within a regional seismograph network
NASA Astrophysics Data System (ADS)
Tiira, Timo; Kortström, Jari; Uski, Marja
2015-04-01
A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows, P, P coda, S, and S coda. For each signal window STA is computed in 20 narrow frequency bands between 1 and 41 Hz. The 80 discrimination parameters are used as a training data for the SVM. The SVM models are calculated for 19 on-line seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after automatic location process. The station-specific SVM training events include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations, of which 98 % have been manually identified as explosions or noise and 2 % as earthquakes. The SVM method correctly identifies 94 % of the non-earthquakes and all the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce work-load in manual seismic analysis by leaving only ~5 % of the automatic event determinations, i.e. the probable earthquakes for more detailed seismological analysis. The approach presented is easy to adjust to requirements of a denser or wider high-frequency network, once enough training examples for building a station-specific data set are available.
A Features Selection for Crops Classification
NASA Astrophysics Data System (ADS)
Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen
2016-08-01
The components of the polarimetric target decomposition reflect the differences of target since they linked with the scattering properties of the target and can be imported into SVM as the classification features. The result of decomposition usually concentrate on part of the components. Selecting a combination of components can reduce the features that importing into the SVM. The features reduction can lead to less calculation and targeted classification of one target when we classify a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification with a data of AGRISAR.
NASA Astrophysics Data System (ADS)
Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari
2018-03-01
Support Vector Machine or commonly called SVM is one method that can be used to process the classification of a data. SVM classifies data from 2 different classes with hyperplane. In this study, the system was built using SVM to develop Arabic Speech Recognition. In the development of the system, there are 2 kinds of speakers that have been tested that is dependent speakers and independent speakers. The results from this system is an accuracy of 85.32% for speaker dependent and 61.16% for independent speakers.
Classification of cardiovascular tissues using LBP based descriptors and a cascade SVM.
Mazo, Claudia; Alegre, Enrique; Trujillo, Maria
2017-08-01
Histological images have characteristics, such as texture, shape, colour and spatial structure, that permit the differentiation of each fundamental tissue and organ. Texture is one of the most discriminative features. The automatic classification of tissues and organs based on histology images is an open problem, due to the lack of automatic solutions when treating tissues without pathologies. In this paper, we demonstrate that it is possible to automatically classify cardiovascular tissues using texture information and Support Vector Machines (SVM). Additionally, we realised that it is feasible to recognise several cardiovascular organs following the same process. The texture of histological images was described using Local Binary Patterns (LBP), LBP Rotation Invariant (LBPri), Haralick features and different concatenations between them, representing in this way its content. Using a SVM with linear kernel, we selected the more appropriate descriptor that, for this problem, was a concatenation of LBP and LBPri. Due to the small number of the images available, we could not follow an approach based on deep learning, but we selected the classifier who yielded the higher performance by comparing SVM with Random Forest and Linear Discriminant Analysis. Once SVM was selected as the classifier with a higher area under the curve that represents both higher recall and precision, we tuned it evaluating different kernels, finding that a linear SVM allowed us to accurately separate four classes of tissues: (i) cardiac muscle of the heart, (ii) smooth muscle of the muscular artery, (iii) loose connective tissue, and (iv) smooth muscle of the large vein and the elastic artery. The experimental validation was conducted using 3000 blocks of 100 × 100 sized pixels, with 600 blocks per class and the classification was assessed using a 10-fold cross-validation. using LBP as the descriptor, concatenated with LBPri and a SVM with linear kernel, the main four classes of tissues were recognised with an AUC higher than 0.98. A polynomial kernel was then used to separate the elastic artery and vein, yielding an AUC in both cases superior to 0.98. Following the proposed approach, it is possible to separate with very high precision (AUC greater than 0.98) the fundamental tissues of the cardiovascular system along with some organs, such as the heart, arteries and veins. Copyright © 2017 Elsevier B.V. All rights reserved.
Mirsky, Simcha K; Barnea, Itay; Levi, Mattan; Greenspan, Hayit; Shaked, Natan T
2017-09-01
Currently, the delicate process of selecting sperm cells to be used for in vitro fertilization (IVF) is still based on the subjective, qualitative analysis of experienced clinicians using non-quantitative optical microscopy techniques. In this work, a method was developed for the automated analysis of sperm cells based on the quantitative phase maps acquired through use of interferometric phase microscopy (IPM). Over 1,400 human sperm cells from 8 donors were imaged using IPM, and an algorithm was designed to digitally isolate sperm cell heads from the quantitative phase maps while taking into consideration both the cell 3D morphology and contents, as well as acquire features describing sperm head morphology. A subset of these features was used to train a support vector machine (SVM) classifier to automatically classify sperm of good and bad morphology. The SVM achieves an area under the receiver operating characteristic curve of 88.59% and an area under the precision-recall curve of 88.67%, as well as precisions of 90% or higher. We believe that our automatic analysis can become the basis for objective and automatic sperm cell selection in IVF. © 2017 International Society for Advancement of Cytometry. © 2017 International Society for Advancement of Cytometry.
Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X
2010-05-01
Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
Efficacy Evaluation of Different Wavelet Feature Extraction Methods on Brain MRI Tumor Detection
NASA Astrophysics Data System (ADS)
Nabizadeh, Nooshin; John, Nigel; Kubat, Miroslav
2014-03-01
Automated Magnetic Resonance Imaging brain tumor detection and segmentation is a challenging task. Among different available methods, feature-based methods are very dominant. While many feature extraction techniques have been employed, it is still not quite clear which of feature extraction methods should be preferred. To help improve the situation, we present the results of a study in which we evaluate the efficiency of using different wavelet transform features extraction methods in brain MRI abnormality detection. Applying T1-weighted brain image, Discrete Wavelet Transform (DWT), Discrete Wavelet Packet Transform (DWPT), Dual Tree Complex Wavelet Transform (DTCWT), and Complex Morlet Wavelet Transform (CMWT) methods are applied to construct the feature pool. Three various classifiers as Support Vector Machine, K Nearest Neighborhood, and Sparse Representation-Based Classifier are applied and compared for classifying the selected features. The results show that DTCWT and CMWT features classified with SVM, result in the highest classification accuracy, proving of capability of wavelet transform features to be informative in this application.
NASA Astrophysics Data System (ADS)
Li, Hai; Kumavor, Patrick; Salman Alqasemi, Umar; Zhu, Quing
2015-01-01
A composite set of ovarian tissue features extracted from photoacoustic spectral data, beam envelope, and co-registered ultrasound and photoacoustic images are used to characterize malignant and normal ovaries using logistic and support vector machine (SVM) classifiers. Normalized power spectra were calculated from the Fourier transform of the photoacoustic beamformed data, from which the spectral slopes and 0-MHz intercepts were extracted. Five features were extracted from the beam envelope and another 10 features were extracted from the photoacoustic images. These 17 features were ranked by their p-values from t-tests on which a filter type of feature selection method was used to determine the optimal feature number for final classification. A total of 169 samples from 19 ex vivo ovaries were randomly distributed into training and testing groups. Both classifiers achieved a minimum value of the mean misclassification error when the seven features with lowest p-values were selected. Using these seven features, the logistic and SVM classifiers obtained sensitivities of 96.39±3.35% and 97.82±2.26%, and specificities of 98.92±1.39% and 100%, respectively, for the training group. For the testing group, logistic and SVM classifiers achieved sensitivities of 92.71±3.55% and 92.64±3.27%, and specificities of 87.52±8.78% and 98.49±2.05%, respectively.
Marschner, C B; Kokla, M; Amigo, J M; Rozanski, E A; Wiinberg, B; McEvoy, F J
2017-07-11
Diagnosis of pulmonary thromboembolism (PTE) in dogs relies on computed tomography pulmonary angiography (CTPA), but detailed interpretation of CTPA images is demanding for the radiologist and only large vessels may be evaluated. New approaches for better detection of smaller thrombi include dual energy computed tomography (DECT) as well as computer assisted diagnosis (CAD) techniques. The purpose of this study was to investigate the performance of quantitative texture analysis for detecting dogs with PTE using grey-level co-occurrence matrices (GLCM) and multivariate statistical classification analyses. CT images from healthy (n = 6) and diseased (n = 29) dogs with and without PTE confirmed on CTPA were segmented so that only tissue with CT numbers between -1024 and -250 Houndsfield Units (HU) was preserved. GLCM analysis and subsequent multivariate classification analyses were performed on texture parameters extracted from these images. Leave-one-dog-out cross validation and receiver operator characteristic (ROC) showed that the models generated from the texture analysis were able to predict healthy dogs with optimal levels of performance. Partial Least Square Discriminant Analysis (PLS-DA) obtained a sensitivity of 94% and a specificity of 96%, while Support Vector Machines (SVM) yielded a sensitivity of 99% and a specificity of 100%. The models, however, performed worse in classifying the type of disease in the diseased dog group: In diseased dogs with PTE sensitivities were 30% (PLS-DA) and 38% (SVM), and specificities were 80% (PLS-DA) and 89% (SVM). In diseased dogs without PTE the sensitivities of the models were 59% (PLS-DA) and 79% (SVM) and specificities were 79% (PLS-DA) and 82% (SVM). The results indicate that texture analysis of CTPA images using GLCM is an effective tool for distinguishing healthy from abnormal lung. Furthermore the texture of pulmonary parenchyma in dogs with PTE is altered, when compared to the texture of pulmonary parenchyma of healthy dogs. The models' poorer performance in classifying dogs within the diseased group, may be related to the low number of dogs compared to texture variables, a lack of balanced number of dogs within each group or a real lack of difference in the texture features among the diseased dogs.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
Automated Tissue Classification Framework for Reproducible Chronic Wound Assessment
Mukherjee, Rashmi; Manohar, Dhiraj Dhane; Das, Dev Kumar; Achar, Arun; Mitra, Analava; Chakraborty, Chandan
2014-01-01
The aim of this paper was to develop a computer assisted tissue classification (granulation, necrotic, and slough) scheme for chronic wound (CW) evaluation using medical image processing and statistical machine learning techniques. The red-green-blue (RGB) wound images grabbed by normal digital camera were first transformed into HSI (hue, saturation, and intensity) color space and subsequently the “S” component of HSI color channels was selected as it provided higher contrast. Wound areas from 6 different types of CW were segmented from whole images using fuzzy divergence based thresholding by minimizing edge ambiguity. A set of color and textural features describing granulation, necrotic, and slough tissues in the segmented wound area were extracted using various mathematical techniques. Finally, statistical learning algorithms, namely, Bayesian classification and support vector machine (SVM), were trained and tested for wound tissue classification in different CW images. The performance of the wound area segmentation protocol was further validated by ground truth images labeled by clinical experts. It was observed that SVM with 3rd order polynomial kernel provided the highest accuracies, that is, 86.94%, 90.47%, and 75.53%, for classifying granulation, slough, and necrotic tissues, respectively. The proposed automated tissue classification technique achieved the highest overall accuracy, that is, 87.61%, with highest kappa statistic value (0.793). PMID:25114925
NASA Astrophysics Data System (ADS)
Liu, Di; Mishra, Ashok K.; Yu, Zhongbo
2016-07-01
This paper examines the combination of support vector machines (SVM) and the dual ensemble Kalman filter (EnKF) technique to estimate root zone soil moisture at different soil layers up to 100 cm depth. Multiple experiments are conducted in a data rich environment to construct and validate the SVM model and to explore the effectiveness and robustness of the EnKF technique. It was observed that the performance of SVM relies more on the initial length of training set than other factors (e.g., cost function, regularization parameter, and kernel parameters). The dual EnKF technique proved to be efficient to improve SVM with observed data either at each time step or at a flexible time steps. The EnKF technique can reach its maximum efficiency when the updating ensemble size approaches a certain threshold. It was observed that the SVM model performance for the multi-layer soil moisture estimation can be influenced by the rainfall magnitude (e.g., dry and wet spells).
Classification of Sporting Activities Using Smartphone Accelerometers
Mitchell, Edmond; Monaghan, David; O'Connor, Noel E.
2013-01-01
In this paper we present a framework that allows for the automatic identification of sporting activities using commonly available smartphones. We extract discriminative informational features from smartphone accelerometers using the Discrete Wavelet Transform (DWT). Despite the poor quality of their accelerometers, smartphones were used as capture devices due to their prevalence in today's society. Successful classification on this basis potentially makes the technology accessible to both elite and non-elite athletes. Extracted features are used to train different categories of classifiers. No one classifier family has a reportable direct advantage in activity classification problems to date; thus we examine classifiers from each of the most widely used classifier families. We investigate three classification approaches; a commonly used SVM-based approach, an optimized classification model and a fusion of classifiers. We also investigate the effect of changing several of the DWT input parameters, including mother wavelets, window lengths and DWT decomposition levels. During the course of this work we created a challenging sports activity analysis dataset, comprised of soccer and field-hockey activities. The average maximum F-measure accuracy of 87% was achieved using a fusion of classifiers, which was 6% better than a single classifier model and 23% better than a standard SVM approach. PMID:23604031
NASA Astrophysics Data System (ADS)
Li, S. X.; Zhang, Y. J.; Zeng, Q. Y.; Li, L. F.; Guo, Z. Y.; Liu, Z. M.; Xiong, H. L.; Liu, S. H.
2014-06-01
Cancer is the most common disease to threaten human health. The ability to screen individuals with malignant tumours with only a blood sample would be greatly advantageous to early diagnosis and intervention. This study explores the possibility of discriminating between cancer patients and normal subjects with serum surface-enhanced Raman spectroscopy (SERS) and a support vector machine (SVM) through a peripheral blood sample. A total of 130 blood samples were obtained from patients with liver cancer, colonic cancer, esophageal cancer, nasopharyngeal cancer, gastric cancer, as well as 113 blood samples from normal volunteers. Several diagnostic models were built with the serum SERS spectra using SVM and principal component analysis (PCA) techniques. The results show that a diagnostic accuracy of 85.5% is acquired with a PCA algorithm, while a diagnostic accuracy of 95.8% is obtained using radial basis function (RBF), PCA-SVM methods. The results prove that a RBF kernel PCA-SVM technique is superior to PCA and conventional SVM (C-SVM) algorithms in classification serum SERS spectra. The study demonstrates that serum SERS, in combination with SVM techniques, has great potential for screening cancerous patients with any solid malignant tumour through a peripheral blood sample.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens; Comani, Silvia
2018-01-01
EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity ( p ). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens
2018-01-01
Background EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Methods Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity (p). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. Results The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Discussion Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings. PMID:29492336
Applying six classifiers to airborne hyperspectral imagery for detecting giant reed
USDA-ARS?s Scientific Manuscript database
This study evaluated and compared six different image classifiers, including minimum distance (MD), Mahalanobis distance (MAHD), maximum likelihood (ML), spectral angle mapper (SAM), mixture tuned matched filtering (MTMF) and support vector machine (SVM), for detecting and mapping giant reed (Arundo...
Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B
2017-12-01
Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques is applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as it had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.
Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.
Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng
2017-01-01
Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, the process of analyzing EEG to detect neurological diseases is often difficult because the brain electrical signals are random, non-stationary and nonlinear. In order to overcome such difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). New scheme first extracts features from EEG by MF-DFA during the first stage. Then, the scheme applies a genetic algorithm (GA) to calculate parameters used in SVM and classify the training data according to the selected features using SVM. Finally, the trained SVM classifier is exploited to detect neurological diseases. The algorithm utilizes MLlib from library of SPARK and runs on cloud platform. Applying to a public dataset for experiment, the study results show that the new feature extraction method and scheme can detect signals with less features and the accuracy of the classification reached up to 99%. MF-DFA is a promising approach to extract features for analyzing EEG, because of its simple algorithm procedure and less parameters. The features obtained by MF-DFA can represent samples as well as traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM with enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to the face recognition. The proposed TAF-SVM not only solves the overfitting problem resulted from the outlier with the approach of fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane due to the very imbalanced data sets by using different cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. Those three functions are embodied into the traditional SVM so that the TAF-SVM is proposed and reformulated in both linear and nonlinear cases. By using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of the face-recognition accuracy. The results also indicate that the proposed TAF-SVM can achieve smaller error variances than SVM over a number of tests such that better recognition stability can be obtained.
Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen
2017-01-01
The computer mouse is an important human-computer interaction device. But patients with physical finger disability are unable to operate this device. Surface EMG (sEMG) can be monitored by electrodes on the skin surface and is a reflection of the neuromuscular activities. Therefore, we can control limbs auxiliary equipment by utilizing sEMG classification in order to help the physically disabled patients to operate the mouse. To develop a new a method to extract sEMG generated by finger motion and apply novel features to classify sEMG. A window-based data acquisition method was presented to extract signal samples from sEMG electordes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from the classical methods based on time domain or frequency domain, was employed to transform signal samples to feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were separately acquired. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, the machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively. In particular, the accuracy of the SVM classifier reached up to 100%. The signal separation method is a convenient, efficient and quick method, which can effectively extract the sEMG samples produced by fingers. In addition, unlike the classical methods, the new method enables to extract features by enlarging sample signals' energy appropriately. The classical machine learning classifiers all performed well by using these features.
Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.
Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal
2015-01-01
Predicting residues that participate in protein-protein interactions (PPI) helps to identify, which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces Cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment are measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
Bascil, M Serdar; Tesneli, Ahmet Y; Temurtas, Feyzullah
2016-09-01
Brain computer interface (BCI) is a new communication way between man and machine. It identifies mental task patterns stored in electroencephalogram (EEG). So, it extracts brain electrical activities recorded by EEG and transforms them machine control commands. The main goal of BCI is to make available assistive environmental devices for paralyzed people such as computers and makes their life easier. This study deals with feature extraction and mental task pattern recognition on 2-D cursor control from EEG as offline analysis approach. The hemispherical power density changes are computed and compared on alpha-beta frequency bands with only mental imagination of cursor movements. First of all, power spectral density (PSD) features of EEG signals are extracted and high dimensional data reduced by principle component analysis (PCA) and independent component analysis (ICA) which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM) which are linear and least squares (LS-SVM) and three different artificial neural network (ANN) structures which are learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN) and mental task patterns are successfully identified via k-fold cross validation technique.
NASA Astrophysics Data System (ADS)
Kestur, Ramesh; Farooq, Shariq; Abdal, Rameen; Mehraj, Emad; Narasipura, Omkar; Mudigere, Meenavathi
2018-01-01
Road extraction in imagery acquired by low altitude remote sensing (LARS) carried out using an unmanned aerial vehicle (UAV) is presented. LARS is carried out using a fixed wing UAV with a high spatial resolution vision spectrum (RGB) camera as the payload. Deep learning techniques, particularly fully convolutional network (FCN), are adopted to extract roads by dense semantic segmentation. The proposed model, UFCN (U-shaped FCN) is an FCN architecture, which is comprised of a stack of convolutions followed by corresponding stack of mirrored deconvolutions with the usage of skip connections in between for preserving the local information. The limited dataset (76 images and their ground truths) is subjected to real-time data augmentation during training phase to increase the size effectively. Classification performance is evaluated using precision, recall, accuracy, F1 score, and brier score parameters. The performance is compared with support vector machine (SVM) classifier, a one-dimensional convolutional neural network (1D-CNN) model, and a standard two-dimensional CNN (2D-CNN). The UFCN model outperforms the SVM, 1D-CNN, and 2D-CNN models across all the performance parameters. Further, the prediction time of the proposed UFCN model is comparable with SVM, 1D-CNN, and 2D-CNN models.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis
NASA Astrophysics Data System (ADS)
Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.
2010-03-01
We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
Watson, Robert A
2014-08-01
To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
Balanced VS Imbalanced Training Data: Classifying Rapideye Data with Support Vector Machines
NASA Astrophysics Data System (ADS)
Ustuner, M.; Sanli, F. B.; Abdikan, S.
2016-06-01
The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.
Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John
2014-01-01
The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Exploring the CAESAR database using dimensionality reduction techniques
NASA Astrophysics Data System (ADS)
Mendoza-Schrock, Olga; Raymer, Michael L.
2012-06-01
The Civilian American and European Surface Anthropometry Resource (CAESAR) database containing over 40 anthropometric measurements on over 4000 humans has been extensively explored for pattern recognition and classification purposes using the raw, original data [1-4]. However, some of the anthropometric variables would be impossible to collect in an uncontrolled environment. Here, we explore the use of dimensionality reduction methods in concert with a variety of classification algorithms for gender classification using only those variables that are readily observable in an uncontrolled environment. Several dimensionality reduction techniques are employed to learn the underlining structure of the data. These techniques include linear projections such as the classical Principal Components Analysis (PCA) and non-linear (manifold learning) techniques, such as Diffusion Maps and the Isomap technique. This paper briefly describes all three techniques, and compares three different classifiers, Naïve Bayes, Adaboost, and Support Vector Machines (SVM), for gender classification in conjunction with each of these three dimensionality reduction approaches.
Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-01-01
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629
Online least squares one-class support vector machines-based abnormal visual event detection.
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-12-12
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method.
Automated analysis of brain activity for seizure detection in zebrafish models of epilepsy.
Hunyadi, Borbála; Siekierska, Aleksandra; Sourbron, Jo; Copmans, Daniëlle; de Witte, Peter A M
2017-08-01
Epilepsy is a chronic neurological condition, with over 30% of cases unresponsive to treatment. Zebrafish larvae show great potential to serve as an animal model of epilepsy in drug discovery. Thanks to their high fecundity and relatively low cost, they are amenable to high-throughput screening. However, the assessment of seizure occurrences in zebrafish larvae remains a bottleneck, as visual analysis is subjective and time-consuming. For the first time, we present an automated algorithm to detect epileptic discharges in single-channel local field potential (LFP) recordings in zebrafish. First, candidate seizure segments are selected based on their energy and length. Afterwards, discriminative features are extracted from each segment. Using a labeled dataset, a support vector machine (SVM) classifier is trained to learn an optimal feature mapping. Finally, this SVM classifier is used to detect seizure segments in new signals. We tested the proposed algorithm both in a chemically-induced seizure model and a genetic epilepsy model. In both cases, the algorithm delivered similar results to visual analysis and found a significant difference in number of seizures between the epileptic and control group. Direct comparison with multichannel techniques or methods developed for different animal models is not feasible. Nevertheless, a literature review shows that our algorithm outperforms state-of-the-art techniques in terms of accuracy, precision and specificity, while maintaining a reasonable sensitivity. Our seizure detection system is a generic, time-saving and objective method to analyze zebrafish LPF, which can replace visual analysis and facilitate true high-throughput studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Liu, Changhong; Liu, Wei; Lu, Xuzhong; Chen, Wei; Yang, Jianbo; Zheng, Lei
2014-06-15
Crop-to-crop transgene flow may affect the seed purity of non-transgenic rice varieties, resulting in unwanted biosafety consequences. The feasibility of a rapid and nondestructive determination of transgenic rice seeds from its non-transgenic counterparts was examined by using multispectral imaging system combined with chemometric data analysis. Principal component analysis (PCA), partial least squares discriminant analysis (PLSDA), least squares-support vector machines (LS-SVM), and PCA-back propagation neural network (PCA-BPNN) methods were applied to classify rice seeds according to their genetic origins. The results demonstrated that clear differences between non-transgenic and transgenic rice seeds could be easily visualized with the nondestructive determination method developed through this study and an excellent classification (up to 100% with LS-SVM model) can be achieved. It is concluded that multispectral imaging together with chemometric data analysis is a promising technique to identify transgenic rice seeds with high efficiency, providing bright prospects for future applications. Copyright © 2013 Elsevier Ltd. All rights reserved.
Salleh, Sh-Hussain; Hamedi, Mahyar; Zulkifly, Ahmad Hafiz; Lee, Muhammad Hisyam; Mohd Noor, Alias; Harris, Arief Ruhullah A.; Abdul Majid, Norazman
2014-01-01
Stress shielding and micromotion are two major issues which determine the success of newly designed cementless femoral stems. The correlation of experimental validation with finite element analysis (FEA) is commonly used to evaluate the stress distribution and fixation stability of the stem within the femoral canal. This paper focused on the applications of feature extraction and pattern recognition using support vector machine (SVM) to determine the primary stability of the implant. We measured strain with triaxial rosette at the metaphyseal region and micromotion with linear variable direct transducer proximally and distally using composite femora. The root mean squares technique is used to feed the classifier which provides maximum likelihood estimation of amplitude, and radial basis function is used as the kernel parameter which mapped the datasets into separable hyperplanes. The results showed 100% pattern recognition accuracy using SVM for both strain and micromotion. This indicates that DSP could be applied in determining the femoral stem primary stability with high pattern recognition accuracy in biomechanical testing. PMID:24800230
Baharuddin, Mohd Yusof; Salleh, Sh-Hussain; Hamedi, Mahyar; Zulkifly, Ahmad Hafiz; Lee, Muhammad Hisyam; Mohd Noor, Alias; Harris, Arief Ruhullah A; Abdul Majid, Norazman
2014-01-01
Stress shielding and micromotion are two major issues which determine the success of newly designed cementless femoral stems. The correlation of experimental validation with finite element analysis (FEA) is commonly used to evaluate the stress distribution and fixation stability of the stem within the femoral canal. This paper focused on the applications of feature extraction and pattern recognition using support vector machine (SVM) to determine the primary stability of the implant. We measured strain with triaxial rosette at the metaphyseal region and micromotion with linear variable direct transducer proximally and distally using composite femora. The root mean squares technique is used to feed the classifier which provides maximum likelihood estimation of amplitude, and radial basis function is used as the kernel parameter which mapped the datasets into separable hyperplanes. The results showed 100% pattern recognition accuracy using SVM for both strain and micromotion. This indicates that DSP could be applied in determining the femoral stem primary stability with high pattern recognition accuracy in biomechanical testing.
A method of real-time fault diagnosis for power transformers based on vibration analysis
NASA Astrophysics Data System (ADS)
Hong, Kaixing; Huang, Hai; Zhou, Jianping; Shen, Yimin; Li, Yujie
2015-11-01
In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection.
Approach to the problem of the parameters optimization of the shooting system
NASA Astrophysics Data System (ADS)
Demidova, L. A.; Sablina, V. A.; Sokolova, Y. S.
2018-02-01
The problem of the objects identification on the base of their hyperspectral features has been considered. It is offered to use the SVM classifiers’ ensembles, adapted to specifics of the problem of the objects identification on the base of their hyperspectral features. The results of the objects identification on the base of their hyperspectral features with using of the SVM classifiers have been presented.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2014-09-30
floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...APPROACH Odontocete click detection and classification. A multi-class support vector machine (SVM) classifier was previously developed ( Jarvis ...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, particularly S. Jarvis , is improving the SVM
Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom
2016-01-01
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex. PMID:27500640
New support vector machine-based method for microRNA target prediction.
Li, L; Gao, Q; Mao, X; Cao, Y
2014-06-09
MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.
Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom
2016-01-01
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex.
Thomas, Minta; De Brabanter, Kris; De Moor, Bart
2014-05-10
DNA microarrays are potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to this class predictors are high dimensional data with many variables and few observations. Dimensionality reduction of these features set significantly speeds up the prediction task. Feature selection and feature transformation methods are well known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performances (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy with an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider its application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lesser time complexity.
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate and oxygen saturation (S pO2) as input parameters were developed to monitor the condition of patients, with minimum human resource utilization. The Support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression is used for the realization of MPMs. For making MPMs cost effective, we experiment on the hardware implementation of the MPM using support vector machine classifier. The training of the system is done using the matlab environment and the detection of the alarm/noalarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM has outperformed the best known MPM using radial basis function kernel by an absoute improvement of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model was developed based on the improved performance system using Verilog Hardware Description Language and was implemented on Altera cyclone-II development board.
NASA Astrophysics Data System (ADS)
Li, Hai; Kumavor, Patrick D.; Alqasemi, Umar; Zhu, Quing
2014-03-01
Human ovarian tissue features extracted from photoacoustic spectra data, beam envelopes and co-registered ultrasound and photoacoustic images are used to characterize cancerous vs. normal processes using a support vector machine (SVM) classifier. The centers of suspicious tumor areas are estimated from the Gaussian fitting of the mean Radon transforms of the photoacoustic image along 0 and 90 degrees. Normalized power spectra are calculated using the Fourier transform of the photoacoustic beamformed data across these suspicious areas, where the spectral slope and 0-MHz intercepts are extracted. Image statistics, envelope histogram fitting and maximum output of 6 composite filters of cancerous or normal patterns along with other previously used features are calculated to compose a total of 17 features. These features are extracted from 169 datasets of 19 ex vivo ovaries. Half of the cancerous and normal datasets are randomly chosen to train a SVM classifier with polynomial kernel and the remainder is used for testing. With 50 times data resampling, the SVM classifier, for the training group, gives 100% sensitivity and 100% specificity. For the testing group, it gives 89.68+/- 6.37% sensitivity and 93.16+/- 3.70% specificity. These results are superior to those obtained earlier by our group using features extracted from photoacoustic raw data or image statistics only.
NASA Astrophysics Data System (ADS)
Karsi, Redouane; Zaim, Mounia; El Alami, Jamila
2017-07-01
Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.
Yilmaz, E; Kayikcioglu, T; Kayipmaz, S
2017-07-01
In this article, we propose a decision support system for effective classification of dental periapical cyst and keratocystic odontogenic tumor (KCOT) lesions obtained via cone beam computed tomography (CBCT). CBCT has been effectively used in recent years for diagnosing dental pathologies and determining their boundaries and content. Unlike other imaging techniques, CBCT provides detailed and distinctive information about the pathologies by enabling a three-dimensional (3D) image of the region to be displayed. We employed 50 CBCT 3D image dataset files as the full dataset of our study. These datasets were identified by experts as periapical cyst and KCOT lesions according to the clinical, radiographic and histopathologic features. Segmentation operations were performed on the CBCT images using viewer software that we developed. Using the tools of this software, we marked the lesional volume of interest and calculated and applied the order statistics and 3D gray-level co-occurrence matrix for each CBCT dataset. A feature vector of the lesional region, including 636 different feature items, was created from those statistics. Six classifiers were used for the classification experiments. The Support Vector Machine (SVM) classifier achieved the best classification performance with 100% accuracy, and 100% F-score (F1) scores as a result of the experiments in which a ten-fold cross validation method was used with a forward feature selection algorithm. SVM achieved the best classification performance with 96.00% accuracy, and 96.00% F1 scores in the experiments in which a split sample validation method was used with a forward feature selection algorithm. SVM additionally achieved the best performance of 94.00% accuracy, and 93.88% F1 in which a leave-one-out (LOOCV) method was used with a forward feature selection algorithm. Based on the results, we determined that periapical cyst and KCOT lesions can be classified with a high accuracy with the models that we built using the new dataset selected for this study. The studies mentioned in this article, along with the selected 3D dataset, 3D statistics calculated from the dataset, and performance results of the different classifiers, comprise an important contribution to the field of computer-aided diagnosis of dental apical lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
An evaluation of open set recognition for FLIR images
NASA Astrophysics Data System (ADS)
Scherreik, Matthew; Rigling, Brian
2015-05-01
Typical supervised classification algorithms label inputs according to what was learned in a training phase. Thus, test inputs that were not seen in training are always given incorrect labels. Open set recognition algorithms address this issue by accounting for inputs that are not present in training and providing the classifier with an option to reject" unknown samples. A number of such techniques have been developed in the literature, many of which are based on support vector machines (SVMs). One approach, the 1-vs-set machine, constructs a slab" in feature space using the SVM hyperplane. Inputs falling on one side of the slab or within the slab belong to a training class, while inputs falling on the far side of the slab are rejected. We note that rejection of unknown inputs can be achieved by thresholding class posterior probabilities. Another recently developed approach, the Probabilistic Open Set SVM (POS-SVM), empirically determines good probability thresholds. We apply the 1-vs-set machine, POS-SVM, and closed set SVMs to FLIR images taken from the Comanche SIG dataset. Vehicles in the dataset are divided into three general classes: wheeled, armored personnel carrier (APC), and tank. For each class, a coarse pose estimate (front, rear, left, right) is taken. In a closed set sense, we analyze these algorithms for prediction of vehicle class and pose. To test open set performance, one or more vehicle classes are held out from training. By considering closed and open set performance separately, we may closely analyze both inter-class discrimination and threshold effectiveness.
Extraction of prostatic lumina and automated recognition for prostatic calculus image using PCA-SVM.
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computation-assistant diagnosis of prostatic calculi may have promising potential but is currently still less studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu threshold recognition using PCA-SVM and based on the texture features of prostatic calculus. The SVM classifier showed an average time 0.1432 second, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi.
NASA Astrophysics Data System (ADS)
Anees, Asim; Aryal, Jagannath; O'Reilly, Małgorzata M.; Gale, Timothy J.; Wardlaw, Tim
2016-12-01
A robust non-parametric framework, based on multiple Radial Basic Function (RBF) kernels, is proposed in this study, for detecting land/forest cover changes using Landsat 7 ETM+ images. One of the widely used frameworks is to find change vectors (difference image) and use a supervised classifier to differentiate between change and no-change. The Bayesian Classifiers e.g. Maximum Likelihood Classifier (MLC), Naive Bayes (NB), are widely used probabilistic classifiers which assume parametric models, e.g. Gaussian function, for the class conditional distributions. However, their performance can be limited if the data set deviates from the assumed model. The proposed framework exploits the useful properties of Least Squares Probabilistic Classifier (LSPC) formulation i.e. non-parametric and probabilistic nature, to model class posterior probabilities of the difference image using a linear combination of a large number of Gaussian kernels. To this end, a simple technique, based on 10-fold cross-validation is also proposed for tuning model parameters automatically instead of selecting a (possibly) suboptimal combination from pre-specified lists of values. The proposed framework has been tested and compared with Support Vector Machine (SVM) and NB for detection of defoliation, caused by leaf beetles (Paropsisterna spp.) in Eucalyptus nitens and Eucalyptus globulus plantations of two test areas, in Tasmania, Australia, using raw bands and band combination indices of Landsat 7 ETM+. It was observed that due to multi-kernel non-parametric formulation and probabilistic nature, the LSPC outperforms parametric NB with Gaussian assumption in change detection framework, with Overall Accuracy (OA) ranging from 93.6% (κ = 0.87) to 97.4% (κ = 0.94) against 85.3% (κ = 0.69) to 93.4% (κ = 0.85), and is more robust to changing data distributions. Its performance was comparable to SVM, with added advantages of being probabilistic and capable of handling multi-class problems naturally with its original formulation.
NASA Astrophysics Data System (ADS)
Porto, C. D. N.; Costa Filho, C. F. F.; Macedo, M. M. G.; Gutierrez, M. A.; Costa, M. G. F.
2017-03-01
Studies in intravascular optical coherence tomography (IV-OCT) have demonstrated the importance of coronary bifurcation regions in intravascular medical imaging analysis, as plaques are more likely to accumulate in this region leading to coronary disease. A typical IV-OCT pullback acquires hundreds of frames, thus developing an automated tool to classify the OCT frames as bifurcation or non-bifurcation can be an important step to speed up OCT pullbacks analysis and assist automated methods for atherosclerotic plaque quantification. In this work, we evaluate the performance of two state-of-the-art classifiers, SVM and Neural Networks in the bifurcation classification task. The study included IV-OCT frames from 9 patients. In order to improve classification performance, we trained and tested the SVM with different parameters by means of a grid search and different stop criteria were applied to the Neural Network classifier: mean square error, early stop and regularization. Different sets of features were tested, using feature selection techniques: PCA, LDA and scalar feature selection with correlation. Training and test were performed in sets with a maximum of 1460 OCT frames. We quantified our results in terms of false positive rate, true positive rate, accuracy, specificity, precision, false alarm, f-measure and area under ROC curve. Neural networks obtained the best classification accuracy, 98.83%, overcoming the results found in literature. Our methods appear to offer a robust and reliable automated classification of OCT frames that might assist physicians indicating potential frames to analyze. Methods for improving neural networks generalization have increased the classification performance.
A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.
Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun
2017-07-01
Feature extraction of EEG signals plays a significant role in Brain-computer interface (BCI) as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performances and to reduce the time complexity. This study develops a robust feature extraction method combining the principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from the mental states based on EEG signals in BCI applications. We apply the correlation based variable selection method with the best first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques: multilayer perceptron neural networks (MLP), least square support vector machine (LS-SVM), and logistic regression (LR) are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing it with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained feature. The average sensitivity, specificity and classification accuracy for these two classifiers are same, which are 99.32%, 100%, and 99.66%, respectively for the BCI competition dataset IVa and 100%, 100%, and 100%, for the BCI competition dataset IVb. The results also indicate the proposed methods outperform the most recently reported methods by at least 0.25% average accuracy improvement in dataset IVa. The execution time results show that the proposed method has less time complexity after feature selection. The proposed feature extraction method is very effective for getting representatives information from mental states EEG signals in BCI applications and reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
Marafino, Ben J; Davies, Jason M; Bardach, Naomi S; Dean, Mitzi L; Dudley, R Adams
2014-01-01
Existing risk adjustment models for intensive care unit (ICU) outcomes rely on manual abstraction of patient-level predictors from medical charts. Developing an automated method for abstracting these data from free text might reduce cost and data collection times. To develop a support vector machine (SVM) classifier capable of identifying a range of procedures and diagnoses in ICU clinical notes for use in risk adjustment. We selected notes from 2001-2008 for 4191 neonatal ICU (NICU) and 2198 adult ICU patients from the MIMIC-II database from the Beth Israel Deaconess Medical Center. Using these notes, we developed an implementation of the SVM classifier to identify procedures (mechanical ventilation and phototherapy in NICU notes) and diagnoses (jaundice in NICU and intracranial hemorrhage (ICH) in adult ICU). On the jaundice classification task, we also compared classifier performance using n-gram features to unigrams with application of a negation algorithm (NegEx). Our classifier accurately identified mechanical ventilation (accuracy=0.982, F1=0.954) and phototherapy use (accuracy=0.940, F1=0.912), as well as jaundice (accuracy=0.898, F1=0.884) and ICH diagnoses (accuracy=0.938, F1=0.943). Including bigram features improved performance on the jaundice (accuracy=0.898 vs 0.865) and ICH (0.938 vs 0.927) tasks, and outperformed NegEx-derived unigram features (accuracy=0.898 vs 0.863) on the jaundice task. Overall, a classifier using n-gram support vectors displayed excellent performance characteristics. The classifier generalizes to diverse patient populations, diagnoses, and procedures. SVM-based classifiers can accurately identify procedure status and diagnoses among ICU patients, and including n-gram features improves performance, compared to existing methods. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Support vector machines classifiers of physical activities in preschoolers
USDA-ARS?s Scientific Manuscript database
The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...
USDA-ARS?s Scientific Manuscript database
This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...
Machine learning modelling for predicting soil liquefaction susceptibility
NASA Astrophysics Data System (ADS)
Samui, P.; Sitharam, T. G.
2011-01-01
This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN) based on multi-layer perceptions (MLP) that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM) that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only the two parameters [(N1)60 and peck ground acceleration (amax/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
Lidar detection of underwater objects using a neuro-SVM-based architecture.
Mitra, Vikramjit; Wang, Chia-Jiu; Banerjee, Satarupa
2006-05-01
This paper presents a neural network architecture using a support vector machine (SVM) as an inference engine (IE) for classification of light detection and ranging (Lidar) data. Lidar data gives a sequence of laser backscatter intensities obtained from laser shots generated from an airborne object at various altitudes above the earth surface. Lidar data is pre-filtered to remove high frequency noise. As the Lidar shots are taken from above the earth surface, it has some air backscatter information, which is of no importance for detecting underwater objects. Because of these, the air backscatter information is eliminated from the data and a segment of this data is subsequently selected to extract features for classification. This is then encoded using linear predictive coding (LPC) and polynomial approximation. The coefficients thus generated are used as inputs to the two branches of a parallel neural architecture. The decisions obtained from the two branches are vector multiplied and the result is fed to an SVM-based IE that presents the final inference. Two parallel neural architectures using multilayer perception (MLP) and hybrid radial basis function (HRBF) are considered in this paper. The proposed structure fits the Lidar data classification task well due to the inherent classification efficiency of neural networks and accurate decision-making capability of SVM. A Bayesian classifier and a quadratic classifier were considered for the Lidar data classification task but they failed to offer high prediction accuracy. Furthermore, a single-layered artificial neural network (ANN) classifier was also considered and it failed to offer good accuracy. The parallel ANN architecture proposed in this paper offers high prediction accuracy (98.9%) and is found to be the most suitable architecture for the proposed task of Lidar data classification.
Zhang, Jian; Lockhart, Thurmon E.; Soangra, Rahul
2013-01-01
Fatigue in lower extremity musculature is associated with decline in postural stability, motor performance and alters normal walking patterns in human subjects. Automated recognition of lower extremity muscle fatigue condition may be advantageous in early detection of fall and injury risks. Supervised machine learning methods such as Support Vector Machines (SVM) have been previously used for classifying healthy and pathological gait patterns and also for separating old and young gait patterns. In this study we explore the classification potential of SVM in recognition of gait patterns utilizing an inertial measurement unit associated with lower extremity muscular fatigue. Both kinematic and kinetic gait patterns of 17 participants (29±11 years) were recorded and analyzed in normal and fatigued state of walking. Lower extremities were fatigued by performance of a squatting exercise until the participants reached 60% of their baseline maximal voluntary exertion level. Feature selection methods were used to classify fatigue and no-fatigue conditions based on temporal and frequency information of the signals. Additionally, influences of three different kernel schemes (i.e., linear, polynomial, and radial basis function) were investigated for SVM classification. The results indicated that lower extremity muscle fatigue condition influenced gait and loading responses. In terms of the SVM classification results, an accuracy of 96% was reached in distinguishing the two gait patterns (fatigue and no-fatigue) within the same subject using the kinematic, time and frequency domain features. It is also found that linear kernel and RBF kernel were equally good to identify intra-individual fatigue characteristics. These results suggest that intra-subject fatigue classification using gait patterns from an inertial sensor holds considerable potential in identifying “at-risk” gait due to muscle fatigue. PMID:24081829
McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne
2018-04-01
Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.
Detecting falls with wearable sensors using machine learning techniques.
Özdemir, Ahmet Turan; Barshan, Billur
2014-06-18
Falls are a serious public health problem and possibly life threatening for people in fall risk groups. We develop an automated fall detection system with wearable motion sensor units fitted to the subjects' body at six different positions. Each unit comprises three tri-axial devices (accelerometer, gyroscope, and magnetometer/compass). Fourteen volunteers perform a standardized set of movements including 20 voluntary falls and 16 activities of daily living (ADLs), resulting in a large dataset with 2520 trials. To reduce the computational complexity of training and testing the classifiers, we focus on the raw data for each sensor in a 4 s time window around the point of peak total acceleration of the waist sensor, and then perform feature extraction and reduction. Most earlier studies on fall detection employ rule-based approaches that rely on simple thresholding of the sensor outputs. We successfully distinguish falls from ADLs using six machine learning techniques (classifiers): the k-nearest neighbor (k-NN) classifier, least squares method (LSM), support vector machines (SVM), Bayesian decision making (BDM), dynamic time warping (DTW), and artificial neural networks (ANNs). We compare the performance and the computational complexity of the classifiers and achieve the best results with the k-NN classifier and LSM, with sensitivity, specificity, and accuracy all above 99%. These classifiers also have acceptable computational requirements for training and testing. Our approach would be applicable in real-world scenarios where data records of indeterminate length, containing multiple activities in sequence, are recorded.
Classification of EEG Signals Based on Pattern Recognition Approach.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
Classification of EEG Signals Based on Pattern Recognition Approach
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190
Mourao-Miranda, J; Reinders, A A T S; Rocha-Rego, V; Lappin, J; Rondina, J; Morgan, C; Morgan, K D; Fearon, P; Jones, P B; Doody, G A; Murray, R M; Kapur, S; Dazzan, P
2012-05-01
To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data.
Mourao-Miranda, J.; Reinders, A. A. T. S.; Rocha-Rego, V.; Lappin, J.; Rondina, J.; Morgan, C.; Morgan, K. D.; Fearon, P.; Jones, P. B.; Doody, G. A.; Murray, R. M.; Kapur, S.; Dazzan, P.
2012-01-01
Background To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. Method One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. Results At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). Conclusions We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data. PMID:22059690
Ant-cuckoo colony optimization for feature selection in digital mammogram.
Jona, J B; Nagaveni, N
2014-01-15
Digital mammogram is the only effective screening method to detect the breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. All the features are not essential to detect the mammogram. Therefore identifying the relevant feature is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS) is proposed for feature selection in Digital Mammogram. ACO is a good metaheuristic optimization technique but the drawback of this algorithm is that the ant will walk through the path where the pheromone density is high which makes the whole process slow hence CS is employed to carry out the local search of ACO. Support Vector Machine (SVM) classifier with Radial Basis Kernal Function (RBF) is done along with the ACO to classify the normal mammogram from the abnormal mammogram. Experiments are conducted in miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithm. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
Support vector machine for automatic pain recognition
NASA Astrophysics Data System (ADS)
Monwar, Md Maruf; Rezaei, Siamak
2009-02-01
Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A multi-class classification SVM(Support Vector Machine) fast learning method based on binary tree is presented to solve its low learning efficiency when SVM processing large scale multi-class samples. This paper adopts bottom-up method to set up binary tree hierarchy structure, according to achieved hierarchy structure, sub-classifier learns from corresponding samples of each node. During the learning, several class clusters are generated after the first clustering of the training samples. Firstly, central points are extracted from those class clusters which just have one type of samples. For those which have two types of samples, cluster numbers of their positive and negative samples are set respectively according to their mixture degree, secondary clustering undertaken afterwards, after which, central points are extracted from achieved sub-class clusters. By learning from the reduced samples formed by the integration of extracted central points above, sub-classifiers are obtained. Simulation experiment shows that, this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.
Terminator Detection by Support Vector Machine Utilizing aStochastic Context-Free Grammar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Francis-Lyon, Patricia; Cristianini, Nello; Holbrook, Stephen
2006-12-30
A 2-stage detector was designed to find rho-independent transcription terminators in the Escherichia coli genome. The detector includes a Stochastic Context Free Grammar (SCFG) component and a Support Vector Machine (SVM) component. To find terminators, the SCFG searches the intergenic regions of nucleotide sequence for local matches to a terminator grammar that was designed and trained utilizing examples of known terminators. The grammar selects sequences that are the best candidates for terminators and assigns them a prefix, stem-loop, suffix structure using the Cocke-Younger-Kasaami (CYK) algorithm, modified to incorporate energy affects of base pairing. The parameters from this inferred structure aremore » passed to the SVM classifier, which distinguishes terminators from non-terminators that score high according to the terminator grammar. The SVM was trained with negative examples drawn from intergenic sequences that include both featureless and RNA gene regions (which were assigned prefix, stem-loop, suffix structure by the SCFG), so that it successfully distinguishes terminators from either of these. The classifier was found to be 96.4% successful during testing.« less
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Seo, Joon Beom; Kang, Bokyoung; Kim, Dongil; Lee, June Goo; Kim, Song Soo; Kim, Namkug; Kang, Suk Ho
2007-03-01
The performance of classification algorithms for differentiating among obstructive lung diseases based on features from texture analysis using HRCT (High Resolution Computerized Tomography) images was compared. HRCT can provide accurate information for the detection of various obstructive lung diseases, including centrilobular emphysema, panlobular emphysema and bronchiolitis obliterans. Features on HRCT images can be subtle, however, particularly in the early stages of disease, and image-based diagnosis is subject to inter-observer variation. To automate the diagnosis and improve the accuracy, we compared three types of automated classification systems, naÃve Bayesian classifier, ANN (Artificial Neural Net) and SVM (Support Vector Machine), based on their ability to differentiate among normal lung and three types of obstructive lung diseases. To assess the performance and cross-validation of these three classifiers, 5 folding methods with 5 randomly chosen groups were used. For a more robust result, each validation was repeated 100 times. SVM showed the best performance, with 86.5% overall sensitivity, significantly different from the other classifiers (one way ANOVA, p<0.01). We address the characteristics of each classifier affecting performance and the issue of which classifier is the most suitable for clinical applications, and propose an appropriate method to choose the best classifier and determine its optimal parameters for optimal disease discrimination. These results can be applied to classifiers for differentiation of other diseases.
Heidelberg Retina Tomograph 3 machine learning classifiers for glaucoma detection
Townsend, K A; Wollstein, G; Danks, D; Sung, K R; Ishikawa, H; Kagemann, L; Gabriele, M L; Schuman, J S
2010-01-01
Aims To assess performance of classifiers trained on Heidelberg Retina Tomograph 3 (HRT3) parameters for discriminating between healthy and glaucomatous eyes. Methods Classifiers were trained using HRT3 parameters from 60 healthy subjects and 140 glaucomatous subjects. The classifiers were trained on all 95 variables and smaller sets created with backward elimination. Seven types of classifiers, including Support Vector Machines with radial basis (SVM-radial), and Recursive Partitioning and Regression Trees (RPART), were trained on the parameters. The area under the ROC curve (AUC) was calculated for classifiers, individual parameters and HRT3 glaucoma probability scores (GPS). Classifier AUCs and leave-one-out accuracy were compared with the highest individual parameter and GPS AUCs and accuracies. Results The highest AUC and accuracy for an individual parameter were 0.848 and 0.79, for vertical cup/disc ratio (vC/D). For GPS, global GPS performed best with AUC 0.829 and accuracy 0.78. SVM-radial with all parameters showed significant improvement over global GPS and vC/ D with AUC 0.916 and accuracy 0.85. RPART with all parameters provided significant improvement over global GPS with AUC 0.899 and significant improvement over global GPS and vC/D with accuracy 0.875. Conclusions Machine learning classifiers of HRT3 data provide significant enhancement over current methods for detection of glaucoma. PMID:18523087
Extraction of Prostatic Lumina and Automated Recognition for Prostatic Calculus Image Using PCA-SVM
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D. Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computation-assistant diagnosis of prostatic calculi may have promising potential but is currently still less studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu threshold recognition using PCA-SVM and based on the texture features of prostatic calculus. The SVM classifier showed an average time 0.1432 second, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi. PMID:21461364
Wire connector classification with machine vision and a novel hybrid SVM
NASA Astrophysics Data System (ADS)
Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.
2018-04-01
A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.
2018-01-01
Early detection of power transformer fault is important because it can reduce the maintenance cost of the transformer and it can ensure continuous electricity supply in power systems. Dissolved Gas Analysis (DGA) technique is commonly used to identify oil-filled power transformer fault type but utilisation of artificial intelligence method with optimisation methods has shown convincing results. In this work, a hybrid support vector machine (SVM) with modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, unoptimised SVM and previous reported works. Data reduction was also applied using stepwise regression prior to the training process of SVM to reduce the training time. It was found that the proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique results in the highest correct identification percentage of faults in a power transformer compared to other PSO algorithms. Thus, the proposed technique can be one of the potential solutions to identify the transformer fault type based on DGA data on site. PMID:29370230
Illias, Hazlee Azil; Zhao Liang, Wee
2018-01-01
Early detection of power transformer fault is important because it can reduce the maintenance cost of the transformer and it can ensure continuous electricity supply in power systems. Dissolved Gas Analysis (DGA) technique is commonly used to identify oil-filled power transformer fault type but utilisation of artificial intelligence method with optimisation methods has shown convincing results. In this work, a hybrid support vector machine (SVM) with modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, unoptimised SVM and previous reported works. Data reduction was also applied using stepwise regression prior to the training process of SVM to reduce the training time. It was found that the proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique results in the highest correct identification percentage of faults in a power transformer compared to other PSO algorithms. Thus, the proposed technique can be one of the potential solutions to identify the transformer fault type based on DGA data on site.
Component Pin Recognition Using Algorithms Based on Machine Learning
NASA Astrophysics Data System (ADS)
Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang
2018-04-01
The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.
Automatic threshold selection for multi-class open set recognition
NASA Astrophysics Data System (ADS)
Scherreik, Matthew; Rigling, Brian
2017-05-01
Multi-class open set recognition is the problem of supervised classification with additional unknown classes encountered after a model has been trained. An open set classifer often has two core components. The first component is a base classifier which estimates the most likely class of a given example. The second component consists of open set logic which estimates if the example is truly a member of the candidate class. Such a system is operated in a feed-forward fashion. That is, a candidate label is first estimated by the base classifier, and the true membership of the example to the candidate class is estimated afterward. Previous works have developed an iterative threshold selection algorithm for rejecting examples from classes which were not present at training time. In those studies, a Platt-calibrated SVM was used as the base classifier, and the thresholds were applied to class posterior probabilities for rejection. In this work, we investigate the effectiveness of other base classifiers when paired with the threshold selection algorithm and compare their performance with the original SVM solution.
NASA Astrophysics Data System (ADS)
Barman, Ranjan Kumar; Mukhopadhyay, Anirban; Das, Santasabuj
2017-04-01
Bacterial small non-coding RNAs (sRNAs) are not translated into proteins, but act as functional RNAs. They are involved in diverse biological processes like virulence, stress response and quorum sensing. Several high-throughput techniques have enabled identification of sRNAs in bacteria, but experimental detection remains a challenge and grossly incomplete for most species. Thus, there is a need to develop computational tools to predict bacterial sRNAs. Here, we propose a computational method to identify sRNAs in bacteria using support vector machine (SVM) classifier. The primary sequence and secondary structure features of experimentally-validated sRNAs of Salmonella Typhimurium LT2 (SLT2) was used to build the optimal SVM model. We found that a tri-nucleotide composition feature of sRNAs achieved an accuracy of 88.35% for SLT2. We validated the SVM model also on the experimentally-detected sRNAs of E. coli and Salmonella Typhi. The proposed model had robustly attained an accuracy of 81.25% and 88.82% for E. coli K-12 and S. Typhi Ty2, respectively. We confirmed that this method significantly improved the identification of sRNAs in bacteria. Furthermore, we used a sliding window-based method and identified sRNAs from complete genomes of SLT2, S. Typhi Ty2 and E. coli K-12 with sensitivities of 89.09%, 83.33% and 67.39%, respectively.
NASA Astrophysics Data System (ADS)
Marwaha, Richa; Kumar, Anil; Kumar, Arumugam Senthil
2015-01-01
Our primary objective was to explore a classification algorithm for thermal hyperspectral data. Minimum noise fraction is applied to thermal hyperspectral data and eight pixel-based classifiers, i.e., constrained energy minimization, matched filter, spectral angle mapper (SAM), adaptive coherence estimator, orthogonal subspace projection, mixture-tuned matched filter, target-constrained interference-minimized filter, and mixture-tuned target-constrained interference minimized filter are tested. The long-wave infrared (LWIR) has not yet been exploited for classification purposes. The LWIR data contain emissivity and temperature information about an object. A highest overall accuracy of 90.99% was obtained using the SAM algorithm for the combination of thermal data with a colored digital photograph. Similarly, an object-oriented approach is applied to thermal data. The image is segmented into meaningful objects based on properties such as geometry, length, etc., which are grouped into pixels using a watershed algorithm and an applied supervised classification algorithm, i.e., support vector machine (SVM). The best algorithm in the pixel-based category is the SAM technique. SVM is useful for thermal data, providing a high accuracy of 80.00% at a scale value of 83 and a merge value of 90, whereas for the combination of thermal data with a colored digital photograph, SVM gives the highest accuracy of 85.71% at a scale value of 82 and a merge value of 90.
NASA Astrophysics Data System (ADS)
Bramhe, V. S.; Ghosh, S. K.; Garg, P. K.
2018-04-01
With rapid globalization, the extent of built-up areas is continuously increasing. Extraction of features for classifying built-up areas that are more robust and abstract is a leading research topic from past many years. Although, various studies have been carried out where spatial information along with spectral features has been utilized to enhance the accuracy of classification. Still, these feature extraction techniques require a large number of user-specific parameters and generally application specific. On the other hand, recently introduced Deep Learning (DL) techniques requires less number of parameters to represent more abstract aspects of the data without any manual effort. Since, it is difficult to acquire high-resolution datasets for applications that require large scale monitoring of areas. Therefore, in this study Sentinel-2 image has been used for built-up areas extraction. In this work, pre-trained Convolutional Neural Networks (ConvNets) i.e. Inception v3 and VGGNet are employed for transfer learning. Since these networks are trained on generic images of ImageNet dataset which are having very different characteristics from satellite images. Therefore, weights of networks are fine-tuned using data derived from Sentinel-2 images. To compare the accuracies with existing shallow networks, two state of art classifiers i.e. Gaussian Support Vector Machine (SVM) and Back-Propagation Neural Network (BP-NN) are also implemented. Both SVM and BP-NN gives 84.31 % and 82.86 % overall accuracies respectively. Inception-v3 and VGGNet gives 89.43 % of overall accuracy using fine-tuned VGGNet and 92.10 % when using Inception-v3. The results indicate high accuracy of proposed fine-tuned ConvNets on a 4-channel Sentinel-2 dataset for built-up area extraction.
Havla, Lukas; Schneider, Moritz J; Thierfelder, Kolja M; Beyer, Sebastian E; Ertl-Wagner, Birgit; Reiser, Maximilian F; Sommer, Wieland H; Dietrich, Olaf
2016-02-01
The purpose of this study was to propose and evaluate a new wavelet-based technique for classification of arterial and venous vessels using time-resolved cerebral CT perfusion data sets. Fourteen consecutive patients (mean age 73 yr, range 17-97) with suspected stroke but no pathology in follow-up MRI were included. A CT perfusion scan with 32 dynamic phases was performed during intravenous bolus contrast-agent application. After rigid-body motion correction, a Paul wavelet (order 1) was used to calculate voxelwise the wavelet power spectrum (WPS) of each attenuation-time course. The angiographic intensity A was defined as the maximum of the WPS, located at the coordinates T (time axis) and W (scale/width axis) within the WPS. Using these three parameters (A, T, W) separately as well as combined by (1) Fisher's linear discriminant analysis (FLDA), (2) logistic regression (LogR) analysis, or (3) support vector machine (SVM) analysis, their potential to classify 18 different arterial and venous vessel segments per subject was evaluated. The best vessel classification was obtained using all three parameters A and T and W [area under the curve (AUC): 0.953 with FLDA and 0.957 with LogR or SVM]. In direct comparison, the wavelet-derived parameters provided performance at least equal to conventional attenuation-time-course parameters. The maximum AUC obtained from the proposed wavelet parameters was slightly (although not statistically significantly) higher than the maximum AUC (0.945) obtained from the conventional parameters. A new method to classify arterial and venous cerebral vessels with high statistical accuracy was introduced based on the time-domain wavelet transform of dynamic CT perfusion data in combination with linear or nonlinear multidimensional classification techniques.
Classifying High-noise EEG in Complex Environments for Brain-computer Interaction Technologies
2012-02-01
differentiation in the brain signal that our classification approach seeks to identify despite the noise in the recorded EEG signal and the complexity of...performed two offline classifications , one using BCILab (1), the other using LibSVM (2). Distinct classifiers were trained for each individual in...order to improve individual classifier performance (3). The highest classification performance results were obtained using individual frequency bands
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Villa-Manríquez, J F; Castro-Ramos, J; Gutiérrez-Delgado, F; Lopéz-Pacheco, M A; Villanueva-Luna, A E
2017-08-01
In this study we identify and classify high and low levels of glycated hemoglobin (HbA1c) in healthy volunteers (HV) and diabetic patients (DP). Overall, 86 subjects were evaluated. The Raman spectrum was measured in three anatomical regions of the body: index fingertip, right ear lobe, and forehead. The measurements were performed to compare the difference between the HV and DP (22 well controlled diabetic patients (WCDP) (HbA1c <6.5%), and 49 not controlled diabetic patients (NCDP) (HbA1c ≥6.5%)). Multivariable methods such as principal components analysis (PCA) combined with support vector machine (SVM) were used to develop effective diagnostic algorithms for classification among these groups. The forehead of HV versus WCDP showed the highest sensitivity (100%) and specificity (100%). Sensitivity (100%) and specificity (60%), were highest in the forehead of WCDP, versus NCDP. In HV versus NCDP, the fingertip had the highest sensitivity (100%) and specificity (80%). The efficacy of the diagnostic algorithm by receiver operating characteristic (ROC) curve was confirmed. Overall, our study demonstrated that the combination of Raman spectroscopy and PCA-SVM are feasible non-invasive diagnostic tool in diabetes to classify qualitatively high and low levels of HbA1c in vivo. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.
2012-08-01
False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the usage of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate for their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity in 0.05-1 FPs on normals on the free-response receiver operating characteristic curve (FROC), incorporated into a tenfold cross validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible outperforming other state-of-the-art approaches for breast mass CADe.
Maximum margin semi-supervised learning with irrelevant data.
Yang, Haiqin; Huang, Kaizhu; King, Irwin; Lyu, Michael R
2015-10-01
Semi-supervised learning (SSL) is a typical learning paradigms training a model from both labeled and unlabeled data. The traditional SSL models usually assume unlabeled data are relevant to the labeled data, i.e., following the same distributions of the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain irrelevant data to the labeled data. To tackle this problem, we develop a maximum margin model, named tri-class support vector machine (3C-SVM), to utilize the available training data, while seeking a hyperplane for separating the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge and explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and maximum entropy principle. That is, 3C-SVM approaches an ideal classifier. This classifier relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hardly distinguished. Second, theoretical analysis is provided to prove that in what condition, the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, Semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs) as its special cases. More importantly, we deploy a concave-convex produce to solve the proposed 3C-SVM, transforming the original mixed integer programming, to a semi-definite programming relaxation, and finally to a sequence of quadratic programming subproblems, which yields the same worst case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematical experimental comparisons. Copyright © 2015 Elsevier Ltd. All rights reserved.
Deep neural mapping support vector machines.
Li, Yujian; Zhang, Ting
2017-09-01
The choice of kernel has an important effect on the performance of a support vector machine (SVM). The effect could be reduced by NEUROSVM, an architecture using multilayer perceptron for feature extraction and SVM for classification. In binary classification, a general linear kernel NEUROSVM can be theoretically simplified as an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained together with a virtual ordinary output layer by backpropagation, then with the output of its last hidden layer taken as input of the SVM classifier for further training separately. By taking the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, where the kernel mapping is indeed an explicit function expressed as a sub-network, different from an implicit function induced by a kernel function traditionally. Moreover, we exploit a two-stage procedure of contrastive divergence learning and gradient descent for DNMSVM to jointly training an adaptive kernel mapping instead of a kernel function, without requirement of kernel tricks. As a whole of the sub-network and the SVM classifier, the joint training of DNMSVM is done by using gradient descent to optimize the objective function with the sub-network layer-wise pre-trained via contrastive divergence learning of restricted Boltzmann machines. Compared to the separate training of NEUROSVM, the joint training is a new algorithm for DNMSVM to have advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the kernel of radial basis function), demonstrating its effectiveness. Copyright © 2017 Elsevier Ltd. All rights reserved.
You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing
2013-01-01
Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time.
Fiot, Jean-Baptiste; Cohen, Laurent D; Raniga, Parnesh; Fripp, Jurgen
2013-09-01
Support vector machines (SVM) are machine learning techniques that have been used for segmentation and classification of medical images, including segmentation of white matter hyper-intensities (WMH). Current approaches using SVM for WMH segmentation extract features from the brain and classify these followed by complex post-processing steps to remove false positives. The method presented in this paper combines advanced pre-processing, tissue-based feature selection and SVM classification to obtain efficient and accurate WMH segmentation. Features from 125 patients, generated from up to four MR modalities [T1-w, T2-w, proton-density and fluid attenuated inversion recovery(FLAIR)], differing neighbourhood sizes and the use of multi-scale features were compared. We found that although using all four modalities gave the best overall classification (average Dice scores of 0.54 ± 0.12, 0.72 ± 0.06 and 0.82 ± 0.06 respectively for small, moderate and severe lesion loads); this was not significantly different (p = 0.50) from using just T1-w and FLAIR sequences (Dice scores of 0.52 ± 0.13, 0.71 ± 0.08 and 0.81 ± 0.07). Furthermore, there was a negligible difference between using 5 × 5 × 5 and 3 × 3 × 3 features (p = 0.93). Finally, we show that careful consideration of features and pre-processing techniques not only saves storage space and computation time but also leads to more efficient classification, which outperforms the one based on all features with post-processing. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Wu, Di; He, Yong
2007-11-01
The aim of this study is to investigate the potential of the visible and near infrared spectroscopy (Vis/NIRS) technique for non-destructive measurement of soluble solids contents (SSC) in grape juice beverage. 380 samples were studied in this paper. Smoothing way of Savitzky-Golay and standard normal variate were applied for the pre-processing of spectral data. Least-squares support vector machines (LS-SVM) with RBF kernel function was applied to developing the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the results predicted by LS-SVM model was 0. 962 and root mean square error (RMSEP) was 0. 434137. It is concluded that Vis/NIRS technique can quantify the SSC of grape juice beverage fast and non-destructively.. At the same time, LS-SVM model was compared with PLS and back propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to the conventional linear and non-linear methods in predicting SSC of grape juice beverage. In this study, the generation ability of LS-SVM, PLS and BP-NN models were also investigated. It is concluded that LS-SVM regression method is a promising technique for chemometrics in quantitative prediction.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of GABA (A) receptor complex were calculated using the machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of ANN models. Though the SVM model shows improvement of training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. Randomization test is employed to check the suitability of the models.
Gromski, Piotr S; Correa, Elon; Vaughan, Andrew A; Wedge, David C; Turner, Michael L; Goodacre, Royston
2014-11-01
Accurate detection of certain chemical vapours is important, as these may be diagnostic for the presence of weapons, drugs of misuse or disease. In order to achieve this, chemical sensors could be deployed remotely. However, the readout from such sensors is a multivariate pattern, and this needs to be interpreted robustly using powerful supervised learning methods. Therefore, in this study, we compared the classification accuracy of four pattern recognition algorithms which include linear discriminant analysis (LDA), partial least squares-discriminant analysis (PLS-DA), random forests (RF) and support vector machines (SVM) which employed four different kernels. For this purpose, we have used electronic nose (e-nose) sensor data (Wedge et al., Sensors Actuators B Chem 143:365-372, 2009). In order to allow direct comparison between our four different algorithms, we employed two model validation procedures based on either 10-fold cross-validation or bootstrapping. The results show that LDA (91.56% accuracy) and SVM with a polynomial kernel (91.66% accuracy) were very effective at analysing these e-nose data. These two models gave superior prediction accuracy, sensitivity and specificity in comparison to the other techniques employed. With respect to the e-nose sensor data studied here, our findings recommend that SVM with a polynomial kernel should be favoured as a classification method over the other statistical models that we assessed. SVM with non-linear kernels have the advantage that they can be used for classifying non-linear as well as linear mapping from analytical data space to multi-group classifications and would thus be a suitable algorithm for the analysis of most e-nose sensor data.
NASA Astrophysics Data System (ADS)
Kumar, Deepak; Thakur, Manoj; Dubey, Chandra S.; Shukla, Dericks P.
2017-10-01
In recent years, various machine learning techniques have been applied for landslide susceptibility mapping. In this study, three different variants of support vector machine viz., SVM, Proximal Support Vector Machine (PSVM) and L2-Support Vector Machine - Modified Finite Newton (L2-SVM-MFN) have been applied on the Mandakini River Basin in Uttarakhand, India to carry out the landslide susceptibility mapping. Eight thematic layers such as elevation, slope, aspect, drainages, geology/lithology, buffer of thrusts/faults, buffer of streams and soil along with the past landslide data were mapped in GIS environment and used for landslide susceptibility mapping in MATLAB. The study area covering 1625 km2 has merely 0.11% of area under landslides. There are 2009 pixels for past landslides out of which 50% (1000) landslides were considered as training set while remaining 50% as testing set. The performance of these techniques has been evaluated and the computational results show that L2-SVM-MFN obtains higher prediction values (0.829) of receiver operating characteristic curve (AUC-area under the curve) as compared to 0.807 for PSVM model and 0.79 for SVM. The results obtained from L2-SVM-MFN model are found to be superior than other SVM prediction models and suggest the usefulness of this technique to problem of landslide susceptibility mapping where training data is very less. However, these techniques can be used for satisfactory determination of susceptible zones with these inputs.
Detection of pesticide (Cyantraniliprole) residue on grapes using hyperspectral sensing
NASA Astrophysics Data System (ADS)
Mohite, Jayantrao; Karale, Yogita; Pappula, Srinivasu; Shabeer, Ahammed T. P.; Sawant, S. D.; Hingmire, Sandip
2017-05-01
Pesticide residues in the fruits, vegetables and agricultural commodities are harmful to humans and are becoming a health concern nowadays. Detection of pesticide residues on various commodities in an open environment is a challenging task. Hyperspectral sensing is one of the recent technologies used to detect the pesticide residues. This paper addresses the problem of detection of pesticide residues of Cyantraniliprole on grapes in open fields using multi temporal hyperspectral remote sensing data. The re ectance data of 686 samples of grapes with no, single and double dose application of Cyantraniliprole has been collected by handheld spectroradiometer (MS- 720) with a wavelength ranging from 350 nm to 1052 nm. The data collection was carried out over a large feature set of 213 spectral bands during the period of March to May 2015. This large feature set may cause model over-fitting problem as well as increase the computational time, so in order to get the most relevant features, various feature selection techniques viz Principle Component Analysis (PCA), LASSO and Elastic Net regularization have been used. Using this selected features, we evaluate the performance of various classifiers such as Artificial Neural Networks (ANN), Support Vector Machine (SVM), Random Forest (RF) and Extreme Gradient Boosting (XGBoost) to classify the grape sample with no, single or double application of Cyantraniliprole. The key finding of this paper is; most of the features selected by the LASSO varies between 350-373nm and 940-990nm consistently for all days. Experimental results also shows that, by using the relevant features selected by LASSO, SVM performs better with average prediction accuracy of 91.98 % among all classifiers, for all days.
Semi-supervised vibration-based classification and condition monitoring of compressors
NASA Astrophysics Data System (ADS)
Potočnik, Primož; Govekar, Edvard
2017-09-01
Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
NASA Astrophysics Data System (ADS)
Pinar, Anthony; Masarik, Matthew; Havens, Timothy C.; Burns, Joseph; Thelen, Brian; Becker, John
2015-05-01
This paper explores the effectiveness of an anomaly detection algorithm for downward-looking ground penetrating radar (GPR) and electromagnetic inductance (EMI) data. Threat detection with GPR is challenged by high responses to non-target/clutter objects, leading to a large number of false alarms (FAs), and since the responses of target and clutter signatures are so similar, classifier design is not trivial. We suggest a method based on a Run Packing (RP) algorithm to fuse GPR and EMI data into a composite confidence map to improve detection as measured by the area-under-ROC (NAUC) metric. We examine the value of a multiple kernel learning (MKL) support vector machine (SVM) classifier using image features such as histogram of oriented gradients (HOG), local binary patterns (LBP), and local statistics. Experimental results on government furnished data show that use of our proposed fusion and classification methods improves the NAUC when compared with the results from individual sensors and a single kernel SVM classifier.
Drug-related webpages classification based on multi-modal local decision fusion
NASA Astrophysics Data System (ADS)
Hu, Ruiguang; Su, Xiaojing; Liu, Yanxin
2018-03-01
In this paper, multi-modal local decision fusion is used for drug-related webpages classification. First, meaningful text are extracted through HTML parsing, and effective images are chosen by the FOCARSS algorithm. Second, six SVM classifiers are trained for six kinds of drug-taking instruments, which are represented by PHOG. One SVM classifier is trained for the cannabis, which is represented by the mid-feature of BOW model. For each instance in a webpage, seven SVMs give seven labels for its image, and other seven labels are given by searching the names of drug-taking instruments and cannabis in its related text. Concatenating seven labels of image and seven labels of text, the representation of those instances in webpages are generated. Last, Multi-Instance Learning is used to classify those drugrelated webpages. Experimental results demonstrate that the classification accuracy of multi-instance learning with multi-modal local decision fusion is much higher than those of single-modal classification.
NASA Astrophysics Data System (ADS)
Geelen, Christopher D.; Wijnhoven, Rob G. J.; Dubbelman, Gijs; de With, Peter H. N.
2015-03-01
This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intra-class variation and interclass correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets, which realistically consider the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, proving a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 0.2%, while classification computation time is negligible compared to the detection time of pedestrians.
Daily River Flow Forecasting with Hybrid Support Vector Machine – Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Zaini, N.; Malek, M. A.; Yusoff, M.; Mardi, N. H.; Norhisham, S.
2018-04-01
The application of artificial intelligence techniques for river flow forecasting can further improve the management of water resources and flood prevention. This study concerns the development of support vector machine (SVM) based model and its hybridization with particle swarm optimization (PSO) to forecast short term daily river flow at Upper Bertam Catchment located in Cameron Highland, Malaysia. Ten years duration of historical rainfall, antecedent river flow data and various meteorology parameters data from 2003 to 2012 are used in this study. Four SVM based models are proposed which are SVM1, SVM2, SVM-PSO1 and SVM-PSO2 to forecast 1 to 7 day ahead of river flow. SVM1 and SVM-PSO1 are the models with historical rainfall and antecedent river flow as its input, while SVM2 and SVM-PSO2 are the models with historical rainfall, antecedent river flow data and additional meteorological parameters as input. The performances of the proposed model are measured in term of RMSE and R2 . It is found that, SVM2 outperformed SVM1 and SVM-PSO2 outperformed SVM-PSO1 which meant the additional meteorology parameters used as input to the proposed models significantly affect the model performances. Hybrid models SVM-PSO1 and SVM-PSO2 yield higher performances as compared to SVM1 and SVM2. It is found that hybrid models are more effective in forecasting river flow at 1 to 7 day ahead at the study area.
Xu, Zhanfeng; Bunker, Christopher E; Harrington, Peter de B
2010-11-01
Monitoring the changes of jet fuel physical properties is important because fuel used in high-performance aircraft must meet rigorous specifications. Near-infrared (NIR) spectroscopy is a fast method to characterize fuels. Because of the complexity of NIR spectral data, chemometric techniques are used to extract relevant information from spectral data to accurately classify physical properties of complex fuel samples. In this work, discrimination of fuel types and classification of flash point, freezing point, boiling point (10%, v/v), boiling point (50%, v/v), and boiling point (90%, v/v) of jet fuels (JP-5, JP-8, Jet A, and Jet A1) were investigated. Each physical property was divided into three classes, low, medium, and high ranges, using two evaluations with different class boundary definitions. The class boundaries function as the threshold to alarm when the fuel properties change. Optimal partial least squares discriminant analysis (oPLS-DA), fuzzy rule-building expert system (FuRES), and support vector machines (SVM) were used to build the calibration models between the NIR spectra and classes of physical property of jet fuels. OPLS-DA, FuRES, and SVM were compared with respect to prediction accuracy. The validation of the calibration model was conducted by applying bootstrap Latin partition (BLP), which gives a measure of precision. Prediction accuracy of 97 ± 2% of the flash point, 94 ± 2% of freezing point, 99 ± 1% of the boiling point (10%, v/v), 98 ± 2% of the boiling point (50%, v/v), and 96 ± 1% of the boiling point (90%, v/v) were obtained by FuRES in one boundaries definition. Both FuRES and SVM obtained statistically better prediction accuracy over those obtained by oPLS-DA. The results indicate that combined with chemometric classifiers NIR spectroscopy could be a fast method to monitor the changes of jet fuel physical properties.
Detection of Splice Sites Using Support Vector Machine
NASA Astrophysics Data System (ADS)
Varadwaj, Pritish; Purohit, Neetesh; Arora, Bhumika
Automatic identification and annotation of exon and intron region of gene, from DNA sequences has been an important research area in field of computational biology. Several approaches viz. Hidden Markov Model (HMM), Artificial Intelligence (AI) based machine learning and Digital Signal Processing (DSP) techniques have extensively and independently been used by various researchers to cater this challenging task. In this work, we propose a Support Vector Machine based kernel learning approach for detection of splice sites (the exon-intron boundary) in a gene. Electron-Ion Interaction Potential (EIIP) values of nucleotides have been used for mapping character sequences to corresponding numeric sequences. Radial Basis Function (RBF) SVM kernel is trained using EIIP numeric sequences. Furthermore this was tested on test gene dataset for detection of splice site by window (of 12 residues) shifting. Optimum values of window size, various important parameters of SVM kernel have been optimized for a better accuracy. Receiver Operating Characteristic (ROC) curves have been utilized for displaying the sensitivity rate of the classifier and results showed 94.82% accuracy for splice site detection on test dataset.
Machine learning-based methods for prediction of linear B-cell epitopes.
Wang, Hsin-Wei; Pai, Tun-Wen
2014-01-01
B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.
Detection of Anomalies in Citrus Leaves Using Laser-Induced Breakdown Spectroscopy (LIBS).
Sankaran, Sindhuja; Ehsani, Reza; Morgan, Kelly T
2015-08-01
Nutrient assessment and management are important to maintain productivity in citrus orchards. In this study, laser-induced breakdown spectroscopy (LIBS) was applied for rapid and real-time detection of citrus anomalies. Laser-induced breakdown spectroscopy spectra were collected from citrus leaves with anomalies such as diseases (Huanglongbing, citrus canker) and nutrient deficiencies (iron, manganese, magnesium, zinc), and compared with those of healthy leaves. Baseline correction, wavelet multivariate denoising, and normalization techniques were applied to the LIBS spectra before analysis. After spectral pre-processing, features were extracted using principal component analysis and classified using two models, quadratic discriminant analysis and support vector machine (SVM). The SVM resulted in a high average classification accuracy of 97.5%, with high average canker classification accuracy (96.5%). LIBS peak analysis indicated that high intensities at 229.7, 247.9, 280.3, 393.5, 397.0, and 769.8 nm were observed of 11 peaks found in all the samples. Future studies using controlled experiments with variable nutrient applications are required for quantification of foliar nutrients by using LIBS-based sensing.
Aursand, Marit; Standal, Inger B; Praël, Angelika; McEvoy, Lesley; Irvine, Joe; Axelson, David E
2009-05-13
(13)C nuclear magnetic resonance (NMR) in combination with multivariate data analysis was used to (1) discriminate between farmed and wild Atlantic salmon ( Salmo salar L.), (2) discriminate between different geographical origins, and (3) verify the origin of market samples. Muscle lipids from 195 Atlantic salmon of known origin (wild and farmed salmon from Norway, Scotland, Canada, Iceland, Ireland, the Faroes, and Tasmania) in addition to market samples were analyzed by (13)C NMR spectroscopy and multivariate analysis. Both probabilistic neural networks (PNN) and support vector machines (SVM) provided excellent discrimination (98.5 and 100.0%, respectively) between wild and farmed salmon. Discrimination with respect to geographical origin was somewhat more difficult, with correct classification rates ranging from 82.2 to 99.3% by PNN and SVM, respectively. In the analysis of market samples, five fish labeled and purchased as wild salmon were classified as farmed salmon (indicating mislabeling), and there were also some discrepancies between the classification and the product declaration with regard to geographical origin.
Kim, Il-Hwan; Bong, Jae-Hwan; Park, Jooyoung; Park, Shinsuk
2017-01-01
Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is one of the active safety systems to improve the vehicle control performance and, thus, the safety of the driver and the passengers. To use the ADAS for lane change control, rapid and correct detection of the driver’s intention is essential. This study proposes a novel preprocessing algorithm for the ADAS to improve the accuracy in classifying the driver’s intention for lane change by augmenting basic measurements from conventional on-board sensors. The information on the vehicle states and the road surface condition is augmented by using an artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver’s intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver’s intention can be improved by providing an SVM model with sufficient driving information augmented by using ANN models of vehicle dynamics. PMID:28604582
Weighted K-means support vector machine for cancer prediction.
Kim, SungHwan
2016-01-01
To date, the support vector machine (SVM) has been widely applied to diverse bio-medical fields to address disease subtype identification and pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), for which I allow the SVM to impose weights to the loss term. Besides, I demonstrate the numerical relations between the objective function of the SVM and weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adopt the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrate that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).
Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon
2016-10-06
Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first conducted to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, Support Vectors Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories. Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding level of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging and SVM-based classifier in the diagnosis of substance use disorders and furthering an understanding of their underlying pathology.
Accuracy comparison among different machine learning techniques for detecting malicious codes
NASA Astrophysics Data System (ADS)
Narang, Komal
2016-03-01
In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware i.e. zero day attack by analyzing operation codes on Android operating system. The accuracy of Naïve Bayes, Support Vector Machine (SVM) and Neural Network for detecting malicious code has been compared for the proposed model. In the experiment 400 benign files, 100 system files and 500 malicious files have been used to construct the model. The model yields the best accuracy 88.9% when neural network is used as classifier and achieved 95% and 82.8% accuracy for sensitivity and specificity respectively.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-01-01
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels. PMID:25372618
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-11-03
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels.
NASA Astrophysics Data System (ADS)
Ninos, K.; Georgiadis, P.; Cavouras, D.; Nomicos, C.
2010-05-01
This study presents the design and development of a mobile wireless platform to be used for monitoring and analysis of seismic events and related electromagnetic (EM) signals, employing Personal Digital Assistants (PDAs). A prototype custom-developed application was deployed on a 3G enabled PDA that could connect to the FTP server of the Institute of Geodynamics of the National Observatory of Athens and receive and display EM signals at 4 receiver frequencies (3 KHz (E-W, N-S), 10 KHz (E-W, N-S), 41 MHz and 46 MHz). Signals may originate from any one of the 16 field-stations located around the Greek territory. Employing continuous recordings of EM signals gathered from January 2003 till December 2007, a Support Vector Machines (SVM)-based classification system was designed to distinguish EM precursor signals within noisy background. EM-signals corresponding to recordings preceding major seismic events (Ms≥5R) were segmented, by an experienced scientist, and five features (mean, variance, skewness, kurtosis, and a wavelet based feature), derived from the EM-signals were calculated. These features were used to train the SVM-based classification scheme. The performance of the system was evaluated by the exhaustive search and leave-one-out methods giving 87.2% overall classification accuracy, in correctly identifying EM precursor signals within noisy background employing all calculated features. Due to the insufficient processing power of the PDAs, this task was performed on a typical desktop computer. This optimal trained context of the SVM classifier was then integrated in the PDA based application rendering the platform capable to discriminate between EM precursor signals and noise. System's efficiency was evaluated by an expert who reviewed 1/ multiple EM-signals, up to 18 days prior to corresponding past seismic events, and 2/ the possible EM-activity of a specific region employing the trained SVM classifier. Additionally, the proposed architecture can form a base platform for a future integrated system that will incorporate services such as notifications for field station power failures, disruption of data flow, occurring SEs, and even other types of measurement and analysis processes such as the integration of a special analysis algorithm based on the ratio of short term to long term signal average.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair
2018-04-01
The present study employs a machine learning algorithm namely support vector machine (SVM) to classify high and low potential archers from a collection of bio-physiological variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. The bio-physiological variables namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calories intake, were measured prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means clustered the archers into high (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy with a classification accuracy of 94% in comparison the other tested models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected bio-physiological variables examined.
Fuzzy support vector machine for microarray imbalanced data classification
NASA Astrophysics Data System (ADS)
Ladayya, Faroh; Purnami, Santi Wulan; Irhamah
2017-11-01
DNA microarrays are data containing gene expression with small sample sizes and high number of features. Furthermore, imbalanced classes is a common problem in microarray data. This occurs when a dataset is dominated by a class which have significantly more instances than the other minority classes. Therefore, it is needed a classification method that solve the problem of high dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples in the same importance thus the results is bias for minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method apply a fuzzy membership to each input point and reformulate the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy membership so FSVM can pay more attention to the samples with larger fuzzy membership. Given DNA microarray data is a high dimensional data with a very large number of features, it is necessary to do feature selection first using Fast Correlation based Filter (FCBF). In this study will be analyzed by SVM, FSVM and both methods by applying FCBF and get the classification performance of them. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
Bizios, Dimitrios; Heijl, Anders; Hougaard, Jesper Leth; Bengtsson, Boel
2010-02-01
To compare the performance of two machine learning classifiers (MLCs), artificial neural networks (ANNs) and support vector machines (SVMs), with input based on retinal nerve fibre layer thickness (RNFLT) measurements by optical coherence tomography (OCT), on the diagnosis of glaucoma, and to assess the effects of different input parameters. We analysed Stratus OCT data from 90 healthy persons and 62 glaucoma patients. Performance of MLCs was compared using conventional OCT RNFLT parameters plus novel parameters such as minimum RNFLT values, 10th and 90th percentiles of measured RNFLT, and transformations of A-scan measurements. For each input parameter and MLC, the area under the receiver operating characteristic curve (AROC) was calculated. There were no statistically significant differences between ANNs and SVMs. The best AROCs for both ANN (0.982, 95%CI: 0.966-0.999) and SVM (0.989, 95% CI: 0.979-1.0) were based on input of transformed A-scan measurements. Our SVM trained on this input performed better than ANNs or SVMs trained on any of the single RNFLT parameters (p < or = 0.038). The performance of ANNs and SVMs trained on minimum thickness values and the 10th and 90th percentiles were at least as good as ANNs and SVMs with input based on the conventional RNFLT parameters. No differences between ANN and SVM were observed in this study. Both MLCs performed very well, with similar diagnostic performance. Input parameters have a larger impact on diagnostic performance than the type of machine classifier. Our results suggest that parameters based on transformed A-scan thickness measurements of the RNFL processed by machine classifiers can improve OCT-based glaucoma diagnosis.
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Zangwill, Linda M; Chan, Kwokleung; Bowd, Christopher; Hao, Jicuang; Lee, Te-Won; Weinreb, Robert N; Sejnowski, Terrence J; Goldbaum, Michael H
2004-09-01
To determine whether topographical measurements of the parapapillary region analyzed by machine learning classifiers can detect early to moderate glaucoma better than similarly processed measurements obtained within the disc margin and to improve methods for optimization of machine learning classifier feature selection. One eye of each of 95 patients with early to moderate glaucomatous visual field damage and of each of 135 normal subjects older than 40 years participating in the longitudinal Diagnostic Innovations in Glaucoma Study (DIGS) were included. Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Dossenheim, Germany) mean height contour was measured in 36 equal sectors, both along the disc margin and in the parapapillary region (at a mean contour line radius of 1.7 mm). Each sector was evaluated individually and in combination with other sectors. Gaussian support vector machine (SVM) learning classifiers were used to interpret HRT sector measurements along the disc margin and in the parapapillary region, to differentiate between eyes with normal and glaucomatous visual fields and to compare the results with global and regional HRT parameter measurements. The area under the receiver operating characteristic (ROC) curve was used to measure diagnostic performance of the HRT parameters and to evaluate the cross-validation strategies and forward selection and backward elimination optimization techniques that were used to generate the reduced feature sets. The area under the ROC curve for mean height contour of the 36 sectors along the disc margin was larger than that for the mean height contour in the parapapillary region (0.97 and 0.85, respectively). Of the 36 individual sectors along the disc margin, those in the inferior region between 240 degrees and 300 degrees, had the largest area under the ROC curve (0.85-0.91). With SVM Gaussian techniques, the regional parameters showed the best ability to discriminate between normal eyes and eyes with glaucomatous visual field damage, followed by the global parameters, mean height contour measures along the disc margin, and mean height contour measures in the parapapillary region. The area under the ROC curve was 0.98, 0.94, 0.93, and 0.85, respectively. Cross-validation and optimization techniques demonstrated that good discrimination (99% of peak area under the ROC curve) can be obtained with a reduced number of HRT parameters. Mean height contour measurements along the disc margin discriminated between normal and glaucomatous eyes better than measurements obtained in the parapapillary region. Copyright Association for Research in Vision and Ophthalmology
Zangwill, Linda M.; Chan, Kwokleung; Bowd, Christopher; Hao, Jicuang; Lee, Te-Won; Weinreb, Robert N.; Sejnowski, Terrence J.; Goldbaum, Michael H.
2010-01-01
Purpose To determine whether topographical measurements of the parapapillary region analyzed by machine learning classifiers can detect early to moderate glaucoma better than similarly processed measurements obtained within the disc margin and to improve methods for optimization of machine learning classifier feature selection. Methods One eye of each of 95 patients with early to moderate glaucomatous visual field damage and of each of 135 normal subjects older than 40 years participating in the longitudinal Diagnostic Innovations in Glaucoma Study (DIGS) were included. Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Dossenheim, Germany) mean height contour was measured in 36 equal sectors, both along the disc margin and in the parapapillary region (at a mean contour line radius of 1.7 mm). Each sector was evaluated individually and in combination with other sectors. Gaussian support vector machine (SVM) learning classifiers were used to interpret HRT sector measurements along the disc margin and in the parapapillary region, to differentiate between eyes with normal and glaucomatous visual fields and to compare the results with global and regional HRT parameter measurements. The area under the receiver operating characteristic (ROC) curve was used to measure diagnostic performance of the HRT parameters and to evaluate the cross-validation strategies and forward selection and backward elimination optimization techniques that were used to generate the reduced feature sets. Results The area under the ROC curve for mean height contour of the 36 sectors along the disc margin was larger than that for the mean height contour in the parapapillary region (0.97 and 0.85, respectively). Of the 36 individual sectors along the disc margin, those in the inferior region between 240° and 300°, had the largest area under the ROC curve (0.85–0.91). With SVM Gaussian techniques, the regional parameters showed the best ability to discriminate between normal eyes and eyes with glaucomatous visual field damage, followed by the global parameters, mean height contour measures along the disc margin, and mean height contour measures in the parapapillary region. The area under the ROC curve was 0.98, 0.94, 0.93, and 0.85, respectively. Cross-validation and optimization techniques demonstrated that good discrimination (99% of peak area under the ROC curve) can be obtained with a reduced number of HRT parameters. Conclusions Mean height contour measurements along the disc margin discriminated between normal and glaucomatous eyes better than measurements obtained in the parapapillary region. PMID:15326133
Fine-grained leukocyte classification with deep residual learning for microscopic images.
Qin, Feiwei; Gao, Nannan; Peng, Yong; Wu, Zizhao; Shen, Shuying; Grudtsin, Artur
2018-08-01
Leukocyte classification and cytometry have wide applications in medical domain, previous researches usually exploit machine learning techniques to classify leukocytes automatically. However, constrained by the past development of machine learning techniques, for example, extracting distinctive features from raw microscopic images are difficult, the widely used SVM classifier only has relative few parameters to tune, these methods cannot efficiently handle fine-grained classification cases when the white blood cells have up to 40 categories. Based on deep learning theory, a systematic study is conducted on finer leukocyte classification in this paper. A deep residual neural network based leukocyte classifier is constructed at first, which can imitate the domain expert's cell recognition process, and extract salient features robustly and automatically. Then the deep neural network classifier's topology is adjusted according to the prior knowledge of white blood cell test. After that the microscopic image dataset with almost one hundred thousand labeled leukocytes belonging to 40 categories is built, and combined training strategies are adopted to make the designed classifier has good generalization ability. The proposed deep residual neural network based classifier was tested on microscopic image dataset with 40 leukocyte categories. It achieves top-1 accuracy of 77.80%, top-5 accuracy of 98.75% during the training procedure. The average accuracy on the test set is nearly 76.84%. This paper presents a fine-grained leukocyte classification method for microscopic images, based on deep residual learning theory and medical domain knowledge. Experimental results validate the feasibility and effectiveness of our approach. Extended experiments support that the fine-grained leukocyte classifier could be used in real medical applications, assist doctors in diagnosing diseases, reduce human power significantly. Copyright © 2018 Elsevier B.V. All rights reserved.
Raza, Shan-e-Ahmed; Smith, Hazel K.; Clarkson, Graham J. J.; Taylor, Gail; Thompson, Andrew J.; Clarkson, John; Rajpoot, Nasir M.
2014-01-01
Thermal imaging has been used in the past for remote detection of regions of canopy showing symptoms of stress, including water deficit stress. Stress indices derived from thermal images have been used as an indicator of canopy water status, but these depend on the choice of reference surfaces and environmental conditions and can be confounded by variations in complex canopy structure. Therefore, in this work, instead of using stress indices, information from thermal and visible light imagery was combined along with machine learning techniques to identify regions of canopy showing a response to soil water deficit. Thermal and visible light images of a spinach canopy with different levels of soil moisture were captured. Statistical measurements from these images were extracted and used to classify between canopies growing in well-watered soil or under soil moisture deficit using Support Vector Machines (SVM) and Gaussian Processes Classifier (GPC) and a combination of both the classifiers. The classification results show a high correlation with soil moisture. We demonstrate that regions of a spinach crop responding to soil water deficit can be identified by using machine learning techniques with a high accuracy of 97%. This method could, in principle, be applied to any crop at a range of scales. PMID:24892284
Tahir, Fahima; Fahiem, Muhammad Abuzar
2014-01-01
The quality of pharmaceutical products plays an important role in pharmaceutical industry as well as in our lives. Usage of defective tablets can be harmful for patients. In this research we proposed a nondestructive method to identify defective and nondefective tablets using their surface morphology. Three different environmental factors temperature, humidity and moisture are analyzed to evaluate the performance of the proposed method. Multiple textural features are extracted from the surface of the defective and nondefective tablets. These textural features are gray level cooccurrence matrix, run length matrix, histogram, autoregressive model and HAAR wavelet. Total textural features extracted from images are 281. We performed an analysis on all those 281, top 15, and top 2 features. Top 15 features are extracted using three different feature reduction techniques: chi-square, gain ratio and relief-F. In this research we have used three different classifiers: support vector machine, K-nearest neighbors and naïve Bayes to calculate the accuracies against proposed method using two experiments, that is, leave-one-out cross-validation technique and train test models. We tested each classifier against all selected features and then performed the comparison of their results. The experimental work resulted in that in most of the cases SVM performed better than the other two classifiers.
Millennial Filipino Student Engagement Analyzer Using Facial Feature Classification
NASA Astrophysics Data System (ADS)
Manseras, R.; Eugenio, F.; Palaoag, T.
2018-03-01
Millennials has been a word of mouth of everybody and a target market of various companies nowadays. In the Philippines, they comprise one third of the total population and most of them are still in school. Having a good education system is important for this generation to prepare them for better careers. And a good education system means having quality instruction as one of the input component indicators. In a classroom environment, teachers use facial features to measure the affect state of the class. Emerging technologies like Affective Computing is one of today’s trends to improve quality instruction delivery. This, together with computer vision, can be used in analyzing affect states of the students and improve quality instruction delivery. This paper proposed a system of classifying student engagement using facial features. Identifying affect state, specifically Millennial Filipino student engagement, is one of the main priorities of every educator and this directed the authors to develop a tool to assess engagement percentage. Multiple face detection framework using Face API was employed to detect as many student faces as possible to gauge current engagement percentage of the whole class. The binary classifier model using Support Vector Machine (SVM) was primarily set in the conceptual framework of this study. To achieve the most accuracy performance of this model, a comparison of SVM to two of the most widely used binary classifiers were tested. Results show that SVM bested RandomForest and Naive Bayesian algorithms in most of the experiments from the different test datasets.
NASA Astrophysics Data System (ADS)
Krell, Mario Michael; Wilshusen, Nils; Seeland, Anett; Kim, Su Kyoung
2017-04-01
Objective. Classifier transfers usually come with dataset shifts. To overcome dataset shifts in practical applications, we consider the limitations in computational resources in this paper for the adaptation of batch learning algorithms, like the support vector machine (SVM). Approach. We focus on data selection strategies which limit the size of the stored training data by different inclusion, exclusion, and further dataset manipulation criteria like handling class imbalance with two new approaches. We provide a comparison of the strategies with linear SVMs on several synthetic datasets with different data shifts as well as on different transfer settings with electroencephalographic (EEG) data. Main results. For the synthetic data, adding only misclassified samples performed astoundingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. Adding all and removing the oldest samples results in the best performance, whereas for smaller drifts, it can be sufficient to only add samples near the decision boundary of the SVM which reduces processing resources. Significance. For brain-computer interfaces based on EEG data, models trained on data from a calibration session, a previous recording session, or even from a recording session with another subject are used. We show, that by using the right combination of data selection criteria, it is possible to adapt the SVM classifier to overcome the performance drop from the transfer.
Reconstruction of hyperspectral image using matting model for classification
NASA Astrophysics Data System (ADS)
Xie, Weiying; Li, Yunsong; Ge, Chiru
2016-05-01
Although hyperspectral images (HSIs) captured by satellites provide much information in spectral regions, some bands are redundant or have large amounts of noise, which are not suitable for image analysis. To address this problem, we introduce a method for reconstructing the HSI with noise reduction and contrast enhancement using a matting model for the first time. The matting model refers to each spectral band of an HSI that can be decomposed into three components, i.e., alpha channel, spectral foreground, and spectral background. First, one spectral band of an HSI with more refined information than most other bands is selected, and is referred to as an alpha channel of the HSI to estimate the hyperspectral foreground and hyperspectral background. Finally, a combination operation is applied to reconstruct the HSI. In addition, the support vector machine (SVM) classifier and three sparsity-based classifiers, i.e., orthogonal matching pursuit (OMP), simultaneous OMP, and OMP based on first-order neighborhood system weighted classifiers, are utilized on the reconstructed HSI and the original HSI to verify the effectiveness of the proposed method. Specifically, using the reconstructed HSI, the average accuracy of the SVM classifier can be improved by as much as 19%.
NASA Astrophysics Data System (ADS)
Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin
2012-04-01
Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-01-01
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles’ in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of the vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive compared with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need of image registration or additional road database, it has great potentials of field applications. Future research will be focusing on expanding the current method for detecting other transportation modes such as buses, trucks, motors, bicycles, and pedestrians. PMID:27548179
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images.
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-08-19
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles' in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of the vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive compared with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need of image registration or additional road database, it has great potentials of field applications. Future research will be focusing on expanding the current method for detecting other transportation modes such as buses, trucks, motors, bicycles, and pedestrians.
Document page structure learning for fixed-layout e-books using conditional random fields
NASA Astrophysics Data System (ADS)
Tao, Xin; Tang, Zhi; Xu, Canhui
2013-12-01
In this paper, a model is proposed to learn logical structure of fixed-layout document pages by combining support vector machine (SVM) and conditional random fields (CRF). Features related to each logical label and their dependencies are extracted from various original Portable Document Format (PDF) attributes. Both local evidence and contextual dependencies are integrated in the proposed model so as to achieve better logical labeling performance. With the merits of SVM as local discriminative classifier and CRF modeling contextual correlations of adjacent fragments, it is capable of resolving the ambiguities of semantic labels. The experimental results show that CRF based models with both tree and chain graph structures outperform the SVM model with an increase of macro-averaged F1 by about 10%.
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.
Dolz, Jose; Laprie, Anne; Ken, Soléakhéna; Leroy, Henri-Arthur; Reyns, Nicolas; Massoptier, Laurent; Vermandel, Maximilien
2016-01-01
To constrain the risk of severe toxicity in radiotherapy and radiosurgery, precise volume delineation of organs at risk is required. This task is still manually performed, which is time-consuming and prone to observer variability. To address these issues, and as alternative to atlas-based segmentation methods, machine learning techniques, such as support vector machines (SVM), have been recently presented to segment subcortical structures on magnetic resonance images (MRI). SVM is proposed to segment the brainstem on MRI in multicenter brain cancer context. A dataset composed by 14 adult brain MRI scans is used to evaluate its performance. In addition to spatial and probabilistic information, five different image intensity values (IIVs) configurations are evaluated as features to train the SVM classifier. Segmentation accuracy is evaluated by computing the Dice similarity coefficient (DSC), absolute volumes difference (AVD) and percentage volume difference between automatic and manual contours. Mean DSC for all proposed IIVs configurations ranged from 0.89 to 0.90. Mean AVD values were below 1.5 cm(3), where the value for best performing IIVs configuration was 0.85 cm(3), representing an absolute mean difference of 3.99% with respect to the manual segmented volumes. Results suggest consistent volume estimation and high spatial similarity with respect to expert delineations. The proposed approach outperformed presented methods to segment the brainstem, not only in volume similarity metrics, but also in segmentation time. Preliminary results showed that the approach might be promising for adoption in clinical use.
Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail; Volpi, Michele; Copa, Loris
2010-05-01
The problem of environmental monitoring networks optimization (MNO) belongs to one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as a design or redesign of monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as an optimization criterion which has some advantages and drawbacks. In the present research we study an application of advanced techniques following from the statistical learning theory (SLT) - support vector machines (SVM) and the optimization of monitoring networks when dealing with a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.) is considered. SVM is a universal nonlinear modeling tool for classification problems in high dimensional spaces. The SVM solution is maximizing the decision boundary between classes and has a good generalization property for noisy data. The sparse solution of SVM is based on support vectors - data which contribute to the solution with nonzero weights. Fundamentally the MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (error on new independent measurements). In SLT this is a typical problem of active learning - a selection of the new unlabelled points which efficiently reduce the testing error. A classical approach (margin sampling) to active learning is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class. In the present research we propose and study two new advanced methods of active learning adapted to the solution of MNO problem: 1) hierarchical top-down clustering in an input space in order to remove redundancy when data are clustered, and 2) a general method (independent on classifier) which gives posterior probabilities that can be used to define the classifier confidence and corresponding proposals for new measurement points. The basic ideas and procedures are explained by applying simulated data sets. The real case study deals with the analysis and mapping of soil types, which is a multi-class classification problem. Maps of soil types are important for the analysis and 3D modeling of heavy metals migration in soil and prediction risk mapping. The results obtained demonstrate the high quality of SVM mapping and efficiency of monitoring network optimization by using active learning approaches. The research was partly supported by SNSF projects No. 200021-126505 and 200020-121835.
Analyzing Body Movements within the Laban Effort Framework Using a Single Accelerometer
Kikhia, Basel; Gomez, Miguel; Jiménez, Lara Lorna; Hallberg, Josef; Karvonen, Niklas; Synnes, Kåre
2014-01-01
This article presents a study on analyzing body movements by using a single accelerometer sensor. The investigated categories of body movements belong to the Laban Effort Framework: Strong—Light, Free—Bound and Sudden—Sustained. All body movements were represented by a set of activities used for data collection. The calculated accuracy of detecting the body movements was based on collecting data from a single wireless tri-axial accelerometer sensor. Ten healthy subjects collected data from three body locations (chest, wrist and thigh) simultaneously in order to analyze the locations comparatively. The data was then processed and analyzed using Machine Learning techniques. The wrist placement was found to be the best single location to record data for detecting Strong—Light body movements using the Random Forest classifier. The wrist placement was also the best location for classifying Bound—Free body movements using the SVM classifier. However, the data collected from the chest placement yielded the best results for detecting Sudden—Sustained body movements using the Random Forest classifier. The study shows that the choice of the accelerometer placement should depend on the targeted type of movement. In addition, the choice of the classifier when processing data should also depend on the chosen location and the target movement. PMID:24662408
Kabiri, Keivan; Rezai, Hamid; Moradi, Masoud
2018-04-01
High spatial resolution WorldView-2 (WV2) satellite imagery coupled with field observations have been utilized for mapping the coral reefs around Hendorabi Island in the northern Persian Gulf. In doing so, three standard multispectral bands (red, green, and blue) were selected to produce a classified map for benthic habitats. The in-situ observations were included photo-transects taken by snorkeling in water surface and manta tow technique. The satellite image has been classified using support vector machine (SVM) classifier by considering the information obtained from field measurements as both training and control points data. The results obtained from manta tow demonstrated that the mean total live hard coral coverage was 29.04% ± 2.44% around the island. Massive corals poritiids (20.70%) and branching corals acroporiids (20.33%) showed higher live coral coverage compared to other corals. Moreover, the map produced from satellite image illustrated the distribution of habitats with 78.1% of overall accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S
2016-01-01
High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
Classification of stellar spectra with SVM based on within-class scatter and between-class scatter
NASA Astrophysics Data System (ADS)
Liu, Zhong-bao; Zhou, Fang-xiao; Qin, Zhen-tao; Luo, Xue-gang; Zhang, Jing
2018-07-01
Support Vector Machine (SVM) is a popular data mining technique, and it has been widely applied in astronomical tasks, especially in stellar spectra classification. Since SVM doesn't take the data distribution into consideration, and therefore, its classification efficiencies can't be greatly improved. Meanwhile, SVM ignores the internal information of the training dataset, such as the within-class structure and between-class structure. In view of this, we propose a new classification algorithm-SVM based on Within-Class Scatter and Between-Class Scatter (WBS-SVM) in this paper. WBS-SVM tries to find an optimal hyperplane to separate two classes. The difference is that it incorporates minimum within-class scatter and maximum between-class scatter in Linear Discriminant Analysis (LDA) into SVM. These two scatters represent the distributions of the training dataset, and the optimization of WBS-SVM ensures the samples in the same class are as close as possible and the samples in different classes are as far as possible. Experiments on the K-, F-, G-type stellar spectra from Sloan Digital Sky Survey (SDSS), Data Release 8 show that our proposed WBS-SVM can greatly improve the classification accuracies.
Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai
2017-12-26
Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
Song, Min; Yu, Hwanjo; Han, Wook-Shin
2011-11-24
Protein-protein interaction (PPI) extraction has been a focal point of many biomedical research and database curation tools. Both Active Learning and Semi-supervised SVMs have recently been applied to extract PPI automatically. In this paper, we explore combining the AL with the SSL to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter by combining Deterministic Annealing-based SSL and an AL technique to extract protein-protein interaction. In addition, we extract a comprehensive set of features from MEDLINE records by Natural Language Processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection that boosts the system performance significantly. By conducting experiments with three different PPI corpuses, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs such as Random Sampling, Clustering, and Transductive SVMs by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Incremental classification learning for anomaly detection in medical images
NASA Astrophysics Data System (ADS)
Giritharan, Balathasan; Yuan, Xiaohui; Liu, Jianguo
2009-02-01
Computer-aided diagnosis usually screens thousands of instances to find only a few positive cases that indicate probable presence of disease.The amount of patient data increases consistently all the time. In diagnosis of new instances, disagreement occurs between a CAD system and physicians, which suggests inaccurate classifiers. Intuitively, misclassified instances and the previously acquired data should be used to retrain the classifier. This, however, is very time consuming and, in some cases where dataset is too large, becomes infeasible. In addition, among the patient data, only a small percentile shows positive sign, which is known as imbalanced data.We present an incremental Support Vector Machines(SVM) as a solution for the class imbalance problem in classification of anomaly in medical images. The support vectors provide a concise representation of the distribution of the training data. Here we use bootstrapping to identify potential candidate support vectors for future iterations. Experiments were conducted using images from endoscopy videos, and the sensitivity and specificity were close to that of SVM trained using all samples available at a given incremental step with significantly improved efficiency in training the classifier.
Pattern Recognition Approaches for Breast Cancer DCE-MRI Classification: A Systematic Review.
Fusco, Roberta; Sansone, Mario; Filice, Salvatore; Carone, Guglielmo; Amato, Daniela Maria; Sansone, Carlo; Petrillo, Antonella
2016-01-01
We performed a systematic review of several pattern analysis approaches for classifying breast lesions using dynamic, morphological, and textural features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Several machine learning approaches, namely artificial neural networks (ANN), support vector machines (SVM), linear discriminant analysis (LDA), tree-based classifiers (TC), and Bayesian classifiers (BC), and features used for classification are described. The findings of a systematic review of 26 studies are presented. The sensitivity and specificity are respectively 91 and 83 % for ANN, 85 and 82 % for SVM, 96 and 85 % for LDA, 92 and 87 % for TC, and 82 and 85 % for BC. The sensitivity and specificity are respectively 82 and 74 % for dynamic features, 93 and 60 % for morphological features, 88 and 81 % for textural features, 95 and 86 % for a combination of dynamic and morphological features, and 88 and 84 % for a combination of dynamic, morphological, and other features. LDA and TC have the best performance. A combination of dynamic and morphological features gives the best performance.
Automated analysis and classification of melanocytic tumor on skin whole slide images.
Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal
2018-06-01
This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Image Augmentation for Object Image Classification Based On Combination of Pre-Trained CNN and SVM
NASA Astrophysics Data System (ADS)
Shima, Yoshihiro
2018-04-01
Neural networks are a powerful means of classifying object images. The proposed image category classification method for object images combines convolutional neural networks (CNNs) and support vector machines (SVMs). A pre-trained CNN, called Alex-Net, is used as a pattern-feature extractor. Alex-Net is pre-trained for the large-scale object-image dataset ImageNet. Instead of training, Alex-Net, pre-trained for ImageNet is used. An SVM is used as trainable classifier. The feature vectors are passed to the SVM from Alex-Net. The STL-10 dataset are used as object images. The number of classes is ten. Training and test samples are clearly split. STL-10 object images are trained by the SVM with data augmentation. We use the pattern transformation method with the cosine function. We also apply some augmentation method such as rotation, skewing and elastic distortion. By using the cosine function, the original patterns were left-justified, right-justified, top-justified, or bottom-justified. Patterns were also center-justified and enlarged. Test error rate is decreased by 0.435 percentage points from 16.055% by augmentation with cosine transformation. Error rates are increased by other augmentation method such as rotation, skewing and elastic distortion, compared without augmentation. Number of augmented data is 30 times that of the original STL-10 5K training samples. Experimental test error rate for the test 8k STL-10 object images was 15.620%, which shows that image augmentation is effective for image category classification.
Thurston, Rebecca C; Hernandez, Javier; Del Rio, Jose M; De La Torre, Fernando
2011-07-01
Most midlife women have hot flashes. The conventional criterion (≥2 μmho rise/30 s) for classifying hot flashes physiologically has shown poor performance. We improved this performance in the laboratory with Support Vector Machines (SVMs), a pattern classification method. We aimed to compare conventional to SVM methods to classify hot flashes in the ambulatory setting. Thirty-one women with hot flashes underwent 24 h of ambulatory sternal skin conductance monitoring. Hot flashes were quantified with conventional (≥2 μmho/30 s) and SVM methods. Conventional methods had low sensitivity (sensitivity=.57, specificity=.98, positive predictive value (PPV)=.91, negative predictive value (NPV)=.90, F1=.60), with performance lower with higher body mass index (BMI). SVMs improved this performance (sensitivity=.87, specificity=.97, PPV=.90, NPV=.96, F1=.88) and reduced BMI variation. SVMs can improve ambulatory physiologic hot flash measures. Copyright © 2010 Society for Psychophysiological Research.
Physical Human Activity Recognition Using Wearable Sensors.
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-12-11
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
Physical Human Activity Recognition Using Wearable Sensors
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-01-01
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
Arrogance analysis of several typical pattern recognition classifiers
NASA Astrophysics Data System (ADS)
Jing, Chen; Xia, Shengping; Hu, Weidong
2007-04-01
Various kinds of classification methods have been developed. However, most of these classical methods, such as Back-Propagation (BP), Bayesian method, Support Vector Machine(SVM), Self-Organizing Map (SOM) are arrogant. A so-called arrogance, for a human, means that his decision, which even is a mistake, overstates his actual experience. Accordingly, we say that he is a arrogant if he frequently makes arrogant decisions. Likewise, some classical pattern classifiers represent the similar characteristic of arrogance. Given an input feature vector, we say a classifier is arrogant in its classification if its veracity is high yet its experience is low. Typically, for a new sample which is distinguishable from original training samples, traditional classifiers recognize it as one of the known targets. Clearly, arrogance in classification is an undesirable attribute. Conversely, a classifier is non-arrogant in its classification if there is a reasonable balance between its veracity and its experience. Inquisitiveness is, in many ways, the opposite of arrogance. In nature, inquisitiveness is an eagerness for knowledge characterized by the drive to question, to seek a deeper understanding. The human capacity to doubt present beliefs allows us to acquire new experiences and to learn from our mistakes. Within the discrete world of computers, inquisitive pattern recognition is the constructive investigation and exploitation of conflict in information. Thus, we quantify this balance and discuss new techniques that will detect arrogance in a classifier.
Comparison of ANN and SVM for classification of eye movements in EOG signals
NASA Astrophysics Data System (ADS)
Qi, Lim Jia; Alias, Norma
2018-03-01
Nowadays, electrooculogram is regarded as one of the most important biomedical signal in measuring and analyzing eye movement patterns. Thus, it is helpful in designing EOG-based Human Computer Interface (HCI). In this research, electrooculography (EOG) data was obtained from five volunteers. The (EOG) data was then preprocessed before feature extraction methods were employed to further reduce the dimensionality of data. Three feature extraction approaches were put forward, namely statistical parameters, autoregressive (AR) coefficients using Burg method, and power spectral density (PSD) using Yule-Walker method. These features would then become input to both artificial neural network (ANN) and support vector machine (SVM). The performance of the combination of different feature extraction methods and classifiers was presented and analyzed. It was found that statistical parameters + SVM achieved the highest classification accuracy of 69.75%.
Classification of burst and suppression in the neonatal electroencephalogram
NASA Astrophysics Data System (ADS)
Löfhede, J.; Löfgren, N.; Thordstein, M.; Flisberg, A.; Kjellmer, I.; Lindecrantz, K.
2008-12-01
Fisher's linear discriminant (FLD), a feed-forward artificial neural network (ANN) and a support vector machine (SVM) were compared with respect to their ability to distinguish bursts from suppressions in electroencephalograms (EEG) displaying a burst-suppression pattern. Five features extracted from the EEG were used as inputs. The study was based on EEG signals from six full-term infants who had suffered from perinatal asphyxia, and the methods have been trained with reference data classified by an experienced electroencephalographer. The results are summarized as the area under the curve (AUC), derived from receiver operating characteristic (ROC) curves for the three methods. Based on this, the SVM performs slightly better than the others. Testing the three methods with combinations of increasing numbers of the five features shows that the SVM handles the increasing amount of information better than the other methods.
Ensemble of classifiers for ontology enrichment
NASA Astrophysics Data System (ADS)
Semenova, A. V.; Kureichik, V. M.
2018-05-01
A classifier is a basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, definition of spam. A new ensemble of classifiers based on SVM (a method of support vectors), LSTM (neural network) and word embedding are suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The implementation of the proposed classifier is performed in the Matlab using the functions of the Text Analytics Toolbox. The principal difference between the proposed ensembles of classifiers is the high quality of classification of data at acceptable time costs.
Signal peptide discrimination and cleavage site identification using SVM and NN.
Kazemian, H B; Yusuf, S A; White, K
2014-02-01
About 15% of all proteins in a genome contain a signal peptide (SP) sequence, at the N-terminus, that targets the protein to intracellular secretory pathways. Once the protein is targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains due to similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method utilises a dual phase classification approach using SVM as a primary classifier to discriminate SP sequences from Non-SP. The methodology further employs NNs to predict the most suitable cleavage site candidates. In phase one, a SVM classification utilises hydrophobic propensities as a primary feature vector extraction using symmetric sliding window amino-acid sequence analysis for discrimination of SP and Non-SP. In phase two, a NN classification uses asymmetric sliding window sequence analysis for prediction of cleavage site identification. The proposed SVM-NN method was tested using Uni-Prot non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model. © 2013 Published by Elsevier Ltd.
Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.
2014-01-01
Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifier. We referenced a database of disease outbreak reports to confirm that this evaluation data set resulted from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784
NASA Astrophysics Data System (ADS)
Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong
2013-01-01
Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
Mao, Xue Gang; Du, Zi Han; Liu, Jia Qian; Chen, Shu Xin; Hou, Ji Yu
2018-01-01
Traditional field investigation and artificial interpretation could not satisfy the need of forest gaps extraction at regional scale. High spatial resolution remote sensing image provides the possibility for regional forest gaps extraction. In this study, we used object-oriented classification method to segment and classify forest gaps based on QuickBird high resolution optical remote sensing image in Jiangle National Forestry Farm of Fujian Province. In the process of object-oriented classification, 10 scales (10-100, with a step length of 10) were adopted to segment QuickBird remote sensing image; and the intersection area of reference object (RA or ) and intersection area of segmented object (RA os ) were adopted to evaluate the segmentation result at each scale. For segmentation result at each scale, 16 spectral characteristics and support vector machine classifier (SVM) were further used to classify forest gaps, non-forest gaps and others. The results showed that the optimal segmentation scale was 40 when RA or was equal to RA os . The accuracy difference between the maximum and minimum at different segmentation scales was 22%. At optimal scale, the overall classification accuracy was 88% (Kappa=0.82) based on SVM classifier. Combining high resolution remote sensing image data with object-oriented classification method could replace the traditional field investigation and artificial interpretation method to identify and classify forest gaps at regional scale.
NASA Astrophysics Data System (ADS)
Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan
2017-03-01
Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.
Chao, Pei-Kuang; Wang, Chun-Li; Chan, Hsiao-Lung
2012-03-01
Predicting response after cardiac resynchronization therapy (CRT) has been a challenge of cardiologists. About 30% of selected patients based on the standard selection criteria for CRT do not show response after receiving the treatment. This study is aimed to build an intelligent classifier to assist in identifying potential CRT responders by speckle-tracking radial strain based on echocardiograms. The echocardiograms analyzed were acquired before CRT from 26 patients who have received CRT. Sequential forward selection was performed on the parameters obtained by peak-strain timing and phase space reconstruction on speckle-tracking radial strain to find an optimal set of features for creating intelligent classifiers. Support vector machine (SVM) with a linear, quadratic, and polynominal kernel were tested to build classifiers to identify potential responders and non-responders for CRT by selected features. Based on random sub-sampling validation, the best classification performance is correct rate about 95% with 96-97% sensitivity and 93-94% specificity achieved by applying SVM with a quadratic kernel on a set of 3 parameters. The selected 3 parameters contain both indexes extracted by peak-strain timing and phase space reconstruction. An intelligent classifier with an averaged correct rate, sensitivity and specificity above 90% for assisting in identifying CRT responders is built by speckle-tracking radial strain. The classifier can be applied to provide objective suggestion for patient selection of CRT. Copyright © 2011 Elsevier B.V. All rights reserved.
Energy-Based Metrics for Arthroscopic Skills Assessment.
Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa
2017-08-05
Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.
Van Esbroeck, Alexander; Rubinfeld, Ilan; Hall, Bruce; Syed, Zeeshan
2014-11-01
To investigate the use of machine learning to empirically determine the risk of individual surgical procedures and to improve surgical models with this information. American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) data from 2005 to 2009 were used to train support vector machine (SVM) classifiers to learn the relationship between textual constructs in current procedural terminology (CPT) descriptions and mortality, morbidity, Clavien 4 complications, and surgical-site infections (SSI) within 30 days of surgery. The procedural risk scores produced by the SVM classifiers were validated on data from 2010 in univariate and multivariate analyses. The procedural risk scores produced by the SVM classifiers achieved moderate-to-high levels of discrimination in univariate analyses (area under receiver operating characteristic curve: 0.871 for mortality, 0.789 for morbidity, 0.791 for SSI, 0.845 for Clavien 4 complications). Addition of these scores also substantially improved multivariate models comprising patient factors and previously proposed correlates of procedural risk (net reclassification improvement and integrated discrimination improvement: 0.54 and 0.001 for mortality, 0.46 and 0.011 for morbidity, 0.68 and 0.022 for SSI, 0.44 and 0.001 for Clavien 4 complications; P < .05 for all comparisons). Similar improvements were noted in discrimination and calibration for other statistical measures, and in subcohorts comprising patients with general or vascular surgery. Machine learning provides clinically useful estimates of surgical risk for individual procedures. This information can be measured in an entirely data-driven manner and substantially improves multifactorial models to predict postoperative complications. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho
2007-03-01
In this paper, we proposed novel shape features to improve classification performance of differentiating obstructive lung diseases, based on HRCT (High Resolution Computerized Tomography) images. The images were selected from HRCT images, obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs with various sizes (16x16, 32x32, and 64x64 pixels), representing each disease or normal lung parenchyma. Besides thirteen textural features, we employed additional seven shape features; cluster shape features, and Top-hat transform features. To evaluate the contribution of shape features for differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, the Bayesian classifier and support vector machine (SVM) were implemented. To assess the performance and cross-validation of the system, 5-folding method was used. In comparison to employing only textural features, adding shape features yields significant enhancement of overall sensitivity(5.9, 5.4, 4.4% in the Bayesian and 9.0, 7.3, 5.3% in the SVM), in the order of ROI size 16x16, 32x32, 64x64 pixels, respectively (t-test, p<0.01). Moreover, this enhancement was largely due to the improvement on class-specific sensitivity of mild centrilobular emphysema and bronchiolitis obliterans which are most hard to differentiate for radiologists. According to these experimental results, adding shape features to conventional texture features is much useful to improve classification performance of obstructive lung diseases in both Bayesian and SVM classifiers.
Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models
NASA Astrophysics Data System (ADS)
Mandal, Sukomal; Rao, Subba; N., Harish; Lokesha
2012-06-01
The damage analysis of coastal structure is very important as it involves many design parameters to be considered for the better and safe design of structure. In the present study experimental data for non-reshaped berm breakwater are collected from Marine Structures Laboratory, Department of Applied Mechanics and Hydraulics, NITK, Surathkal, India. Soft computing techniques like Artificial Neural Network (ANN), Support Vector Machine (SVM) and Adaptive Neuro Fuzzy Inference system (ANFIS) models are constructed using experimental data sets to predict the damage level of non-reshaped berm breakwater. The experimental data are used to train ANN, SVM and ANFIS models and results are determined in terms of statistical measures like mean square error, root mean square error, correla-tion coefficient and scatter index. The result shows that soft computing techniques i.e., ANN, SVM and ANFIS can be efficient tools in predicting damage levels of non reshaped berm breakwater.
Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin
2016-11-01
Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackled the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which was used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features, which can be utilized by deep neural networks and other classifiers to enhance learning, out of the Fisher score features. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.
Traffic Behavior Recognition Using the Pachinko Allocation Model
Huynh-The, Thien; Banos, Oresti; Le, Ba-Vui; Bui, Dinh-Mao; Yoon, Yongik; Lee, Sungyoung
2015-01-01
CCTV-based behavior recognition systems have gained considerable attention in recent years in the transportation surveillance domain for identifying unusual patterns, such as traffic jams, accidents, dangerous driving and other abnormal behaviors. In this paper, a novel approach for traffic behavior modeling is presented for video-based road surveillance. The proposed system combines the pachinko allocation model (PAM) and support vector machine (SVM) for a hierarchical representation and identification of traffic behavior. A background subtraction technique using Gaussian mixture models (GMMs) and an object tracking mechanism based on Kalman filters are utilized to firstly construct the object trajectories. Then, the sparse features comprising the locations and directions of the moving objects are modeled by PAM into traffic topics, namely activities and behaviors. As a key innovation, PAM captures not only the correlation among the activities, but also among the behaviors based on the arbitrary directed acyclic graph (DAG). The SVM classifier is then utilized on top to train and recognize the traffic activity and behavior. The proposed model shows more flexibility and greater expressive power than the commonly-used latent Dirichlet allocation (LDA) approach, leading to a higher recognition accuracy in the behavior classification. PMID:26151213
Panwar, Bharat; Raghava, Gajendra P S
2015-04-01
The RNA-protein interactions play a diverse role in the cells, thus identification of RNA-protein interface is essential for the biologist to understand their function. In the past, several methods have been developed for predicting RNA interacting residues in proteins, but limited efforts have been made for the identification of protein-interacting nucleotides in RNAs. In order to discriminate protein-interacting and non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest, SMO and SVM(light)) for prediction model development using various features and achieved highest 83.92% sensitivity, 84.82 specificity, 84.62% accuracy and 0.62 Matthew's correlation coefficient by SVM(light) based models. We observed that certain tri-nucleotides like ACA, ACC, AGA, CAC, CCA, GAG, UGA, and UUU preferred in protein-interaction. All the models have been developed using a non-redundant dataset and are evaluated using five-fold cross validation technique. A web-server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/). Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.
Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results make separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptidemore » identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptides is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample« less
Pedestrian detection in crowded scenes with the histogram of gradients principle
NASA Astrophysics Data System (ADS)
Sidla, O.; Rosner, M.; Lypetskyy, Y.
2006-10-01
This paper describes a close to real-time scale invariant implementation of a pedestrian detector system which is based on the Histogram of Oriented Gradients (HOG) principle. Salient HOG features are first selected from a manually created very large database of samples with an evolutionary optimization procedure that directly trains a polynomial Support Vector Machine (SVM). Real-time operation is achieved by a cascaded 2-step classifier which uses first a very fast linear SVM (with the same features as the polynomial SVM) to reject most of the irrelevant detections and then computes the decision function with a polynomial SVM on the remaining set of candidate detections. Scale invariance is achieved by running the detector of constant size on scaled versions of the original input images and by clustering the results over all resolutions. The pedestrian detection system has been implemented in two versions: i) fully body detection, and ii) upper body only detection. The latter is especially suited for very busy and crowded scenarios. On a state-of-the-art PC it is able to run at a frequency of 8 - 20 frames/sec.
Online image classification under monotonic decision boundary constraint
NASA Astrophysics Data System (ADS)
Lu, Cheng; Allebach, Jan; Wagner, Jerry; Pitta, Brandi; Larson, David; Guo, Yandong
2015-01-01
Image classification is a prerequisite for copy quality enhancement in all-in-one (AIO) device that comprises a printer and scanner, and which can be used to scan, copy and print. Different processing pipelines are provided in an AIO printer. Each of the processing pipelines is designed specifically for one type of input image to achieve the optimal output image quality. A typical approach to this problem is to apply Support Vector Machine to classify the input image and feed it to its corresponding processing pipeline. The online training SVM can help users to improve the performance of classification as input images accumulate. At the same time, we want to make quick decision on the input image to speed up the classification which means sometimes the AIO device does not need to scan the entire image to make a final decision. These two constraints, online SVM and quick decision, raise questions regarding: 1) what features are suitable for classification; 2) how we should control the decision boundary in online SVM training. This paper will discuss the compatibility of online SVM and quick decision capability.
Fault diagnosis method based on FFT-RPCA-SVM for Cascaded-Multilevel Inverter.
Wang, Tianzhen; Qi, Jie; Xu, Hao; Wang, Yide; Liu, Lei; Gao, Diju
2016-01-01
Thanks to reduced switch stress, high quality of load wave, easy packaging and good extensibility, the cascaded H-bridge multilevel inverter is widely used in wind power system. To guarantee stable operation of system, a new fault diagnosis method, based on Fast Fourier Transform (FFT), Relative Principle Component Analysis (RPCA) and Support Vector Machine (SVM), is proposed for H-bridge multilevel inverter. To avoid the influence of load variation on fault diagnosis, the output voltages of the inverter is chosen as the fault characteristic signals. To shorten the time of diagnosis and improve the diagnostic accuracy, the main features of the fault characteristic signals are extracted by FFT. To further reduce the training time of SVM, the feature vector is reduced based on RPCA that can get a lower dimensional feature space. The fault classifier is constructed via SVM. An experimental prototype of the inverter is built to test the proposed method. Compared to other fault diagnosis methods, the experimental results demonstrate the high accuracy and efficiency of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Classification of skin cancer images using local binary pattern and SVM classifier
NASA Astrophysics Data System (ADS)
Adjed, Faouzi; Faye, Ibrahima; Ababsa, Fakhreddine; Gardezi, Syed Jamal; Dass, Sarat Chandra
2016-11-01
In this paper, a classification method for melanoma and non-melanoma skin cancer images has been presented using the local binary patterns (LBP). The LBP computes the local texture information from the skin cancer images, which is later used to compute some statistical features that have capability to discriminate the melanoma and non-melanoma skin tissues. Support vector machine (SVM) is applied on the feature matrix for classification into two skin image classes (malignant and benign). The method achieves good classification accuracy of 76.1% with sensitivity of 75.6% and specificity of 76.7%.
NASA Astrophysics Data System (ADS)
Zhan, Liwei; Li, Chengwei
2017-02-01
A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The presented hybrid model combines a support vector machine (SVM) with particle swarm optimization (PSO) technique. SVM has been adopted to solve regression problems successfully. Its regression accuracy is greatly related to optimizing parameters such as the regularization constant C , the parameter gamma γ corresponding to RBF kernel and the epsilon parameter \\varepsilon in the SVM training procedure. However, the friction coefficient which is predicted based on SVM has yet to be explored between aircraft tire and coating. The experiment reveals that drop height and tire rotational speed are the factors affecting friction coefficient. Bearing in mind, the friction coefficient can been predicted using the hybrid PSO-SVM-based model by the measured friction coefficient between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are used to optimize the relevant parameters (C , γ and \\varepsilon ), respectively. The regression accuracy could be reflected by the coefficient of determination ({{R}2} ). The result shows that the hybrid PSO-RBF-SVM-based model has better accuracy compared with the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experiment data confirms its good performance.
Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah
2017-06-12
Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volumes of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_ Type = 2860, n_ SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_ Type = 6000, n_ SeverityLevel =5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was a OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3, 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, F-score for severity assessment code (SAC) 1 (extreme risk) was 87.3 and 64% for SAC4 (low risk) on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High risk incidents (SAC2) were confused with medium risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.
Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane
2016-12-01
In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used an EWMA to identify features that correspond with a particular type of fall allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. A benefit of using a subset of the original data to design classification models minimizes training time and simplifies models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the university of Rzeszow's. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results using the EWMA-based SVM classifier method with those achieved using three commonly used machine learning classifiers, neural network, K-nearest neighbor and naïve Bayes, proved our model superior.
Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels
2014-01-01
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. It, however, is known that protein complexes with smaller sizes occupy a large part of whole complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes. PMID:24564744
"What is relevant in a text document?": An interpretable machine learning approach
Arras, Leila; Horn, Franziska; Montavon, Grégoire; Müller, Klaus-Robert
2017-01-01
Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text’s category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications. PMID:28800619
Banerjee, Satarupa; Pal, Mousumi; Chakrabarty, Jitamanyu; Petibois, Cyril; Paul, Ranjan Rashmi; Giri, Amita; Chatterjee, Jyotirmoy
2015-10-01
In search of specific label-free biomarkers for differentiation of two oral lesions, namely oral leukoplakia (OLK) and oral squamous-cell carcinoma (OSCC), Fourier-transform infrared (FTIR) spectroscopy was performed on paraffin-embedded tissue sections from 47 human subjects (eight normal (NOM), 16 OLK, and 23 OSCC). Difference between mean spectra (DBMS), Mann-Whitney's U test, and forward feature selection (FFS) techniques were used for optimising spectral-marker selection. Classification of diseases was performed with linear and quadratic support vector machine (SVM) at 10-fold cross-validation, using different combinations of spectral features. It was observed that six features obtained through FFS enabled differentiation of NOM and OSCC tissue (1782, 1713, 1665, 1545, 1409, and 1161 cm(-1)) and were most significant, able to classify OLK and OSCC with 81.3 % sensitivity, 95.7 % specificity, and 89.7 % overall accuracy. The 43 spectral markers extracted through Mann-Whitney's U Test were the least significant when quadratic SVM was used. Considering the high sensitivity and specificity of the FFS technique, extracting only six spectral biomarkers was thus most useful for diagnosis of OLK and OSCC, and to overcome inter and intra-observer variability experienced in diagnostic best-practice histopathological procedure. By considering the biochemical assignment of these six spectral signatures, this work also revealed altered glycogen and keratin content in histological sections which could able to discriminate OLK and OSCC. The method was validated through spectral selection by the DBMS technique. Thus this method has potential for diagnostic cost minimisation for oral lesions by label-free biomarker identification.
Wang, Liang-Jen; Li, Sung-Chou; Lee, Min-Jing; Chou, Miao-Chun; Chou, Wen-Jiun; Lee, Sheng-Yu; Hsu, Chih-Wei; Huang, Lien-Hung; Kuo, Ho-Chang
2018-01-01
Background: Attention-deficit/hyperactivity disorder (ADHD) is a highly genetic neurodevelopmental disorder, and its dysregulation of gene expression involves microRNAs (miRNAs). The purpose of this study was to identify potential miRNAs biomarkers and then use these biomarkers to establish a diagnostic panel for ADHD. Design and methods: RNA samples from white blood cells (WBCs) of five ADHD patients and five healthy controls were combined to create one pooled patient library and one control library. We identified 20 candidate miRNAs with the next-generation sequencing (NGS) technique (Illumina). Blood samples were then collected from a Training Set (68 patients and 54 controls) and a Testing Set (20 patients and 20 controls) to identify the expression profiles of these miRNAs with real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR). We used receiver operating characteristic (ROC) curves and the area under the curve (AUC) to evaluate both the specificity and sensitivity of the probability score yielded by the support vector machine (SVM) model. Results: We identified 13 miRNAs as potential ADHD biomarkers. The ΔCt values of these miRNAs in the Training Set were integrated to create a biomarker model using the SVM algorithm, which demonstrated good validity in differentiating ADHD patients from control subjects (sensitivity: 86.8%, specificity: 88.9%, AUC: 0.94, p < 0.001). The results of the blind testing showed that 85% of the subjects in the Testing Set were correctly classified using the SVM model alignment (AUC: 0.91, p < 0.001). The discriminative validity is not influenced by patients' age or gender, indicating both the robustness and the reliability of the SVM classification model. Conclusion: As measured in peripheral blood, miRNA-based biomarkers can aid in the differentiation of ADHD in clinical settings. Additional studies are needed in the future to clarify the ADHD-associated gene functions and biological mechanisms modulated by miRNAs.
2013-01-01
Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time. PMID:23815620
An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.
Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein
2017-12-22
The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.
Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra
2011-01-01
Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to lack of the gold standard of negative examples, miRNA-targeting site context specific relevant features and efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable to the biologists. In addition, these algorithms fail to obtain considerable combination of precision and recall for the target transcripts that are translationally repressed at protein level. In the proposed article, we introduce an efficient miRNA-target prediction system MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selection of biologically relevant miRNA-targeting site context specific features. The features are selected by using a novel feature selection technique AMOSA-SVM, that integrates the multi objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve much higher Matthew's correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the others target prediction methods for a completely independent test data set. The obtained MCC and ACA values of these algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list that makes MultiMiTar reliable for the biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
Classification of Stellar Spectra with Fuzzy Minimum Within-Class Support Vector Machine
NASA Astrophysics Data System (ADS)
Zhong-bao, Liu; Wen-ai, Song; Jing, Zhang; Wen-juan, Zhao
2017-06-01
Classification is one of the important tasks in astronomy, especially in spectra analysis. Support Vector Machine (SVM) is a typical classification method, which is widely used in spectra classification. Although it performs well in practice, its classification accuracies can not be greatly improved because of two limitations. One is it does not take the distribution of the classes into consideration. The other is it is sensitive to noise. In order to solve the above problems, inspired by the maximization of the Fisher's Discriminant Analysis (FDA) and the SVM separability constraints, fuzzy minimum within-class support vector machine (FMWSVM) is proposed in this paper. In FMWSVM, the distribution of the classes is reflected by the within-class scatter in FDA and the fuzzy membership function is introduced to decrease the influence of the noise. The comparative experiments with SVM on the SDSS datasets verify the effectiveness of the proposed classifier FMWSVM.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunton, Steven
Optical systems provide valuable information for evaluating interactions and associations between organisms and MHK energy converters and for capturing potentially rare encounters between marine organisms and MHK device. The deluge of optical data from cabled monitoring packages makes expert review time-consuming and expensive. We propose algorithms and a processing framework to automatically extract events of interest from underwater video. The open-source software framework consists of background subtraction, filtering, feature extraction and hierarchical classification algorithms. This principle classification pipeline was validated on real-world data collected with an experimental underwater monitoring package. An event detection rate of 100% was achieved using robustmore » principal components analysis (RPCA), Fourier feature extraction and a support vector machine (SVM) binary classifier. The detected events were then further classified into more complex classes – algae | invertebrate | vertebrate, one species | multiple species of fish, and interest rank. Greater than 80% accuracy was achieved using a combination of machine learning techniques.« less
A Minimum Spanning Forest Based Method for Noninvasive Cancer Detection with Hyperspectral Imaging
Pike, Robert; Lu, Guolan; Wang, Dongsheng; Chen, Zhuo Georgia; Fei, Baowei
2016-01-01
Goal The purpose of this paper is to develop a classification method that combines both spectral and spatial information for distinguishing cancer from healthy tissue on hyperspectral images in an animal model. Methods An automated algorithm based on a minimum spanning forest (MSF) and optimal band selection has been proposed to classify healthy and cancerous tissue on hyperspectral images. A support vector machine (SVM) classifier is trained to create a pixel-wise classification probability map of cancerous and healthy tissue. This map is then used to identify markers that are used to compute mutual information for a range of bands in the hyperspectral image and thus select the optimal bands. An MSF is finally grown to segment the image using spatial and spectral information. Conclusion The MSF based method with automatically selected bands proved to be accurate in determining the tumor boundary on hyperspectral images. Significance Hyperspectral imaging combined with the proposed classification technique has the potential to provide a noninvasive tool for cancer detection. PMID:26285052
Nho, Kwangsik; Shen, Li; Kim, Sungeun; Risacher, Shannon L.; West, John D.; Foroud, Tatiana; Jack, Clifford R.; Weiner, Michael W.; Saykin, Andrew J.
2010-01-01
Mild Cognitive Impairment (MCI) is thought to be a precursor to the development of early Alzheimer’s disease (AD). For early diagnosis of AD, the development of a model that is able to predict the conversion of amnestic MCI to AD is challenging. Using automatic whole-brain MRI analysis techniques and pattern classification methods, we developed a model to differentiate AD from healthy controls (HC), and then applied it to the prediction of MCI conversion to AD. Classification was performed using support vector machines (SVMs) together with a SVM-based feature selection method, which selected a set of most discriminating predictors for optimizing prediction accuracy. We obtained 90.5% cross-validation accuracy for classifying AD and HC, and 72.3% accuracy for predicting MCI conversion to AD. These analyses suggest that a classifier trained to separate HC vs. AD has substantial potential for predicting MCI conversion to AD. PMID:21347037
Quantum optimization for training support vector machines.
Anguita, Davide; Ridella, Sandro; Rivieccio, Fabio; Zunino, Rodolfo
2003-01-01
Refined concepts, such as Rademacher estimates of model complexity and nonlinear criteria for weighting empirical classification errors, represent recent and promising approaches to characterize the generalization ability of Support Vector Machines (SVMs). The advantages of those techniques lie in both improving the SVM representation ability and yielding tighter generalization bounds. On the other hand, they often make Quadratic-Programming algorithms no longer applicable, and SVM training cannot benefit from efficient, specialized optimization techniques. The paper considers the application of Quantum Computing to solve the problem of effective SVM training, especially in the case of digital implementations. The presented research compares the behavioral aspects of conventional and enhanced SVMs; experiments in both a synthetic and real-world problems support the theoretical analysis. At the same time, the related differences between Quadratic-Programming and Quantum-based optimization techniques are considered.
Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan
2016-08-01
This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
DSouza, Adora M.; Abidin, Anas Z.; Chockanathan, Udaysankar; Wismüller, Axel
2018-03-01
In this study, we investigate whether there are discernable changes in influence that brain regions have on themselves once patients show symptoms of HIV Associated Neurocognitive Disorder (HAND) using functional MRI (fMRI). Simple functional connectivity measures, such as correlation cannot reveal such information. To this end, we use mutual connectivity analysis (MCA) with Local Models (LM), which reveals a measure of influence in terms of predictability. Once such measures of interaction are obtained, we train two classifiers to characterize difference in patterns of regional self-influence between healthy subjects and subjects presenting with HAND symptoms. The two classifiers we use are Support Vector Machines (SVM) and Localized Generalized Matrix Learning Vector Quantization (LGMLVQ). Performing machine learning on fMRI connectivity measures is popularly known as multi-voxel pattern analysis (MVPA). By performing such an analysis, we are interested in studying the impact HIV infection has on an individual's brain. The high area under receiver operating curve (AUC) and accuracy values for 100 different train/test separations using MCA-LM self-influence measures (SVM: mean AUC=0.86, LGMLVQ: mean AUC=0.88, SVM and LGMLVQ: mean accuracy=0.78) compared with standard MVPA analysis using cross-correlation between fMRI time-series (SVM: mean AUC=0.58, LGMLVQ: mean AUC=0.57), demonstrates that self-influence features can be more discriminative than measures of interaction between time-series pairs. Furthermore, our results suggest that incorporating measures of self-influence in MVPA analysis used commonly in fMRI analysis has the potential to provide a performance boost and indicate important changes in dynamics of regions in the brain as a consequence of HIV infection.
Lai, Daniel T H; Begg, Rezaul K; Taylor, Simon; Palaniswami, Marimuthu
2008-01-01
Elderly tripping falls cost billions annually in medical funds and result in high mortality rates often perpetrated by pulmonary embolism (internal bleeding) and infected fractures that do not heal well. In this paper, we propose an intelligent gait detection system (AR-SVM) for screening elderly individuals at risk of suffering tripping falls. The motivation of this system is to provide early detection of elderly gait reminiscent of tripping characteristics so that preventive measures could be administered. Our system is composed of two stages, a predictor model estimated by an autoregressive (AR) process and a support vector machine (SVM) classifier. The system input is a digital signal constructed from consecutive measurements of minimum toe clearance (MTC) representative of steady-state walking. The AR-SVM system was tested on 23 individuals (13 healthy and 10 having suffered at least one tripping fall in the past year) who each completed a minimum of 10 min of walking on a treadmill at a self-selected pace. In the first stage, a fourth order AR model required at least 64 MTC values to correctly detect all fallers and non-fallers. Detection was further improved to less than 1 min of walking when the model coefficients were used as input features to the SVM classifier. The system achieved a detection accuracy of 95.65% with the leave one out method using only 16 MTC samples, but was reduced to 69.57% when eight MTC samples were used. These results demonstrate a fast and efficient system requiring a small number of strides and only MTC measurements for accurate detection of tripping gait characteristics.
Automated detection of microcalcification clusters in mammograms
NASA Astrophysics Data System (ADS)
Karale, Vikrant A.; Mukhopadhyay, Sudipta; Singh, Tulika; Khandelwal, Niranjan; Sadhu, Anup
2017-03-01
Mammography is the most efficient modality for detection of breast cancer at early stage. Microcalcifications are tiny bright spots in mammograms and can often get missed by the radiologist during diagnosis. The presence of microcalcification clusters in mammograms can act as an early sign of breast cancer. This paper presents a completely automated computer-aided detection (CAD) system for detection of microcalcification clusters in mammograms. Unsharp masking is used as a preprocessing step which enhances the contrast between microcalcifications and the background. The preprocessed image is thresholded and various shape and intensity based features are extracted. Support vector machine (SVM) classifier is used to reduce the false positives while preserving the true microcalcification clusters. The proposed technique is applied on two different databases i.e DDSM and private database. The proposed technique shows good sensitivity with moderate false positives (FPs) per image on both databases.
Secure Recognition of Voice-Less Commands Using Videos
NASA Astrophysics Data System (ADS)
Yau, Wai Chee; Kumar, Dinesh Kant; Weghorn, Hans
Interest in voice recognition technologies for internet applications is growing due to the flexibility of speech-based communication. The major drawback with the use of sound for internet access with computers is that the commands will be audible to other people in the vicinity. This paper examines a secure and voice-less method for recognition of speech-based commands using video without evaluating sound signals. The proposed approach represents mouth movements in the video data using 2D spatio-temporal templates (STT). Zernike moments (ZM) are computed from STT and fed into support vector machines (SVM) to be classified into one of the utterances. The experimental results demonstrate that the proposed technique produces a high accuracy of 98% in a phoneme classification task. The proposed technique is demonstrated to be invariant to global variations of illumination level. Such a system is useful for securely interpreting user commands for internet applications on mobile devices.
On the use of feature selection to improve the detection of sea oil spills in SAR images
NASA Astrophysics Data System (ADS)
Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo
2017-03-01
Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to this category.
Bai, Ou; Lin, Peter; Vorbach, Sherry; Li, Jiang; Furlani, Steve; Hallett, Mark
2007-12-01
To explore effective combinations of computational methods for the prediction of movement intention preceding the production of self-paced right and left hand movements from single trial scalp electroencephalogram (EEG). Twelve naïve subjects performed self-paced movements consisting of three key strokes with either hand. EEG was recorded from 128 channels. The exploration was performed offline on single trial EEG data. We proposed that a successful computational procedure for classification would consist of spatial filtering, temporal filtering, feature selection, and pattern classification. A systematic investigation was performed with combinations of spatial filtering using principal component analysis (PCA), independent component analysis (ICA), common spatial patterns analysis (CSP), and surface Laplacian derivation (SLD); temporal filtering using power spectral density estimation (PSD) and discrete wavelet transform (DWT); pattern classification using linear Mahalanobis distance classifier (LMD), quadratic Mahalanobis distance classifier (QMD), Bayesian classifier (BSC), multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), and support vector machine (SVM). A robust multivariate feature selection strategy using a genetic algorithm was employed. The combinations of spatial filtering using ICA and SLD, temporal filtering using PSD and DWT, and classification methods using LMD, QMD, BSC and SVM provided higher performance than those of other combinations. Utilizing one of the better combinations of ICA, PSD and SVM, the discrimination accuracy was as high as 75%. Further feature analysis showed that beta band EEG activity of the channels over right sensorimotor cortex was most appropriate for discrimination of right and left hand movement intention. Effective combinations of computational methods provide possible classification of human movement intention from single trial EEG. Such a method could be the basis for a potential brain-computer interface based on human natural movement, which might reduce the requirement of long-term training. Effective combinations of computational methods can classify human movement intention from single trial EEG with reasonable accuracy.
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2012-01-01
Smart homes for the aging population have recently started attracting the attention of the research community. The “health state” of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario. PMID:26007727
Nef, Tobias; Urwyler, Prabitha; Büchler, Marcel; Tarnanas, Ioannis; Stucki, Reto; Cazzoli, Dario; Müri, René; Mosimann, Urs
2015-05-21
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.
Comparison of Classification Methods for Detecting Emotion from Mandarin Speech
NASA Astrophysics Data System (ADS)
Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng
It is said that technology comes out from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. Making computers being able to perceive and respond to human emotion, the human-computer interaction will be more natural. Several classifiers are adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions, including anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms others.
Source Camera Identification and Blind Tamper Detections for Images
2007-04-24
measures and image quality measures in camera identification problem was studied using conjunction with a KNN classifier to identify the feature sets...shots varying from nature scenes .-.. motorala to close-ups of people. We experimented with the KNN *~. * ny classifier (K=5) as well SVM algorithm of...on Acoustic, Speech and Signal Processing (ICASSP), France, May 2006, vol. 5, pp. 401-404. [9] H. Farid and S. Lyu, "Higher-order wavelet statistics
Sequential Adaptive Multi-Modality Target Detection and Classification Using Physics Based Models
2006-09-01
estimation," R. Raghuram, R. Raich and A.O. Hero, IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing, Toulouse France, June 2006, <http...can then be solved using off-the-shelf classifiers such as radial basis functions, SVM, or kNN classifier structures. When applied to mine detection we...stage waveform selection for adaptive resource constrained state estimation," 2006 IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing
Diabetic peripheral neuropathy assessment through texture based analysis of corneal nerve images
NASA Astrophysics Data System (ADS)
Silva, Susana F.; Gouveia, Sofia; Gomes, Leonor; Negrão, Luís; João Quadrado, Maria; Domingues, José Paulo; Morgado, António Miguel
2015-05-01
Diabetic peripheral neuropathy (DPN) is one common complication of diabetes. Early diagnosis of DPN often fails due to the non-availability of a simple, reliable, non-invasive method. Several published studies show that corneal confocal microscopy (CCM) can identify small nerve fibre damage and quantify the severity of DPN, using nerve morphometric parameters. Here, we used image texture features, extracted from corneal sub-basal nerve plexus images, obtained in vivo by CCM, to identify DPN patients, using classification techniques. A SVM classifier using image texture features was used to identify (DPN vs. No DPN) DPN patients. The accuracies were 80.6%, when excluding diabetic patients without neuropathy, and 73.5%, when including diabetic patients without diabetic neuropathy jointly with healthy controls. The results suggest that texture analysis might be used as a complementing technique for DPN diagnosis, without requiring nerve segmentation in CCM images. The results also suggest that this technique has enough sensitivity to detect early disorders in the corneal nerves of diabetic patients.
Wang, Juan; Nishikawa, Robert M; Yang, Yongyi
2016-01-01
In computer-aided detection of microcalcifications (MCs), the detection accuracy is often compromised by frequent occurrence of false positives (FPs), which can be attributed to a number of factors, including imaging noise, inhomogeneity in tissue background, linear structures, and artifacts in mammograms. In this study, the authors investigated a unified classification approach for combating the adverse effects of these heterogeneous factors for accurate MC detection. To accommodate FPs caused by different factors in a mammogram image, the authors developed a classification model to which the input features were adapted according to the image context at a detection location. For this purpose, the input features were defined in two groups, of which one group was derived from the image intensity pattern in a local neighborhood of a detection location, and the other group was used to characterize how a MC is different from its structural background. Owing to the distinctive effect of linear structures in the detector response, the authors introduced a dummy variable into the unified classifier model, which allowed the input features to be adapted according to the image context at a detection location (i.e., presence or absence of linear structures). To suppress the effect of inhomogeneity in tissue background, the input features were extracted from different domains aimed for enhancing MCs in a mammogram image. To demonstrate the flexibility of the proposed approach, the authors implemented the unified classifier model by two widely used machine learning algorithms, namely, a support vector machine (SVM) classifier and an Adaboost classifier. In the experiment, the proposed approach was tested for two representative MC detectors in the literature [difference-of-Gaussians (DoG) detector and SVM detector]. The detection performance was assessed using free-response receiver operating characteristic (FROC) analysis on a set of 141 screen-film mammogram (SFM) images (66 cases) and a set of 188 full-field digital mammogram (FFDM) images (95 cases). The FROC analysis results show that the proposed unified classification approach can significantly improve the detection accuracy of two MC detectors on both SFM and FFDM images. Despite the difference in performance between the two detectors, the unified classifiers can reduce their FP rate to a similar level in the output of the two detectors. In particular, with true-positive rate at 85%, the FP rate on SFM images for the DoG detector was reduced from 1.16 to 0.33 clusters/image (unified SVM) and 0.36 clusters/image (unified Adaboost), respectively; similarly, for the SVM detector, the FP rate was reduced from 0.45 clusters/image to 0.30 clusters/image (unified SVM) and 0.25 clusters/image (unified Adaboost), respectively. Similar FP reduction results were also achieved on FFDM images for the two MC detectors. The proposed unified classification approach can be effective for discriminating MCs from FPs caused by different factors (such as MC-like noise patterns and linear structures) in MC detection. The framework is general and can be applicable for further improving the detection accuracy of existing MC detectors.
Möller, Christiane; Pijnenburg, Yolande A L; van der Flier, Wiesje M; Versteeg, Adriaan; Tijms, Betty; de Munck, Jan C; Hafkemeijer, Anne; Rombouts, Serge A R B; van der Grond, Jeroen; van Swieten, John; Dopper, Elise; Scheltens, Philip; Barkhof, Frederik; Vrenken, Hugo; Wink, Alle Meije
2016-06-01
Purpose To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis. The SVM can be used to distinguish disease-specific GM patterns in patients with AD and those with bvFTD as compared with normal aging by using common T1-weighted structural MR imaging. (©) RSNA, 2015.
Graña, M; Termenon, M; Savio, A; Gonzalez-Pinto, A; Echeveste, J; Pérez, J M; Besga, A
2011-09-20
The aim of this paper is to obtain discriminant features from two scalar measures of Diffusion Tensor Imaging (DTI) data, Fractional Anisotropy (FA) and Mean Diffusivity (MD), and to train and test classifiers able to discriminate Alzheimer's Disease (AD) patients from controls on the basis of features extracted from the FA or MD volumes. In this study, support vector machine (SVM) classifier was trained and tested on FA and MD data. Feature selection is done computing the Pearson's correlation between FA or MD values at voxel site across subjects and the indicative variable specifying the subject class. Voxel sites with high absolute correlation are selected for feature extraction. Results are obtained over an on-going study in Hospital de Santiago Apostol collecting anatomical T1-weighted MRI volumes and DTI data from healthy control subjects and AD patients. FA features and a linear SVM classifier achieve perfect accuracy, sensitivity and specificity in several cross-validation studies, supporting the usefulness of DTI-derived features as an image-marker for AD and to the feasibility of building Computer Aided Diagnosis systems for AD based on them. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Cheng, Shu-Xi; Xie, Chuan-Qi; Wang, Qiao-Nan; He, Yong; Shao, Yong-Ni
2014-05-01
Identification of early blight on tomato leaves by using hyperspectral imaging technique based on different effective wavelengths selection methods (successive projections algorithm, SPA; x-loading weights, x-LW; gram-schmidt orthogonaliza-tion, GSO) was studied in the present paper. Hyperspectral images of seventy healthy and seventy infected tomato leaves were obtained by hyperspectral imaging system across the wavelength range of 380-1023 nm. Reflectance of all pixels in region of interest (ROI) was extracted by ENVI 4. 7 software. Least squares-support vector machine (LS-SVM) model was established based on the full spectral wavelengths. It obtained an excellent result with the highest identification accuracy (100%) in both calibration and prediction sets. Then, EW-LS-SVM and EW-LDA models were established based on the selected wavelengths suggested by SPA, x-LW and GSO, respectively. The results showed that all of the EW-LS-SVM and EW-LDA models performed well with the identification accuracy of 100% in EW-LS-SVM model and 100%, 100% and 97. 83% in EW-LDA model, respectively. Moreover, the number of input wavelengths of SPA-LS-SVM, x-LW-LS-SVM and GSO-LS-SVM models were four (492, 550, 633 and 680 nm), three (631, 719 and 747 nm) and two (533 and 657 nm), respectively. Fewer input variables were beneficial for the development of identification instrument. It demonstrated that it is feasible to identify early blight on tomato leaves by using hyperspectral imaging, and SPA, x-LW and GSO were effective wavelengths selection methods.
NASA Astrophysics Data System (ADS)
Yang, Dong; Lu, Anxiang; Ren, Dong; Wang, Jihua
2017-11-01
This study explored the feasibility of rapid detection of biogenic amines (BAs) in cooked beef during the storage process using hyperspectral imaging technique combined with sparse representation (SR) algorithm. The hyperspectral images of samples were collected in the two spectral ranges of 400-1000 nm and 1000-1800 nm, separately. The spectral data were reduced dimensionality by SR and principal component analysis (PCA) algorithms, and then integrated the least square support vector machine (LS-SVM) to build the SR-LS-SVM and PC-LS-SVM models for the prediction of BAs values in cooked beef. The results showed that the SR-LS-SVM model exhibited the best predictive ability with determination coefficients (RP2) of 0.943 and root mean square errors (RMSEP) of 1.206 in the range of 400-1000 nm of prediction set. The SR and PCA algorithms were further combined to establish the best SR-PC-LS-SVM model for BAs prediction, which had high RP2of 0.969 and low RMSEP of 1.039 in the region of 400-1000 nm. The visual map of the BAs was generated using the best SR-PC-LS-SVM model with imaging process algorithms, which could be used to observe the changes of BAs in cooked beef more intuitively. The study demonstrated that hyperspectral imaging technique combined with sparse representation were able to detect effectively the BAs values in cooked beef during storage and the built SR-PC-LS-SVM model had a potential for rapid and accurate determination of freshness indexes in other meat and meat products.
Gas Classification Using Deep Convolutional Neural Networks.
Peng, Pai; Zhao, Xiaojin; Pan, Xiaofang; Ye, Wenbin
2018-01-08
In this work, we propose a novel Deep Convolutional Neural Network (DCNN) tailored for gas classification. Inspired by the great success of DCNN in the field of computer vision, we designed a DCNN with up to 38 layers. In general, the proposed gas neural network, named GasNet, consists of: six convolutional blocks, each block consist of six layers; a pooling layer; and a fully-connected layer. Together, these various layers make up a powerful deep model for gas classification. Experimental results show that the proposed DCNN method is an effective technique for classifying electronic nose data. We also demonstrate that the DCNN method can provide higher classification accuracy than comparable Support Vector Machine (SVM) methods and Multiple Layer Perceptron (MLP).
Gas Classification Using Deep Convolutional Neural Networks
Peng, Pai; Zhao, Xiaojin; Pan, Xiaofang; Ye, Wenbin
2018-01-01
In this work, we propose a novel Deep Convolutional Neural Network (DCNN) tailored for gas classification. Inspired by the great success of DCNN in the field of computer vision, we designed a DCNN with up to 38 layers. In general, the proposed gas neural network, named GasNet, consists of: six convolutional blocks, each block consist of six layers; a pooling layer; and a fully-connected layer. Together, these various layers make up a powerful deep model for gas classification. Experimental results show that the proposed DCNN method is an effective technique for classifying electronic nose data. We also demonstrate that the DCNN method can provide higher classification accuracy than comparable Support Vector Machine (SVM) methods and Multiple Layer Perceptron (MLP). PMID:29316723
Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc
2016-01-01
Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355
NASA Astrophysics Data System (ADS)
Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.
2015-01-01
This work addresses the problem of lung sound classification, in particular, the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step to associate lung sounds with an abnormal state of the respiratory system, usually associated with tuberculosis or another chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification, which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. Feature extraction methods used here are commonly applied in speech recognition and related problems thanks to the fact that they capture the most informative spectral content from the original signals. The evaluated methods were: Fourier transform (FT), wavelet decomposition using Wavelet Packet Transform bank of filters (WPT) and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combination of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as validation set a couple of cases, one including normal sounds and the other including wheezing sounds. Experimental results were reported in terms of traditional classification performance measures: sensitivity, specificity and balanced accuracy. Our best results using the suggested approach, C-weighted SVM and MFCC, achieve a 82.1% of balanced accuracy obtaining the best result for this problem until now. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem even using the same feature extraction methods.
Duarte, João V; Ribeiro, Maria J; Violante, Inês R; Cunha, Gil; Silva, Eduardo; Castelo-Branco, Miguel
2014-01-01
Neurofibromatosis Type 1 (NF1) is a common genetic condition associated with cognitive dysfunction. However, the pathophysiology of the NF1 cognitive deficits is not well understood. Abnormal brain structure, including increased total brain volume, white matter (WM) and grey matter (GM) abnormalities have been reported in the NF1 brain. These previous studies employed univariate model-driven methods preventing detection of subtle and spatially distributed differences in brain anatomy. Multivariate pattern analysis allows the combination of information from multiple spatial locations yielding a discriminative power beyond that of single voxels. Here we investigated for the first time subtle anomalies in the NF1 brain, using a multivariate data-driven classification approach. We used support vector machines (SVM) to classify whole-brain GM and WM segments of structural T1 -weighted MRI scans from 39 participants with NF1 and 60 non-affected individuals, divided in children/adolescents and adults groups. We also employed voxel-based morphometry (VBM) as a univariate gold standard to study brain structural differences. SVM classifiers correctly classified 94% of cases (sensitivity 92%; specificity 96%) revealing the existence of brain structural anomalies that discriminate NF1 individuals from controls. Accordingly, VBM analysis revealed structural differences in agreement with the SVM weight maps representing the most relevant brain regions for group discrimination. These included the hippocampus, basal ganglia, thalamus, and visual cortex. This multivariate data-driven analysis thus identified subtle anomalies in brain structure in the absence of visible pathology. Our results provide further insight into the neuroanatomical correlates of known features of the cognitive phenotype of NF1. Copyright © 2012 Wiley Periodicals, Inc.
[Discrimination of donkey meat by NIR and chemometrics].
Niu, Xiao-Ying; Shao, Li-Min; Dong, Fang; Zhao, Zhi-Lei; Zhu, Yan
2014-10-01
Donkey meat samples (n = 167) from different parts of donkey body (neck, costalia, rump, and tendon), beef (n = 47), pork (n = 51) and mutton (n = 32) samples were used to establish near-infrared reflectance spectroscopy (NIR) classification models in the spectra range of 4,000~12,500 cm(-1). The accuracies of classification models constructed by Mahalanobis distances analysis, soft independent modeling of class analogy (SIMCA) and least squares-support vector machine (LS-SVM), respectively combined with pretreatment of Savitzky-Golay smooth (5, 15 and 25 points) and derivative (first and second), multiplicative scatter correction and standard normal variate, were compared. The optimal models for intact samples were obtained by Mahalanobis distances analysis with the first 11 principal components (PCs) from original spectra as inputs and by LS-SVM with the first 6 PCs as inputs, and correctly classified 100% of calibration set and 98. 96% of prediction set. For minced samples of 7 mm diameter the optimal result was attained by LS-SVM with the first 5 PCs from original spectra as inputs, which gained an accuracy of 100% for calibration and 97.53% for prediction. For minced diameter of 5 mm SIMCA model with the first 8 PCs from original spectra as inputs correctly classified 100% of calibration and prediction. And for minced diameter of 3 mm Mahalanobis distances analysis and SIMCA models both achieved 100% accuracy for calibration and prediction respectively with the first 7 and 9 PCs from original spectra as inputs. And in these models, donkey meat samples were all correctly classified with 100% either in calibration or prediction. The results show that it is feasible that NIR with chemometrics methods is used to discriminate donkey meat from the else meat.
Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM.
Hojjati, Seyed Hani; Ebrahimzadeh, Ata; Khazaee, Ali; Babajani-Feremi, Abbas
2017-04-15
We investigated identifying patients with mild cognitive impairment (MCI) who progress to Alzheimer's disease (AD), MCI converter (MCI-C), from those with MCI who do not progress to AD, MCI non-converter (MCI-NC), based on resting-state fMRI (rs-fMRI). Graph theory and machine learning approach were utilized to predict progress of patients with MCI to AD using rs-fMRI. Eighteen MCI converts (average age 73.6 years; 11 male) and 62 age-matched MCI non-converters (average age 73.0 years, 28 male) were included in this study. We trained and tested a support vector machine (SVM) to classify MCI-C from MCI-NC using features constructed based on the local and global graph measures. A novel feature selection algorithm was developed and utilized to select an optimal subset of features. Using subset of optimal features in SVM, we classified MCI-C from MCI-NC with an accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve of 91.4%, 83.24%, 90.1%, and 0.95, respectively. Furthermore, results of our statistical analyses were used to identify the affected brain regions in AD. To the best of our knowledge, this is the first study that combines the graph measures (constructed based on rs-fMRI) with machine learning approach and accurately classify MCI-C from MCI-NC. Results of this study demonstrate potential of the proposed approach for early AD diagnosis and demonstrate capability of rs-fMRI to predict conversion from MCI to AD by identifying affected brain regions underlying this conversion. Copyright © 2017 Elsevier B.V. All rights reserved.
Biomedical named entity extraction: some issues of corpus compatibilities.
Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar
2013-01-01
Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identification of certain entities from text and their classification into some predefined categories. In the biomedical community, there is yet no general consensus regarding named entity (NE) annotation; thus, it is very difficult to compare the existing systems due to corpus incompatibilities. Due to this problem we can not also exploit the advantages of using different corpora together. In our present work we address the issues of corpus compatibilities, and use a single objective optimization (SOO) based classifier ensemble technique that uses the search capability of genetic algorithm (GA) for NE extraction in biomedicine. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon the various representations of the set of features and/or feature templates. It is to be noted that we tried to extract the features without using any deep domain knowledge and/or resources. In order to assess the challenges of corpus compatibilities, we experiment with the different benchmark datasets and their various combinations. Comparison results with the existing approaches prove the efficacy of the used technique. GA based ensemble achieves around 2% performance improvements over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulties of the task. In summary, our used ensemble based approach attains the state-of-the-art performance levels for entity extraction in three different kinds of biomedical datasets. The possible reasons behind the better performance in our used approach are the (i). use of variety and rich features as described in Subsection "Features for named entity extraction"; (ii) use of GA based classifier ensemble technique to combine the outputs of multiple classifiers.
Prediction of cell penetrating peptides by support vector machines.
Sanders, William S; Johnston, C Ian; Bridges, Susan M; Burgess, Shane C; Willeford, Kenneth O
2011-07-01
Cell penetrating peptides (CPPs) are those peptides that can transverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. Of the synthesized peptides predicted to be CPPs, 100% of these peptides were shown to be penetrating.
Khazendar, S; Sayasneh, A; Al-Assam, H; Du, H; Kaijser, J; Ferrara, L; Timmerman, D; Jassim, S; Bourne, T
2015-01-01
Preoperative characterisation of ovarian masses into benign or malignant is of paramount importance to optimise patient management. In this study, we developed and validated a computerised model to characterise ovarian masses as benign or malignant. Transvaginal 2D B mode static ultrasound images of 187 ovarian masses with known histological diagnosis were included. Images were first pre-processed and enhanced, and Local Binary Pattern Histograms were then extracted from 2 × 2 blocks of each image. A Support Vector Machine (SVM) was trained using stratified cross validation with randomised sampling. The process was repeated 15 times and in each round 100 images were randomly selected. The SVM classified the original non-treated static images as benign or malignant masses with an average accuracy of 0.62 (95% CI: 0.59-0.65). This performance significantly improved to an average accuracy of 0.77 (95% CI: 0.75-0.79) when images were pre-processed, enhanced and treated with a Local Binary Pattern operator (mean difference 0.15: 95% 0.11-0.19, p < 0.0001, two-tailed t test). We have shown that an SVM can classify static 2D B mode ultrasound images of ovarian masses into benign and malignant categories. The accuracy improves if texture related LBP features extracted from the images are considered.
Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM.
Davari Dolatabadi, Azam; Khadem, Siamak Esmael Zadeh; Asl, Babak Mohammadzadeh
2017-01-01
Currently Coronary Artery Disease (CAD) is one of the most prevalent diseases, and also can lead to death, disability and economic loss in patients who suffer from cardiovascular disease. Diagnostic procedures of this disease by medical teams are typically invasive, although they do not satisfy the required accuracy. In this study, we have proposed a methodology for the automatic diagnosis of normal and Coronary Artery Disease conditions using Heart Rate Variability (HRV) signal extracted from electrocardiogram (ECG). The features are extracted from HRV signal in time, frequency and nonlinear domains. The Principal Component Analysis (PCA) is applied to reduce the dimension of the extracted features in order to reduce computational complexity and to reveal the hidden information underlaid in the data. Finally, Support Vector Machine (SVM) classifier has been utilized to classify two classes of data using the extracted distinguishing features. In this paper, parameters of the SVM have been optimized in order to improve the accuracy. Provided reports in this paper indicate that the detection of CAD class from normal class using the proposed algorithm was performed with accuracy of 99.2%, sensitivity of 98.43%, and specificity of 100%. This study has shown that methods which are based on the feature extraction of the biomedical signals are an appropriate approach to predict the health situation of the patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Kalatzis, I; Piliouras, N; Ventouras, E; Papageorgiou, C C; Rabavilas, A D; Cavouras, D
2004-07-01
A computer-based classification system has been designed capable of distinguishing patients with depression from normal controls by event-related potential (ERP) signals using the P600 component. Clinical material comprised 25 patients with depression and an equal number of gender and aged-matched healthy controls. All subjects were evaluated by a computerized version of the digit span Wechsler test. EEG activity was recorded and digitized from 15 scalp electrodes (leads). Seventeen features related to the shape of the waveform were generated and were employed in the design of an optimum support vector machine (SVM) classifier at each lead. The outcomes of those SVM classifiers were selected by a majority-vote engine (MVE), which assigned each subject to either the normal or depressive classes. MVE classification accuracy was 94% when using all leads and 92% or 82% when using only the right or left scalp leads, respectively. These findings support the hypothesis that depression is associated with dysfunction of right hemisphere mechanisms mediating the processing of information that assigns a specific response to a specific stimulus, as those mechanisms are reflected by the P600 component of ERPs. Our method may aid the further understanding of the neurophysiology underlying depression, due to its potentiality to integrate theories of depression and psychophysiology.
Dey, Susmita; Sarkar, Ripon; Chatterjee, Kabita; Datta, Pallab; Barui, Ananya; Maity, Santi P
2017-04-01
Habitual smokers are known to be at higher risk for developing oral cancer, which is increasing at an alarming rate globally. Conventionally, oral cancer is associated with high mortality rates, although recent reports show the improved survival outcomes by early diagnosis of disease. An effective prediction system which will enable to identify the probability of cancer development amongst the habitual smokers, is thus expected to benefit sizable number of populations. Present work describes a non-invasive, integrated method for early detection of cellular abnormalities based on analysis of different cyto-morphological features of exfoliative oral epithelial cells. Differential interference contrast (DIC) microscopy provides a potential optical tool as this mode provides a pseudo three dimensional (3-D) image with detailed morphological and textural features obtained from noninvasive, label free epithelial cells. For segmentation of DIC images, gradient vector flow snake model active contour process has been adopted. To evaluate cellular abnormalities amongst habitual smokers, the selected morphological and textural features of epithelial cells are compared with the non-smoker (-ve control group) group and clinically diagnosed pre-cancer patients (+ve control group) using support vector machine (SVM) classifier. Accuracy of the developed SVM based classification has been found to be 86% with 80% sensitivity and 89% specificity in classifying the features from the volunteers having smoking habit. Copyright © 2017 Elsevier Ltd. All rights reserved.
Super-resolution mapping using multi-viewing CHRIS/PROBA data
NASA Astrophysics Data System (ADS)
Dwivedi, Manish; Kumar, Vinay
2016-04-01
High-spatial resolution Remote Sensing (RS) data provides detailed information which ensures high-definition visual image analysis of earth surface features. These data sets also support improved information extraction capabilities at a fine scale. In order to improve the spatial resolution of coarser resolution RS data, the Super Resolution Reconstruction (SRR) technique has become widely acknowledged which focused on multi-angular image sequences. In this study multi-angle CHRIS/PROBA data of Kutch area is used for SR image reconstruction to enhance the spatial resolution from 18 m to 6m in the hope to obtain a better land cover classification. Various SR approaches like Projection onto Convex Sets (POCS), Robust, Iterative Back Projection (IBP), Non-Uniform Interpolation and Structure-Adaptive Normalized Convolution (SANC) chosen for this study. Subjective assessment through visual interpretation shows substantial improvement in land cover details. Quantitative measures including peak signal to noise ratio and structural similarity are used for the evaluation of the image quality. It was observed that SANC SR technique using Vandewalle algorithm for the low resolution image registration outperformed the other techniques. After that SVM based classifier is used for the classification of SRR and data resampled to 6m spatial resolution using bi-cubic interpolation. A comparative analysis is carried out between classified data of bicubic interpolated and SR derived images of CHRIS/PROBA and SR derived classified data have shown a significant improvement of 10-12% in the overall accuracy. The results demonstrated that SR methods is able to improve spatial detail of multi-angle images as well as the classification accuracy.
Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua
2018-04-25
Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.
Distributed support vector machine in master-slave mode.
Chen, Qingguo; Cao, Feilong
2018-05-01
It is well known that the support vector machine (SVM) is an effective learning algorithm. The alternating direction method of multipliers (ADMM) algorithm has emerged as a powerful technique for solving distributed optimisation models. This paper proposes a distributed SVM algorithm in a master-slave mode (MS-DSVM), which integrates a distributed SVM and ADMM acting in a master-slave configuration where the master node and slave nodes are connected, meaning the results can be broadcasted. The distributed SVM is regarded as a regularised optimisation problem and modelled as a series of convex optimisation sub-problems that are solved by ADMM. Additionally, the over-relaxation technique is utilised to accelerate the convergence rate of the proposed MS-DSVM. Our theoretical analysis demonstrates that the proposed MS-DSVM has linear convergence, meaning it possesses the fastest convergence rate among existing standard distributed ADMM algorithms. Numerical examples demonstrate that the convergence and accuracy of the proposed MS-DSVM are superior to those of existing methods under the ADMM framework. Copyright © 2018 Elsevier Ltd. All rights reserved.
Biometric identification based on feature fusion with PCA and SVM
NASA Astrophysics Data System (ADS)
Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina
2018-04-01
Biometric identification is gaining ground compared to traditional identification methods. Many biometric measurements may be used for secure human identification. The most reliable among them is the iris pattern because of its uniqueness, stability, unforgeability and inalterability over time. The approach presented in this paper is a fusion of different feature descriptor methods such as HOG, LIOP, LBP, used for extracting iris texture information. The classifiers obtained through the SVM and PCA methods demonstrate the effectiveness of our system applied to one and both irises. The performances measured are highly accurate and foreshadow a fusion system with a rate of identification approaching 100% on the UPOL database.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms