Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777
SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
SVM and SVM Ensembles in Breast Cancer Prediction
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.
Li, Qiang; Gu, Yu; Jia, Jing
2017-01-30
Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.
Multi-view L2-SVM and its multi-view core vector machine.
Huang, Chengquan; Chung, Fu-lai; Wang, Shitong
2016-03-01
In this paper, a novel L2-SVM based classifier Multi-view L2-SVM is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has the flexibility like μ-SVC in the sense that the number of the yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views through imposing the consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine GCVM, the proposed Multi-view L2-SVM classifier is extended into its GCVM version MvCVM which can realize its fast training on large scale multi-view datasets, with its asymptotic linear time complexity with the sample size and its space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
Zhang, Ming-Huan; Ma, Jun-Shan; Shen, Ying; Chen, Ying
2016-09-01
This study aimed to investigate the optimal support vector machines (SVM)-based classifier of duchenne muscular dystrophy (DMD) magnetic resonance imaging (MRI) images. T1-weighted (T1W) and T2-weighted (T2W) images of the 15 boys with DMD and 15 normal controls were obtained. Textural features of the images were extracted and wavelet decomposed, and then, principal features were selected. Scale transform was then performed for MRI images. Afterward, SVM-based classifiers of MRI images were analyzed based on the radical basis function and decomposition levels. The cost (C) parameter and kernel parameter [Formula: see text] were used for classification. Then, the optimal SVM-based classifier, expressed as [Formula: see text]), was identified by performance evaluation (sensitivity, specificity and accuracy). Eight of 12 textural features were selected as principal features (eigenvalues [Formula: see text]). The 16 SVM-based classifiers were obtained using combination of (C, [Formula: see text]), and those with lower C and [Formula: see text] values showed higher performances, especially classifier of [Formula: see text]). The SVM-based classifiers of T1W images showed higher performance than T1W images at the same decomposition level. The T1W images in classifier of [Formula: see text]) at level 2 decomposition showed the highest performance of all, and its overall correct sensitivity, specificity, and accuracy reached 96.9, 97.3, and 97.1 %, respectively. The T1W images in SVM-based classifier [Formula: see text] at level 2 decomposition showed the highest performance of all, demonstrating that it was the optimal classification for the diagnosis of DMD.
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
Pirooznia, Mehdi; Deng, Youping
2006-12-12
Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael
2011-01-01
Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However these approaches focus on finding group differences and are not applicable to situations where one is interested in accessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features the OC-SVM classified 52% of patients as outliers. However among the patients classified as outliers 70% did not respond to treatment and among those classified as non-outliers 89% responded to treatment. In addition 89% of the healthy controls were classified as non-outliers. PMID:21723950
NASA Astrophysics Data System (ADS)
Du, Peijun; Tan, Kun; Xing, Xiaoshi
2010-12-01
Combining Support Vector Machine (SVM) with wavelet analysis, we constructed wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM is faced with the bottleneck of kernel parameter selection which further results in time-consuming and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to experiment the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier can obtain the highest accuracy when using the Coiflet Kernel function in wavelet transform. In contrast with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and SVM classifier using Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space is capable of improving classification accuracy obviously.
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Ma, Xiaoqi
2015-01-01
A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867
NASA Astrophysics Data System (ADS)
Guo, Yiqing; Jia, Xiuping; Paull, David
2018-06-01
The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
The construction of support vector machine classifier using the firefly algorithm.
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy.
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy. PMID:25802511
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
NASA Astrophysics Data System (ADS)
Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.
2017-10-01
Crop maps are essential inputs for the agricultural planning done at various governmental and agribusinesses agencies. Remote sensing offers timely and costs efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, a RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are of 83, 82 and 83% respectively. Adding vegetation indices to the analysis result in the classification accuracy of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.
Generative Models for Similarity-based Classification
2007-01-01
NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
SVM based colon polyps classifier in a wireless active stereo endoscope.
Ayoub, J; Granado, B; Mhanna, Y; Romain, O
2010-01-01
This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of SVM classifier trained on robust feature descriptors. The study is related to Cyclope, this prototype sensor allows real time 3D object reconstruction and continues to be optimized technically to improve its classification task by differentiation between hyperplastic and adenomatous polyps. Experimental results were encouraging and show correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by intensity histogram, the work shows a new approach that extracts a set of features based on depth histogram and combines stereo measurement with SVM classifiers to correctly classify benign and malignant polyps.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a few number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds single optimum hyperplane, the main objective Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71,4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.
SVM classifier on chip for melanoma detection.
Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak
2017-07-01
Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.
Anam, Khairul; Al-Jumaily, Adel
2017-01-01
The success of myoelectric pattern recognition (M-PR) mostly relies on the features extracted and classifier employed. This paper proposes and evaluates a fast classifier, extreme learning machine (ELM), to classify individual and combined finger movements on amputees and non-amputees. ELM is a single hidden layer feed-forward network (SLFN) that avoids iterative learning by determining input weights randomly and output weights analytically. Therefore, it can accelerate the training time of SLFNs. In addition to the classifier evaluation, this paper evaluates various feature combinations to improve the performance of M-PR and investigate some feature projections to improve the class separability of the features. Different from other studies on the implementation of ELM in the myoelectric controller, this paper presents a complete and thorough investigation of various types of ELMs including the node-based and kernel-based ELM. Furthermore, this paper provides comparisons of ELMs and other well-known classifiers such as linear discriminant analysis (LDA), k-nearest neighbour (kNN), support vector machine (SVM) and least-square SVM (LS-SVM). The experimental results show the most accurate ELM classifier is radial basis function ELM (RBF-ELM). The comparison of RBF-ELM and other well-known classifiers shows that RBF-ELM is as accurate as SVM and LS-SVM but faster than the SVM family; it is superior to LDA and kNN. The experimental results also indicate that the accuracy gap of the M-PR on the amputees and non-amputees is not too much with the accuracy of 98.55% on amputees and 99.5% on the non-amputees using six electromyography (EMG) channels. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
[Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].
Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong
2013-03-01
Model selection for support vector machine (SVM) involving kernel and the margin parameter values selection is usually time-consuming, impacts training efficiency of SVM model and final classification accuracies of SVM hyperspectral remote sensing image classifier greatly. Firstly, based on combinatorial optimization theory and cross-validation method, artificial immune clonal selection algorithm is introduced to the optimal selection of SVM (CSSVM) kernel parameter a and margin parameter C to improve the training efficiency of SVM model. Then an experiment of classifying AVIRIS in India Pine site of USA was performed for testing the novel CSSVM, as well as a traditional SVM classifier with general Grid Searching cross-validation method (GSSVM) for comparison. And then, evaluation indexes including SVM model training time, classification overall accuracy (OA) and Kappa index of both CSSVM and GSSVM were all analyzed quantitatively. It is demonstrated that OA of CSSVM on test samples and whole image are 85.1% and 81.58, the differences from that of GSSVM are both within 0.08% respectively; And Kappa indexes reach 0.8213 and 0.7728, the differences from that of GSSVM are both within 0.001; While the ratio of model training time of CSSVM and GSSVM is between 1/6 and 1/10. Therefore, CSSVM is fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.
An ensemble of SVM classifiers based on gene pairs.
Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin
2013-07-01
In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306
Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu
2017-09-01
The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area the best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training images dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-word representations of local dense scale invariant feature transformation features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for a smartphone-based image analysis.
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie
2014-11-01
User-generated medical messages on Internet contain extensive information related to adverse drug reactions (ADRs) and are known as valuable resources for post-marketing drug surveillance. The aim of this study was to find an effective method to identify messages related to ADRs automatically from online user reviews. We conducted experiments on online user reviews using different feature set and different classification technique. Firstly, the messages from three communities, allergy community, schizophrenia community and pain management community, were collected, the 3000 messages were annotated. Secondly, the N-gram-based features set and medical domain-specific features set were generated. Thirdly, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform classification tasks separately. Finally, we evaluated the performance of different method using different feature set and different classification technique by comparing the metrics including accuracy and F-measure. In terms of accuracy, the accuracy of SVM classifier was higher than 0.8, the accuracy of C4.5 classifier or Naïve Bayes classifier was lower than 0.8; meanwhile, the combination feature sets including n-gram-based feature set and domain-specific feature set consistently outperformed single feature set. In terms of F-measure, the highest F-measure is 0.895 which was achieved by using combination feature sets and a SVM classifier. In all, we can get the best classification performance by using combination feature sets and SVM classifier. By using combination feature sets and SVM classifier, we can get an effective method to identify messages related to ADRs automatically from online user reviews.
DCS-SVM: a novel semi-automated method for human brain MR image segmentation.
Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi
2017-11-27
In this paper, a novel method is proposed which appropriately segments magnetic resonance (MR) brain images into three main tissues. This paper proposes an extension of our previous work in which we suggested a combination of multiple classifiers (CMC)-based methods named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. This idea is used here and a novel method is developed that tries to use more complex and accurate classifiers like support vector machine (SVM) in the ensemble. This work is challenging because the CMC-based methods are time consuming, especially on huge datasets like three-dimensional (3D) brain MR images. Moreover, SVM is a powerful method that is used for modeling datasets with complex feature space, but it also has huge computational cost for big datasets, especially those with strong interclass variability problems and with more than two classes such as 3D brain images; therefore, we cannot use SVM in DCS-DLTLTI. Therefore, we propose a novel approach named "DCS-SVM" to use SVM in DCS-DLTLTI to improve the accuracy of segmentation results. The proposed method is applied on well-known datasets of the Internet Brain Segmentation Repository (IBSR) and promising results are obtained.
A method of distributed avionics data processing based on SVM classifier
NASA Astrophysics Data System (ADS)
Guo, Hangyu; Wang, Jinyan; Kang, Minyang; Xu, Guojing
2018-03-01
Under the environment of system combat, in order to solve the problem on management and analysis of the massive heterogeneous data on multi-platform avionics system, this paper proposes a management solution which called avionics "resource cloud" based on big data technology, and designs an aided decision classifier based on SVM algorithm. We design an experiment with STK simulation, the result shows that this method has a high accuracy and a broad application prospect.
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-01-01
Background Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Methodology/Principal Findings Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. Conclusions/Significance The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice. PMID:21359184
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-02-16
Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice.
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Tuning support vector machines for minimax and Neyman-Pearson classification.
Davenport, Mark A; Baraniuk, Richard G; Scott, Clayton D
2010-10-01
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
Age and gender estimation using Region-SIFT and multi-layered SVM
NASA Astrophysics Data System (ADS)
Kim, Hyunduk; Lee, Sang-Heon; Sohn, Myoung-Kyu; Hwang, Byunghun
2018-04-01
In this paper, we propose an age and gender estimation framework using the region-SIFT feature and multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark based face alignment. The second step is the feature extraction step. In this step, we introduce the region-SIFT feature extraction method based on facial landmarks. First, we define sub-regions of the face. We then extract SIFT features from each sub-region. In order to reduce the dimensions of features we employ a Principal Component Analysis (PCA) and a Linear Discriminant Analysis (LDA). Finally, we classify age and gender using a multi-layered Support Vector Machines (SVM) for efficient classification. Rather than performing gender estimation and age estimation independently, the use of the multi-layered SVM can improve the classification rate by constructing a classifier that estimate the age according to gender. Moreover, we collect a dataset of face images, called by DGIST_C, from the internet. A performance evaluation of proposed method was performed with the FERET database, CACD database, and DGIST_C database. The experimental results demonstrate that the proposed approach classifies age and performs gender estimation very efficiently and accurately.
Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.
Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad
2018-01-01
Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.
Generalized SMO algorithm for SVM-based multitask learning.
Cai, Feng; Cherkassky, Vladimir
2012-06-01
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.
Understanding user intents in online health forums.
Zhang, Thomas; Cho, Jason H D; Zhai, Chengxiang
2015-07-01
Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that a SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that a SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Base on the data of blood pressure, plasma lipid, Glu and UA by physical test, Support Vector Machine (SVM) was applied to identify coronary heart disease (CHD) in patients and non-CHD individuals in south China population for guide of further prevention and treatment of the disease. Firstly, the SVM classifier was built using radial basis kernel function, liner kernel function and polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict the CHD. By comparison with those from artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression method and non-optimized SVM, the overall results of our calculation demonstrated that the classification performance of optimized RBF-SVM model could be superior to other classifier algorithm with higher accuracy rate, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. So, it is well concluded that SVM could be used as a valid method for assisting diagnosis of CHD.
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.
Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel
2017-01-01
Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of a cerebral glucose hypometabolism involving occipito-temporal cortex and cerebellum have been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and with varying degrees of severity depending on the cognitive profile of patients. Aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients between healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole-population was divided into two groups; a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM) and a SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including a SVM to classify patients between healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Solution Path for Pin-SVM Classifiers With Positive and Negative $\\tau $ Values.
Huang, Xiaolin; Shi, Lei; Suykens, Johan A K
2017-07-01
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ . Its value is related to the quantile level and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ . We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ . In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
LBP and SIFT based facial expression recognition
NASA Astrophysics Data System (ADS)
Sumer, Omer; Gunes, Ece O.
2015-02-01
This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem and seven classes; happiness, anger, sadness, disgust, surprise, fear and comtempt are classified. Using SIFT feature vectors and linear SVM, 93.1% mean accuracy is acquired on CK+ database. On the other hand, the performance of LBP-based classifier with linear SVM is reported on SFEW using strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.
Power line identification of millimeter wave radar based on PCA-GS-SVM
NASA Astrophysics Data System (ADS)
Fang, Fang; Zhang, Guifeng; Cheng, Yansheng
2017-12-01
Aiming at the problem that the existing detection method can not effectively solve the security of UAV's ultra low altitude flight caused by power line, a power line recognition method based on grid search (GS) and the principal component analysis and support vector machine (PCA-SVM) is proposed. Firstly, the candidate line of Hough transform is reduced by PCA, and the main feature of candidate line is extracted. Then, upport vector machine (SVM is) optimized by grid search method (GS). Finally, using support vector machine classifier optimized parameters to classify the candidate line. MATLAB simulation results show that this method can effectively identify the power line and noise, and has high recognition accuracy and algorithm efficiency.
Identification and classification of similar looking food grains
NASA Astrophysics Data System (ADS)
Anami, B. S.; Biradar, Sunanda D.; Savakar, D. G.; Kulkarni, P. V.
2013-01-01
This paper describes the comparative study of Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers by taking a case study of identification and classification of four pairs of similar looking food grains namely, Finger Millet, Mustard, Soyabean, Pigeon Pea, Aniseed, Cumin-seeds, Split Greengram and Split Blackgram. Algorithms are developed to acquire and process color images of these grains samples. The developed algorithms are used to extract 18 colors-Hue Saturation Value (HSV), and 42 wavelet based texture features. Back Propagation Neural Network (BPNN)-based classifier is designed using three feature sets namely color - HSV, wavelet-texture and their combined model. SVM model for color- HSV model is designed for the same set of samples. The classification accuracies ranging from 93% to 96% for color-HSV, ranging from 78% to 94% for wavelet texture model and from 92% to 97% for combined model are obtained for ANN based models. The classification accuracy ranging from 80% to 90% is obtained for color-HSV based SVM model. Training time required for the SVM based model is substantially lesser than ANN for the same set of images.
Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.
Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing
2015-01-01
Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of a SVM classifier depends strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset- the Wisconsin Diagnostic Breast Cancer (WDBC), revealed the different accuracy of a SVM classification in terms of the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called as SVM-RFE(C) with correlation filter was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first demonstrated outperforming SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experiment results show that the classifier constructed with SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
NASA Astrophysics Data System (ADS)
Hu, Yan-Yan; Li, Dong-Sheng
2016-01-01
The hyperspectral images(HSI) consist of many closely spaced bands carrying the most object information. While due to its high dimensionality and high volume nature, it is hard to get satisfactory classification performance. In order to reduce HSI data dimensionality preparation for high classification accuracy, it is proposed to combine a band selection method of artificial immune systems (AIS) with a hybrid kernels support vector machine (SVM-HK) algorithm. In fact, after comparing different kernels for hyperspectral analysis, the approach mixed radial basis function kernel (RBF-K) with sigmoid kernel (Sig-K) and applied the optimized hybrid kernels in SVM classifiers. Then the SVM-HK algorithm used to induce the bands selection of an improved version of AIS. The AIS was composed of clonal selection and elite antibody mutation, including evaluation process with optional index factor (OIF). Experimental classification performance was on a San Diego Naval Base acquired by AVIRIS, the HRS dataset shows that the method is able to efficiently achieve bands redundancy removal while outperforming the traditional SVM classifier.
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (eg. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
Mediterranean Land Use and Land Cover Classification Assessment Using High Spatial Resolution Data
NASA Astrophysics Data System (ADS)
Elhag, Mohamed; Boteva, Silvena
2016-10-01
Landscape fragmentation is noticeably practiced in Mediterranean regions and imposes substantial complications in several satellite image classification methods. To some extent, high spatial resolution data were able to overcome such complications. For better classification performances in Land Use Land Cover (LULC) mapping, the current research adopts different classification methods comparison for LULC mapping using Sentinel-2 satellite as a source of high spatial resolution. Both of pixel-based and an object-based classification algorithms were assessed; the pixel-based approach employs Maximum Likelihood (ML), Artificial Neural Network (ANN) algorithms, Support Vector Machine (SVM), and, the object-based classification uses the Nearest Neighbour (NN) classifier. Stratified Masking Process (SMP) that integrates a ranking process within the classes based on spectral fluctuation of the sum of the training and testing sites was implemented. An analysis of the overall and individual accuracy of the classification results of all four methods reveals that the SVM classifier was the most efficient overall by distinguishing most of the classes with the highest accuracy. NN succeeded to deal with artificial surface classes in general while agriculture area classes, and forest and semi-natural area classes were segregated successfully with SVM. Furthermore, a comparative analysis indicates that the conventional classification method yielded better accuracy results than the SMP method overall with both classifiers used, ML and SVM.
Zhang, Jiang; Wang, James Z; Yuan, Zhen; Sobel, Eric S; Jiang, Huabei
2011-01-01
This study presents a computer-aided classification method to distinguish osteoarthritis finger joints from healthy ones based on the functional images captured by x-ray guided diffuse optical tomography. Three imaging features, joint space width, optical absorption, and scattering coefficients, are employed to train a Least Squares Support Vector Machine (LS-SVM) classifier for osteoarthritis classification. The 10-fold validation results show that all osteoarthritis joints are clearly identified and all healthy joints are ruled out by the LS-SVM classifier. The best sensitivity, specificity, and overall accuracy of the classification by experienced technicians based on manual calculation of optical properties and visual examination of optical images are only 85%, 93%, and 90%, respectively. Therefore, our LS-SVM based computer-aided classification is a considerably improved method for osteoarthritis diagnosis.
NASA Astrophysics Data System (ADS)
Leena, N.; Saju, K. K.
2018-04-01
Nutritional deficiencies in plants are a major concern for farmers as it affects productivity and thus profit. The work aims to classify nutritional deficiencies in maize plant in a non-destructive mannerusing image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies like nitrogen, phosphorous and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.
Lamb wave based damage detection using Matching Pursuit and Support Vector Machine classifier
NASA Astrophysics Data System (ADS)
Agarwal, Sushant; Mitra, Mira
2014-03-01
In this paper, the suitability of using Matching Pursuit (MP) and Support Vector Machine (SVM) for damage detection using Lamb wave response of thin aluminium plate is explored. Lamb wave response of thin aluminium plate with or without damage is simulated using finite element. Simulations are carried out at different frequencies for various kinds of damage. The procedure is divided into two parts - signal processing and machine learning. Firstly, MP is used for denoising and to maintain the sparsity of the dataset. In this study, MP is extended by using a combination of time-frequency functions as the dictionary and is deployed in two stages. Selection of a particular type of atoms lead to extraction of important features while maintaining the sparsity of the waveform. The resultant waveform is then passed as input data for SVM classifier. SVM is used to detect the location of the potential damage from the reduced data. The study demonstrates that SVM is a robust classifier in presence of noise and more efficient as compared to Artificial Neural Network (ANN). Out-of-sample data is used for the validation of the trained and tested classifier. Trained classifiers are found successful in detection of the damage with more than 95% detection rate.
A linear-RBF multikernel SVM to classify big text corpora.
Romero, R; Iglesias, E L; Borrajo, L
2015-01-01
Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.
NASA Astrophysics Data System (ADS)
Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.
2017-02-01
Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and large availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameters selection are reported, making it impossible to reproduce study analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.
2015-01-01
Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483
NASA Astrophysics Data System (ADS)
Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.
2014-06-01
Mapping is essential for the analysis of the land use and land cover, which influence many environmental processes and properties. For the purpose of the creation of land cover maps, it is important to minimize error. These errors will propagate into later analyses based on these land cover maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we have analyzed multispectral data using two different classifiers including Maximum Likelihood Classifier (MLC) and Support Vector Machine (SVM). To pursue this aim, Landsat Thematic Mapper data and identical field-based training sample datasets in Johor Malaysia used for each classification method, which results indicate in five land cover classes forest, oil palm, urban area, water, rubber. Classification results indicate that SVM was more accurate than MLC. With demonstrated capability to produce reliable cover results, the SVM methods should be especially useful for land cover classification.
USDA-ARS?s Scientific Manuscript database
It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...
Comparison of water extraction methods in Tibet based on GF-1 data
NASA Astrophysics Data System (ADS)
Jia, Lingjun; Shang, Kun; Liu, Jing; Sun, Zhongqing
2018-03-01
In this study, we compared four different water extraction methods with GF-1 data according to different water types in Tibet, including Support Vector Machine (SVM), Principal Component Analysis (PCA), Decision Tree Classifier based on False Normalized Difference Water Index (FNDWI-DTC), and PCA-SVM. The results show that all of the four methods can extract large area water body, but only SVM and PCA-SVM can obtain satisfying extraction results for small size water body. The methods were evaluated by both overall accuracy (OAA) and Kappa coefficient (KC). The OAA of PCA-SVM, SVM, FNDWI-DTC, PCA are 96.68%, 94.23%, 93.99%, 93.01%, and the KCs are 0.9308, 0.8995, 0.8962, 0.8842, respectively, in consistent with visual inspection. In summary, SVM is better for narrow rivers extraction and PCA-SVM is suitable for water extraction of various types. As for dark blue lakes, the methods using PCA can extract more quickly and accurately.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo S.; Yang, Y.; Carbonell, J.
2011-10-24
Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. Inmore » contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.« less
USDA-ARS?s Scientific Manuscript database
Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real-time geological hazards/phenomenon, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related with hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining between "aurora-event" or "no-aurora-event" categories, we compared two state-of-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by Scikit-Learn library using our training dataset to build the SVM classifier. The results shown that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework.Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer. The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used TensorFlow framework for applying CNN classifier to the same collection of tweets.In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real-time.
NASA Astrophysics Data System (ADS)
Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao
2013-02-01
The ability of combining serum surface-enhanced Raman spectroscopy (SERS) with support vector machine (SVM) for improving classification esophageal cancer patients from normal volunteers is investigated. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one group from patients with pathologically confirmed esophageal cancer (n=30) and the other group from healthy volunteers (n=31). Principal components analysis (PCA), conventional SVM (C-SVM) and conventional SVM combination with PCA (PCA-SVM) methods are implemented to classify the same spectral dataset. Results show that a diagnostic accuracy of 77.0% is acquired for PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained for C-SVM and PCA-SVM methods based on radial basis functions (RBF) models. The results prove that RBF SVM models are superior to PCA algorithm in classification serum SERS spectra. The study demonstrates that serum SERS in combination with SVM technique has great potential to provide an effective and accurate diagnostic schema for noninvasive detection of esophageal cancer.
A SVM-based method for sentiment analysis in Persian language
NASA Astrophysics Data System (ADS)
Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana
2013-03-01
Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often represent their opinions and experiences on the web with written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews make it difficult to give an unbiased evaluation to a product. In this paper, standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian Movie reviews to automatically classify user reviews as positive or negative and performance of these two classifiers is compared with each other in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by interaction between the classification models and the feature options. The SVM classifier achieves as well as or better accuracy than naive Bayes in Persian movie. Unigrams are proved better features than bigrams and trigrams in capturing Persian sentiment orientation.
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-09-02
Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-01-01
Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance. PMID:27598160
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-01-01
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides. PMID:27187430
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-05-11
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.
An implementation of support vector machine on sentiment classification of movie reviews
NASA Astrophysics Data System (ADS)
Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.
2018-03-01
With technological advances, all information about movie is available on the internet. If the information is processed properly, it will get the quality of the information. This research proposes to the classify sentiments on movie review documents. This research uses Support Vector Machine (SVM) method because it can classify high dimensional data in accordance with the data used in this research in the form of text. Support Vector Machine is a popular machine learning technique for text classification because it can classify by learning from a collection of documents that have been classified previously and can provide good result. Based on number of datasets, the 90-10 composition has the best result that is 85.6%. Based on SVM kernel, kernel linear with constant 1 has the best result that is 84.9%
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
Approach to the problem of the parameters optimization of the shooting system
NASA Astrophysics Data System (ADS)
Demidova, L. A.; Sablina, V. A.; Sokolova, Y. S.
2018-02-01
The problem of the objects identification on the base of their hyperspectral features has been considered. It is offered to use the SVM classifiers’ ensembles, adapted to specifics of the problem of the objects identification on the base of their hyperspectral features. The results of the objects identification on the base of their hyperspectral features with using of the SVM classifiers have been presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug
2013-05-15
Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 Multiplication-Sign 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs-normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessedmore » using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI data obtained from both scanners, the classification accuracies with the SVM and Bayesian classifiers were 92% and 77%, respectively. The selected features resulting from the classification process differed by scanner, with more features included for the classification of the integrated HRCT data than for the classification of the HRCT data from each scanner. For the integrated data, consisting of HRCT images of both scanners, the classification accuracy based on the SVM was statistically similar to the accuracy of the data obtained from each scanner. However, the classification accuracy of the integrated data using the Bayesian classifier was significantly lower than the classification accuracy of the ROI data of each scanner. Conclusions: The use of an integrated dataset along with a SVM classifier rather than a Bayesian classifier has benefits in terms of the classification accuracy of HRCT images acquired with more than one scanner. This finding is of relevance in studies involving large number of images, as is the case in a multicenter trial with different scanners.« less
A Support Vector Machine-Based Gender Identification Using Speech Signal
NASA Astrophysics Data System (ADS)
Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk
We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.
CNN-SVM for Microvascular Morphological Type Recognition with Data Augmentation.
Xue, Di-Xiu; Zhang, Rong; Feng, Hui; Wang, Ya-Lei
2016-01-01
This paper focuses on the problem of feature extraction and the classification of microvascular morphological types to aid esophageal cancer detection. We present a patch-based system with a hybrid SVM model with data augmentation for intraepithelial papillary capillary loop recognition. A greedy patch-generating algorithm and a specialized CNN named NBI-Net are designed to extract hierarchical features from patches. We investigate a series of data augmentation techniques to progressively improve the prediction invariance of image scaling and rotation. For classifier boosting, SVM is used as an alternative to softmax to enhance generalization ability. The effectiveness of CNN feature representation ability is discussed for a set of widely used CNN models, including AlexNet, VGG-16, and GoogLeNet. Experiments are conducted on the NBI-ME dataset. The recognition rate is up to 92.74% on the patch level with data augmentation and classifier boosting. The results show that the combined CNN-SVM model beats models of traditional features with SVM as well as the original CNN with softmax. The synthesis results indicate that our system is able to assist clinical diagnosis to a certain extent.
NASA Astrophysics Data System (ADS)
Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao
2014-09-01
This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.
Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal
2015-10-01
Protein-protein interaction (PPI) site prediction aids to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performances of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms are evaluated and estimated the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are measured as 79.94% and 80.48% for the Homo sapiens and E. coli organisms respectively. On the independent test datasets, AUC scores are obtained as 76.59% and 80.17% respectively for the two organisms. In almost all cases, the developed F-SVM method improves the performances obtained by the corresponding classical SVM and the other classifiers, available in the literature.
Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen
2017-01-01
The computer mouse is an important human-computer interaction device. But patients with physical finger disability are unable to operate this device. Surface EMG (sEMG) can be monitored by electrodes on the skin surface and is a reflection of the neuromuscular activities. Therefore, we can control limbs auxiliary equipment by utilizing sEMG classification in order to help the physically disabled patients to operate the mouse. To develop a new a method to extract sEMG generated by finger motion and apply novel features to classify sEMG. A window-based data acquisition method was presented to extract signal samples from sEMG electordes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from the classical methods based on time domain or frequency domain, was employed to transform signal samples to feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were separately acquired. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, the machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively. In particular, the accuracy of the SVM classifier reached up to 100%. The signal separation method is a convenient, efficient and quick method, which can effectively extract the sEMG samples produced by fingers. In addition, unlike the classical methods, the new method enables to extract features by enlarging sample signals' energy appropriately. The classical machine learning classifiers all performed well by using these features.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data in to species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SMV as a method for classification. From trial to trial, SVM produces consistent results
Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong
2015-09-01
Recently, a time-adaptive support vector machine (TA-SVM) is proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of matrix inversion, thus resulting to suffer from high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets as well as inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
Padma, A; Sukanesh, R
2013-01-01
A computer software system is designed for the segmentation and classification of benign from malignant tumour slices in brain computed tomography (CT) images. This paper presents a method to find and select both the dominant run length and co-occurrence texture features of region of interest (ROI) of the tumour region of each slice to be segmented by Fuzzy c means clustering (FCM) and evaluate the performance of support vector machine (SVM)-based classifiers in classifying benign and malignant tumour slices. Two hundred and six tumour confirmed CT slices are considered in this study. A total of 17 texture features are extracted by a feature extraction procedure, and six features are selected using Principal Component Analysis (PCA). This study constructed the SVM-based classifier with the selected features and by comparing the segmentation results with the experienced radiologist labelled ground truth (target). Quantitative analysis between ground truth and segmented tumour is presented in terms of segmentation accuracy, segmentation error and overlap similarity measures such as the Jaccard index. The classification performance of the SVM-based classifier with the same selected features is also evaluated using a 10-fold cross-validation method. The proposed system provides some newly found texture features have an important contribution in classifying benign and malignant tumour slices efficiently and accurately with less computational time. The experimental results showed that the proposed system is able to achieve the highest segmentation and classification accuracy effectiveness as measured by jaccard index and sensitivity and specificity.
Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei
2018-04-01
To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.
Cancer survival classification using integrated data sets and intermediate information.
Kim, Shinuk; Park, Taesung; Kon, Mark
2014-09-01
Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with Cox proportional hazard regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers, first, existing ML methods, support vector machine (SVM) and random forest (RF), second, a new median-based classifier using FSCOX (FSCOX_median), and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from miRNAs and mRNAs profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced 22 short-term and 22 long-term survivor data set. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM) 85.71% (FSCOX_median), 85.71% (FSCOX_SVM). These results are higher than the results of using miRNA expression and mRNA expression alone. In addition we predict 16 hsa-miR-23b and hsa-miR-27b target genes in ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles. Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. Copyright © 2014 Elsevier B.V. All rights reserved.
Mapping membrane activity in undiscovered peptide sequence space using machine learning
Fulan, Benjamin M.; Wong, Gerard C. L.
2016-01-01
There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate ⍺-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its “antimicrobialness”) and its ⍺-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide’s minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find a unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences. PMID:27849600
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, author proposes the application of Support Vector Machine (SVM) for the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than the other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic nature as a binary classifier where it classifies two classes namely, object and background. The algorithm aims at effectively detecting an object from its background with the minimum training data. The synthetic image containing noises is used for algorithm testing. Furthermore, it is implemented to perform remote sensing image analysis such as identification of Island vegetation, water body, and oil spill from the satellite imagery. It is indicated that SVM provides the fast and accurate analysis with the acceptable result.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
Wissel, Tobias; Pfeiffer, Tim; Frysch, Robert; Knight, Robert T.; Chang, Edward F.; Hinrichs, Hermann; Rieger, Jochem W.; Rose, Georg
2013-01-01
Objective Support Vector Machines (SVM) have developed into a gold standard for accurate classification in Brain-Computer-Interfaces (BCI). The choice of the most appropriate classifier for a particular application depends on several characteristics in addition to decoding accuracy. Here we investigate the implementation of Hidden Markov Models (HMM)for online BCIs and discuss strategies to improve their performance. Approach We compare the SVM, serving as a reference, and HMMs for classifying discrete finger movements obtained from the Electrocorticograms of four subjects doing a finger tapping experiment. The classifier decisions are based on a subset of low-frequency time domain and high gamma oscillation features. Main results We show that decoding optimization between the two approaches is due to the way features are extracted and selected and less dependent on the classifier. An additional gain in HMM performance of up to 6% was obtained by introducing model constraints. Comparable accuracies of up to 90% were achieved with both SVM and HMM with the high gamma cortical response providing the most important decoding information for both techniques. Significance We discuss technical HMM characteristics and adaptations in the context of the presented data as well as for general BCI applications. Our findings suggest that HMMs and their characteristics are promising for efficient online brain-computer interfaces. PMID:24045504
Extraction of prostatic lumina and automated recognition for prostatic calculus image using PCA-SVM.
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computation-assistant diagnosis of prostatic calculi may have promising potential but is currently still less studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu threshold recognition using PCA-SVM and based on the texture features of prostatic calculus. The SVM classifier showed an average time 0.1432 second, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi.
Meier, Timothy B.; Desphande, Alok S.; Vergun, Svyatoslav; Nair, Veena A.; Song, Jie; Biswal, Bharat B.; Meyerand, Mary E.; Birn, Rasmus M.; Prabhakaran, Vivek
2012-01-01
Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5 mm3 radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual’s three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. PMID:22227886
Meier, Timothy B; Desphande, Alok S; Vergun, Svyatoslav; Nair, Veena A; Song, Jie; Biswal, Bharat B; Meyerand, Mary E; Birn, Rasmus M; Prabhakaran, Vivek
2012-03-01
Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5mm(3) radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual's three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. Copyright © 2011 Elsevier Inc. All rights reserved.
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Onboard Classifiers for Science Event Detection on a Remote Sensing Spacecraft
NASA Technical Reports Server (NTRS)
Castano, Rebecca; Mazzoni, Dominic; Tang, Nghia; Greeley, Ron; Doggett, Thomas; Cichy, Ben; Chien, Steve; Davies, Ashley
2006-01-01
Typically, data collected by a spacecraft is downlinked to Earth and pre-processed before any analysis is performed. We have developed classifiers that can be used onboard a spacecraft to identify high priority data for downlink to Earth, providing a method for maximizing the use of a potentially bandwidth limited downlink channel. Onboard analysis can also enable rapid reaction to dynamic events, such as flooding, volcanic eruptions or sea ice break-up. Four classifiers were developed to identify cryosphere events using hyperspectral images. These classifiers include a manually constructed classifier, a Support Vector Machine (SVM), a Decision Tree and a classifier derived by searching over combinations of thresholded band ratios. Each of the classifiers was designed to run in the computationally constrained operating environment of the spacecraft. A set of scenes was hand-labeled to provide training and testing data. Performance results on the test data indicate that the SVM and manual classifiers outperformed the Decision Tree and band-ratio classifiers with the SVM yielding slightly better classifications than the manual classifier.
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
Soft Computing Application in Fault Detection of Induction Motor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Konar, P.; Puhan, P. S.; Chattopadhyay, P. Dr.
2010-10-26
The paper investigates the effectiveness of different patter classifier like Feed Forward Back Propagation (FFBPN), Radial Basis Function (RBF) and Support Vector Machine (SVM) for detection of bearing faults in Induction Motor. The steady state motor current with Park's Transformation has been used for discrimination of inner race and outer race bearing defects. The RBF neural network shows very encouraging results for multi-class classification problems and is hoped to set up a base for incipient fault detection of induction motor. SVM is also found to be a very good fault classifier which is highly competitive with RBF.
New support vector machine-based method for microRNA target prediction.
Li, L; Gao, Q; Mao, X; Cao, Y
2014-06-09
MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.
NASA Astrophysics Data System (ADS)
Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan
2016-03-01
Interstitial lung disease (ILD) is complicated group of pulmonary disorders. High Resolution Computed Tomography (HRCT) considered to be best imaging technique for analysis of different pulmonary disorders. HRCT findings can be categorised in several patterns viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal etc. based on their texture like appearance. Clinician often find it difficult to diagnosis these pattern because of their complex nature. In such scenario computer-aided diagnosis system could help clinician to identify patterns. Several approaches had been proposed for classification of ILD patterns. This includes computation of textural feature and training /testing of classifier such as artificial neural network (ANN), support vector machine (SVM) etc. In this paper, wavelet features are calculated from two different ILD database, publically available MedGIFT ILD database and private ILD database, followed by performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that average classification accuracy by SVM is greater than ANN where trained and tested on same database. Investigation continued further to test variation in accuracy of classifier when training and testing is performed with alternate database and training and testing of classifier with database formed by merging samples from same class from two individual databases. The average classification accuracy drops when two independent databases used for training and testing respectively. There is significant improvement in average accuracy when classifiers are trained and tested with merged database. It infers dependency of classification accuracy on training data. It is observed that SVM outperforms ANN when same database is used for training and testing.
Frick, Andreas; Gingnell, Malin; Marquand, Andre F.; Howner, Katarina; Fischer, Håkan; Kristiansson, Marianne; Williams, Steven C.R.; Fredrikson, Mats; Furmark, Tomas
2014-01-01
Functional neuroimaging of social anxiety disorder (SAD) support altered neural activation to threat-provoking stimuli focally in the fear network, while structural differences are distributed over the temporal and frontal cortices as well as limbic structures. Previous neuroimaging studies have investigated the brain at the voxel level using mass-univariate methods which do not enable detection of more complex patterns of activity and structural alterations that may separate SAD from healthy individuals. Support vector machine (SVM) is a supervised machine learning method that capitalizes on brain activation and structural patterns to classify individuals. The aim of this study was to investigate if it is possible to discriminate SAD patients (n = 14) from healthy controls (n = 12) using SVM based on (1) functional magnetic resonance imaging during fearful face processing and (2) regional gray matter volume. Whole brain and region of interest (fear network) SVM analyses were performed for both modalities. For functional scans, significant classifications were obtained both at whole brain level and when restricting the analysis to the fear network while gray matter SVM analyses correctly classified participants only when using the whole brain search volume. These results support that SAD is characterized by aberrant neural activation to affective stimuli in the fear network, while disorder-related alterations in regional gray matter volume are more diffusely distributed over the whole brain. SVM may thus be useful for identifying imaging biomarkers of SAD. PMID:24239689
Automatic threshold selection for multi-class open set recognition
NASA Astrophysics Data System (ADS)
Scherreik, Matthew; Rigling, Brian
2017-05-01
Multi-class open set recognition is the problem of supervised classification with additional unknown classes encountered after a model has been trained. An open set classifer often has two core components. The first component is a base classifier which estimates the most likely class of a given example. The second component consists of open set logic which estimates if the example is truly a member of the candidate class. Such a system is operated in a feed-forward fashion. That is, a candidate label is first estimated by the base classifier, and the true membership of the example to the candidate class is estimated afterward. Previous works have developed an iterative threshold selection algorithm for rejecting examples from classes which were not present at training time. In those studies, a Platt-calibrated SVM was used as the base classifier, and the thresholds were applied to class posterior probabilities for rejection. In this work, we investigate the effectiveness of other base classifiers when paired with the threshold selection algorithm and compare their performance with the original SVM solution.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2012-09-30
floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in
Combined data mining/NIR spectroscopy for purity assessment of lime juice
NASA Astrophysics Data System (ADS)
Shafiee, Sahameh; Minaei, Saeid
2018-06-01
This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.
Extraction of Prostatic Lumina and Automated Recognition for Prostatic Calculus Image Using PCA-SVM
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D. Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computation-assistant diagnosis of prostatic calculi may have promising potential but is currently still less studied. We studied the extraction of prostatic lumina and automated recognition for calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu threshold recognition using PCA-SVM and based on the texture features of prostatic calculus. The SVM classifier showed an average time 0.1432 second, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi. PMID:21461364
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Seo, Joon Beom; Kang, Bokyoung; Kim, Dongil; Lee, June Goo; Kim, Song Soo; Kim, Namkug; Kang, Suk Ho
2007-03-01
The performance of classification algorithms for differentiating among obstructive lung diseases based on features from texture analysis using HRCT (High Resolution Computerized Tomography) images was compared. HRCT can provide accurate information for the detection of various obstructive lung diseases, including centrilobular emphysema, panlobular emphysema and bronchiolitis obliterans. Features on HRCT images can be subtle, however, particularly in the early stages of disease, and image-based diagnosis is subject to inter-observer variation. To automate the diagnosis and improve the accuracy, we compared three types of automated classification systems, naÃve Bayesian classifier, ANN (Artificial Neural Net) and SVM (Support Vector Machine), based on their ability to differentiate among normal lung and three types of obstructive lung diseases. To assess the performance and cross-validation of these three classifiers, 5 folding methods with 5 randomly chosen groups were used. For a more robust result, each validation was repeated 100 times. SVM showed the best performance, with 86.5% overall sensitivity, significantly different from the other classifiers (one way ANOVA, p<0.01). We address the characteristics of each classifier affecting performance and the issue of which classifier is the most suitable for clinical applications, and propose an appropriate method to choose the best classifier and determine its optimal parameters for optimal disease discrimination. These results can be applied to classifiers for differentiation of other diseases.
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2014-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534
Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆
Cao, Houwei; Verma, Ragini; Nenkova, Ani
2015-01-01
We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-01-01
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282
Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-07-18
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization.
Rajagopal, Rekha; Ranganathan, Vidhyapriya
2018-06-05
Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.
2010-01-01
Purpose To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation =−0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation =−5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set and areas under the receiver operating characteristic (AUROC) curve for the classification of eyes in the test set were generated. AUROC curve results from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. Also reported was the AUROC curve for the GDx VCC software-generated nerve fiber indicator (NFI). Results The AUROC curves for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced dimensional data sets). AUROC curves for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC curve for the NFI was 0.87. Conclusions Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes. RVM may be preferable to SVM, because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898
Machine learning approach to automatic exudate detection in retinal images from diabetic patients
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin
2010-01-01
Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
Classification of cardiovascular tissues using LBP based descriptors and a cascade SVM.
Mazo, Claudia; Alegre, Enrique; Trujillo, Maria
2017-08-01
Histological images have characteristics, such as texture, shape, colour and spatial structure, that permit the differentiation of each fundamental tissue and organ. Texture is one of the most discriminative features. The automatic classification of tissues and organs based on histology images is an open problem, due to the lack of automatic solutions when treating tissues without pathologies. In this paper, we demonstrate that it is possible to automatically classify cardiovascular tissues using texture information and Support Vector Machines (SVM). Additionally, we realised that it is feasible to recognise several cardiovascular organs following the same process. The texture of histological images was described using Local Binary Patterns (LBP), LBP Rotation Invariant (LBPri), Haralick features and different concatenations between them, representing in this way its content. Using a SVM with linear kernel, we selected the more appropriate descriptor that, for this problem, was a concatenation of LBP and LBPri. Due to the small number of the images available, we could not follow an approach based on deep learning, but we selected the classifier who yielded the higher performance by comparing SVM with Random Forest and Linear Discriminant Analysis. Once SVM was selected as the classifier with a higher area under the curve that represents both higher recall and precision, we tuned it evaluating different kernels, finding that a linear SVM allowed us to accurately separate four classes of tissues: (i) cardiac muscle of the heart, (ii) smooth muscle of the muscular artery, (iii) loose connective tissue, and (iv) smooth muscle of the large vein and the elastic artery. The experimental validation was conducted using 3000 blocks of 100 × 100 sized pixels, with 600 blocks per class and the classification was assessed using a 10-fold cross-validation. using LBP as the descriptor, concatenated with LBPri and a SVM with linear kernel, the main four classes of tissues were recognised with an AUC higher than 0.98. A polynomial kernel was then used to separate the elastic artery and vein, yielding an AUC in both cases superior to 0.98. Following the proposed approach, it is possible to separate with very high precision (AUC greater than 0.98) the fundamental tissues of the cardiovascular system along with some organs, such as the heart, arteries and veins. Copyright © 2017 Elsevier B.V. All rights reserved.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2013-09-30
N0001411WX21394 Steve W. Martin SPAWAR Systems Center Pacific 53366 Front St. San Diego, CA 92152-6551 phone: (619) 553-9882 email: Steve.W.Martin...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, particularly S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in
NASA Astrophysics Data System (ADS)
Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi
2018-03-01
This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.
Kianmehr, Keivan; Alhajj, Reda
2008-09-01
In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework: instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, are presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-qualified source of discrimination knowledge that can impact substantially the prediction power of SVM and associative classification techniques. They provide users with more conveniences in terms of understandability and interpretability as well. We have used four datasets from UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as real world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to the face recognition. The proposed TAF-SVM not only solves the overfitting problem resulted from the outlier with the approach of fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane due to the very imbalanced data sets by using different cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. Those three functions are embodied into the traditional SVM so that the TAF-SVM is proposed and reformulated in both linear and nonlinear cases. By using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of the face-recognition accuracy. The results also indicate that the proposed TAF-SVM can achieve smaller error variances than SVM over a number of tests such that better recognition stability can be obtained.
Tripathy, Rajesh Kumar; Dandapat, Samarendra
2017-04-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques.
Automated separation of merged Langerhans islets
NASA Astrophysics Data System (ADS)
Švihlík, Jan; Kybic, Jan; Habart, David
2016-03-01
This paper deals with separation of merged Langerhans islets in segmentations in order to evaluate correct histogram of islet diameters. A distribution of islet diameters is useful for determining the feasibility of islet transplantation in diabetes. First, the merged islets at training segmentations are manually separated by medical experts. Based on the single islets, the merged islets are identified and the SVM classifier is trained on both classes (merged/single islets). The testing segmentations were over-segmented using watershed transform and the most probable back merging of islets were found using trained SVM classifier. Finally, the optimized segmentation is compared with ground truth segmentation (correctly separated islets).
NASA Astrophysics Data System (ADS)
Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari
2018-03-01
Support Vector Machine or commonly called SVM is one method that can be used to process the classification of a data. SVM classifies data from 2 different classes with hyperplane. In this study, the system was built using SVM to develop Arabic Speech Recognition. In the development of the system, there are 2 kinds of speakers that have been tested that is dependent speakers and independent speakers. The results from this system is an accuracy of 85.32% for speaker dependent and 61.16% for independent speakers.
Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.
Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal
2015-01-01
Predicting residues that participate in protein-protein interactions (PPI) helps to identify, which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces Cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment are measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
Bizios, Dimitrios; Heijl, Anders; Hougaard, Jesper Leth; Bengtsson, Boel
2010-02-01
To compare the performance of two machine learning classifiers (MLCs), artificial neural networks (ANNs) and support vector machines (SVMs), with input based on retinal nerve fibre layer thickness (RNFLT) measurements by optical coherence tomography (OCT), on the diagnosis of glaucoma, and to assess the effects of different input parameters. We analysed Stratus OCT data from 90 healthy persons and 62 glaucoma patients. Performance of MLCs was compared using conventional OCT RNFLT parameters plus novel parameters such as minimum RNFLT values, 10th and 90th percentiles of measured RNFLT, and transformations of A-scan measurements. For each input parameter and MLC, the area under the receiver operating characteristic curve (AROC) was calculated. There were no statistically significant differences between ANNs and SVMs. The best AROCs for both ANN (0.982, 95%CI: 0.966-0.999) and SVM (0.989, 95% CI: 0.979-1.0) were based on input of transformed A-scan measurements. Our SVM trained on this input performed better than ANNs or SVMs trained on any of the single RNFLT parameters (p < or = 0.038). The performance of ANNs and SVMs trained on minimum thickness values and the 10th and 90th percentiles were at least as good as ANNs and SVMs with input based on the conventional RNFLT parameters. No differences between ANN and SVM were observed in this study. Both MLCs performed very well, with similar diagnostic performance. Input parameters have a larger impact on diagnostic performance than the type of machine classifier. Our results suggest that parameters based on transformed A-scan thickness measurements of the RNFL processed by machine classifiers can improve OCT-based glaucoma diagnosis.
Breast Cancer Recognition Using a Novel Hybrid Intelligent Method
Addeh, Jalil; Ebrahimzadeh, Ata
2012-01-01
Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. This paper presents a novel hybrid intelligent method for recognition of breast cancer tumors. The proposed method includes three main modules: the feature extraction module, the classifier module, and the optimization module. In the feature extraction module, fuzzy features are proposed as the efficient characteristic of the patterns. In the classifier module, because of the promising generalization capability of support vector machines (SVM), a SVM-based classifier is proposed. In support vector machine training, the hyperparameters have very important roles for its recognition accuracy. Therefore, in the optimization module, the bees algorithm (BA) is proposed for selecting appropriate parameters of the classifier. The proposed system is tested on Wisconsin Breast Cancer database and simulation results show that the recommended system has a high accuracy. PMID:23626945
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913
Semi-supervised vibration-based classification and condition monitoring of compressors
NASA Astrophysics Data System (ADS)
Potočnik, Primož; Govekar, Edvard
2017-09-01
Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon
2016-10-06
Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first conducted to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, Support Vectors Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories. Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding level of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging and SVM-based classifier in the diagnosis of substance use disorders and furthering an understanding of their underlying pathology.
A tri-fold hybrid classification approach for diagnostics with unexampled faulty states
NASA Astrophysics Data System (ADS)
Tamilselvan, Prasanna; Wang, Pingfeng
2015-01-01
System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires the knowledge of system failures. However, with an increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at product testing stage. Thus, real time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which comprises of preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes the advantages of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.
An expert support system for breast cancer diagnosis using color wavelet features.
Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J
2012-10-01
Breast cancer diagnosis can be done through the pathologic assessments of breast tissue samples such as core needle biopsy technique. The result of analysis on this sample by pathologist is crucial for breast cancer patient. In this paper, nucleus of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet on HSV color domain and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis using Support Vector Machine (SVM) classifier algorithm. The ability of properly trained SVM is to correctly classify patterns and make them particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as Naïves Bayes classifier and Artificial Neural Network. The overall accuracy of the proposed method using SVM classifier will be further useful for automation in cancer diagnosis.
Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.
Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng
2017-01-01
Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, the process of analyzing EEG to detect neurological diseases is often difficult because the brain electrical signals are random, non-stationary and nonlinear. In order to overcome such difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). New scheme first extracts features from EEG by MF-DFA during the first stage. Then, the scheme applies a genetic algorithm (GA) to calculate parameters used in SVM and classify the training data according to the selected features using SVM. Finally, the trained SVM classifier is exploited to detect neurological diseases. The algorithm utilizes MLlib from library of SPARK and runs on cloud platform. Applying to a public dataset for experiment, the study results show that the new feature extraction method and scheme can detect signals with less features and the accuracy of the classification reached up to 99%. MF-DFA is a promising approach to extract features for analyzing EEG, because of its simple algorithm procedure and less parameters. The features obtained by MF-DFA can represent samples as well as traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM with enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.
Subasi, Abdulhamit
2013-06-01
Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John
2014-01-01
The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G
2017-06-30
Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n =25) and healthy controls (n =25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy values (FA). The study revealed three main findings: 1) SVM applied to DTI derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
2014-01-01
Background Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. Results The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database. PMID:24970564
Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian
2014-06-27
Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.
Jongin Kim; Boreom Lee
2017-07-01
The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals of the neuroscience and clinical fields. In this study, we performed extreme learning machine (ELM) classifier to discriminate the AD, mild cognitive impairment (MCI) from normal control (NC). We compared the performance of ELM with that of a linear kernel support vector machine (SVM) for 718 structural MRI images from Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal control, MCI converter (MCI-C), MCI non-converter (MCI-NC), and AD. We employed SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. In this study, we found that the RFE-SVM feature selection approach in combination with ELM shows the superior classification accuracy to that of linear kernel SVM for structural T1 MRI data.
An improved PSO-SVM model for online recognition defects in eddy current testing
NASA Astrophysics Data System (ADS)
Liu, Baoling; Hou, Dibo; Huang, Pingjie; Liu, Banteng; Tang, Huayi; Zhang, Wubo; Chen, Peihua; Zhang, Guangxin
2013-12-01
Accurate and rapid recognition of defects is essential for structural integrity and health monitoring of in-service device using eddy current (EC) non-destructive testing. This paper introduces a novel model-free method that includes three main modules: a signal pre-processing module, a classifier module and an optimisation module. In the signal pre-processing module, a kind of two-stage differential structure is proposed to suppress the lift-off fluctuation that could contaminate the EC signal. In the classifier module, multi-class support vector machine (SVM) based on one-against-one strategy is utilised for its good accuracy. In the optimisation module, the optimal parameters of classifier are obtained by an improved particle swarm optimisation (IPSO) algorithm. The proposed IPSO technique can improve convergence performance of the primary PSO through the following strategies: nonlinear processing of inertia weight, introductions of the black hole and simulated annealing model with extremum disturbance. The good generalisation ability of the IPSO-SVM model has been validated through adding additional specimen into the testing set. Experiments show that the proposed algorithm can achieve higher recognition accuracy and efficiency than other well-known classifiers and the superiorities are more obvious with less training set, which contributes to online application.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A multi-class classification SVM(Support Vector Machine) fast learning method based on binary tree is presented to solve its low learning efficiency when SVM processing large scale multi-class samples. This paper adopts bottom-up method to set up binary tree hierarchy structure, according to achieved hierarchy structure, sub-classifier learns from corresponding samples of each node. During the learning, several class clusters are generated after the first clustering of the training samples. Firstly, central points are extracted from those class clusters which just have one type of samples. For those which have two types of samples, cluster numbers of their positive and negative samples are set respectively according to their mixture degree, secondary clustering undertaken afterwards, after which, central points are extracted from achieved sub-class clusters. By learning from the reduced samples formed by the integration of extracted central points above, sub-classifiers are obtained. Simulation experiment shows that, this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.
Chesnokov, Yuriy V
2008-06-01
Paroxysmal atrial fibrillation (PAF) is a serious arrhythmia associated with morbidity and mortality. We explore the possibility of distant prediction of PAF by analyzing changes in heart rate variability (HRV) dynamics of non-PAF rhythms immediately before PAF event. We use that model for distant prognosis of PAF onset with artificial intelligence methods. We analyzed 30-min non-PAF HRV records from 51 subjects immediately before PAF onset and at least 45min distant from any PAF event. We used spectral and complexity analysis with sample (SmEn) and approximate (ApEn) entropies and their multiscale versions on extracted HRV data. We used that features to train the artificial neural networks (ANNs) and support vector machine (SVM) classifiers to differentiate the subjects. The trained classifiers were further tested for distant PAF event prognosis on 16 subjects from independent database on non-PAF rhythm lasting from 60 to 320 min before PAF onset classifying the 30-min segments as distant or leading to PAF. We found statistically significant increase in 30-min non-PAF HRV recordings from 51 subjects in the VLF, LF, HF bands and total power (p<0.0001) before PAF event compared to PAF distant ones. The SmEn and ApEn analysis provided significant decrease in complexity (p<0.0001 and p<0.001) before PAF onset. For training ANN and SVM classifiers the data from 51 subjects were randomly split to training, validation and testing. ANN provided better results in terms of sensitivity (Se), specificity (Sp) and positive predictivity (Pp) compared to SVM which became biased towards positive case. The validation results of the ANN classifier we achieved: Se 76%, Sp 93%, Pp 94%. Testing ANN and SVM classifiers on 16 subjects with non-PAF HRV data preceding PAF events we obtained distant prediction of PAF onset with SVM classifier in 10 subjects (58+/-18 min in advance). ANN classifier provided distant prediction of PAF event in 13 subjects (62+/-21 min in advance). From the results of distant PAF prediction we conclude that ANN and SVM classifiers learned the changes in the HRV dynamics immediately before PAF event and successfully identified them during distant PAF prognosis on independent database. This confirms the reported in the literature results that corresponding changes in the HRV data occur about 60 min before PAF onset and proves the possibility of distant PAF prediction with ANN and SVM methods.
A case study on Discrete Wavelet Transform based Hurst exponent for epilepsy detection.
Madan, Saiby; Srivastava, Kajri; Sharmila, A; Mahalakshmi, P
2018-01-01
Epileptic seizures are manifestations of epilepsy. Careful analysis of EEG records can provide valuable insight and improved understanding of the mechanism causing epileptic disorders. The detection of epileptic form discharges in EEG is an important component in the diagnosis of epilepsy. As EEG signals are non-stationary, the conventional frequency and time domain analysis does not provide better accuracy. So, in this work an attempt has been made to provide an overview of the determination of epilepsy by implementation of Hurst exponent (HE)-based discrete wavelet transform techniques for feature extraction from EEG data sets obtained during ictal and pre ictal stages of affected person and finally classifying EEG signals using SVM and KNN Classifiers. The The highest accuracy of 99% is obtained using SVM.
Jiang, Rou; You, Rui; Pei, Xiao-Qing; Zou, Xiong; Zhang, Meng-Xia; Wang, Tong-Min; Sun, Rui; Luo, Dong-Hua; Huang, Pei-Yu; Chen, Qiu-Yan; Hua, Yi-Jun; Tang, Lin-Quan; Guo, Ling; Mo, Hao-Yuan; Qian, Chao-Nan; Mai, Hai-Qiang; Hong, Ming-Huang; Cai, Hong-Min; Chen, Ming-Yuan
2016-01-19
The aim of this study was to develop a prognostic classifier and subdivided the M1 stage for nasopharyngeal carcinoma patients with synchronous metastases (mNPC). A retrospective cohort of 347 mNPC patients was recruited between January 2000 and December 2010. Thirty hematological markers and 11 clinical characteristics were collected, and the association of these factors with overall survival (OS) was evaluated. Advanced machine learning schemes of a support vector machine (SVM) were used to select a subset of highly informative factors and to construct a prognostic model (mNPC-SVM). The mNPC-SVM classifier identified ten informative variables, including three clinical indexes and seven hematological markers. The median survival time for low-risk patients (M1a) as identified by the mNPC-SVM classifier was 38.0 months, and survival time was dramatically reduced to 13.8 months for high-risk patients (M1b) (P < 0.001). Multivariate adjustment using prognostic factors revealed that the mNPC-SVM classifier remained a powerful predictor of OS (M1a vs. M1b, hazard ratio, 3.45; 95% CI, 2.59 to 4.60, P < 0.001). Moreover, combination treatment of systemic chemotherapy and loco-regional radiotherapy was associated with significantly better survival outcomes than chemotherapy alone (the 5-year OS, 47.0% vs. 10.0%, P < 0.001) in the M1a subgroup but not in the M1b subgroup (12.0% vs. 3.0%, P = 0.101). These findings were validated by a separate cohort. In conclusion, the newly developed mNPC-SVM classifier led to more precise risk definitions that offer a promising subdivision of the M1 stage and individualized selection for future therapeutic regimens in mNPC patients.
SVM Pixel Classification on Colour Image Segmentation
NASA Astrophysics Data System (ADS)
Barui, Subhrajit; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.
2018-04-01
The aim of image segmentation is to simplify the representation of an image with the help of cluster pixels into something meaningful to analyze. Segmentation is typically used to locate boundaries and curves in an image, precisely to label every pixel in an image to give each pixel an independent identity. SVM pixel classification on colour image segmentation is the topic highlighted in this paper. It holds useful application in the field of concept based image retrieval, machine vision, medical imaging and object detection. The process is accomplished step by step. At first we need to recognize the type of colour and the texture used as an input to the SVM classifier. These inputs are extracted via local spatial similarity measure model and Steerable filter also known as Gabon Filter. It is then trained by using FCM (Fuzzy C-Means). Both the pixel level information of the image and the ability of the SVM Classifier undergoes some sophisticated algorithm to form the final image. The method has a well developed segmented image and efficiency with respect to increased quality and faster processing of the segmented image compared with the other segmentation methods proposed earlier. One of the latest application result is the Light L16 camera.
Diagnosis of periodontal diseases using different classification algorithms: a preliminary study.
Ozden, F O; Özgönenel, O; Özden, B; Aydogdu, A
2015-01-01
The purpose of the proposed study was to develop an identification unit for classifying periodontal diseases using support vector machine (SVM), decision tree (DT), and artificial neural networks (ANNs). A total of 150 patients was divided into two groups such as training (100) and testing (50). The codes created for risk factors, periodontal data, and radiographically bone loss were formed as a matrix structure and regarded as inputs for the classification unit. A total of six periodontal conditions was the outputs of the classification unit. The accuracy of the suggested methods was compared according to their resolution and working time. DT and SVM were best to classify the periodontal diseases with a high accuracy according to the clinical research based on 150 patients. The performances of SVM and DT were found 98% with total computational time of 19.91 and 7.00 s, respectively. ANN had the worst correlation between input and output variable, and its performance was calculated as 46%. SVM and DT appeared to be sufficiently complex to reflect all the factors associated with the periodontal status, simple enough to be understandable and practical as a decision-making aid for prediction of periodontal disease.
Paiva, Joana S; Cardoso, João; Pereira, Tânia
2018-01-01
The main goal of this study was to develop an automatic method based on supervised learning methods, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed by signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and a F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with SVM classifier using a different number of features from the original set available. The comparison between SVM and NN allowed reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.
Dandapat, Samarendra
2017-01-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques. PMID:28894589
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-01-01
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels. PMID:25372618
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-11-03
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels.
Predicting breast cancer using an expression values weighted clinical classifier.
Thomas, Minta; De Brabanter, Kris; Suykens, Johan A K; De Moor, Bart
2014-12-31
Clinical data, such as patient history, laboratory analysis, ultrasound parameters-which are the basis of day-to-day clinical decision support-are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. While bringing up the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier to integrate two data sources: microarray and clinical parameters. We compared and evaluated the proposed methods on five breast cancer case studies. Compared to LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD) and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under ROC Curve (AUC), on all breast cancer case studies. Thus a clinical classifier weighted with microarray data set results in significantly improved diagnosis, prognosis and prediction responses to therapy. The proposed model has been shown as a promising mathematical framework in both data fusion and non-linear classification problems.
Energy-Based Metrics for Arthroscopic Skills Assessment.
Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa
2017-08-05
Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens; Comani, Silvia
2018-01-01
EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity ( p ). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings.
Tamburro, Gabriella; Fiedler, Patrique; Stone, David; Haueisen, Jens
2018-01-01
Background EEG may be affected by artefacts hindering the analysis of brain signals. Data-driven methods like independent component analysis (ICA) are successful approaches to remove artefacts from the EEG. However, the ICA-based methods developed so far are often affected by limitations, such as: the need for visual inspection of the separated independent components (subjectivity problem) and, in some cases, for the independent and simultaneous recording of the inspected artefacts to identify the artefactual independent components; a potentially heavy manipulation of the EEG signals; the use of linear classification methods; the use of simulated artefacts to validate the methods; no testing in dry electrode or high-density EEG datasets; applications limited to specific conditions and electrode layouts. Methods Our fingerprint method automatically identifies EEG ICs containing eyeblinks, eye movements, myogenic artefacts and cardiac interference by evaluating 14 temporal, spatial, spectral, and statistical features composing the IC fingerprint. Sixty-two real EEG datasets containing cued artefacts are recorded with wet and dry electrodes (128 wet and 97 dry channels). For each artefact, 10 nonlinear SVM classifiers are trained on fingerprints of expert-classified ICs. Training groups include randomly chosen wet and dry datasets decomposed in 80 ICs. The classifiers are tested on the IC-fingerprints of different datasets decomposed into 20, 50, or 80 ICs. The SVM performance is assessed in terms of accuracy, False Omission Rate (FOR), Hit Rate (HR), False Alarm Rate (FAR), and sensitivity (p). For each artefact, the quality of the artefact-free EEG reconstructed using the classification of the best SVM is assessed by visual inspection and SNR. Results The best SVM classifier for each artefact type achieved average accuracy of 1 (eyeblink), 0.98 (cardiac interference), and 0.97 (eye movement and myogenic artefact). Average classification sensitivity (p) was 1 (eyeblink), 0.997 (myogenic artefact), 0.98 (eye movement), and 0.48 (cardiac interference). Average artefact reduction ranged from a maximum of 82% for eyeblinks to a minimum of 33% for cardiac interference, depending on the effectiveness of the proposed method and the amplitude of the removed artefact. The performance of the SVM classifiers did not depend on the electrode type, whereas it was better for lower decomposition levels (50 and 20 ICs). Discussion Apart from cardiac interference, SVM performance and average artefact reduction indicate that the fingerprint method has an excellent overall performance in the automatic detection of eyeblinks, eye movements and myogenic artefacts, which is comparable to that of existing methods. Being also independent from simultaneous artefact recording, electrode number, type and layout, and decomposition level, the proposed fingerprint method can have useful applications in clinical and experimental EEG settings. PMID:29492336
Lidar detection of underwater objects using a neuro-SVM-based architecture.
Mitra, Vikramjit; Wang, Chia-Jiu; Banerjee, Satarupa
2006-05-01
This paper presents a neural network architecture using a support vector machine (SVM) as an inference engine (IE) for classification of light detection and ranging (Lidar) data. Lidar data gives a sequence of laser backscatter intensities obtained from laser shots generated from an airborne object at various altitudes above the earth surface. Lidar data is pre-filtered to remove high frequency noise. As the Lidar shots are taken from above the earth surface, it has some air backscatter information, which is of no importance for detecting underwater objects. Because of these, the air backscatter information is eliminated from the data and a segment of this data is subsequently selected to extract features for classification. This is then encoded using linear predictive coding (LPC) and polynomial approximation. The coefficients thus generated are used as inputs to the two branches of a parallel neural architecture. The decisions obtained from the two branches are vector multiplied and the result is fed to an SVM-based IE that presents the final inference. Two parallel neural architectures using multilayer perception (MLP) and hybrid radial basis function (HRBF) are considered in this paper. The proposed structure fits the Lidar data classification task well due to the inherent classification efficiency of neural networks and accurate decision-making capability of SVM. A Bayesian classifier and a quadratic classifier were considered for the Lidar data classification task but they failed to offer high prediction accuracy. Furthermore, a single-layered artificial neural network (ANN) classifier was also considered and it failed to offer good accuracy. The parallel ANN architecture proposed in this paper offers high prediction accuracy (98.9%) and is found to be the most suitable architecture for the proposed task of Lidar data classification.
Classification of EEG Signals Based on Pattern Recognition Approach.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
Classification of EEG Signals Based on Pattern Recognition Approach
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190
NASA Astrophysics Data System (ADS)
Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.
2012-08-01
False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the usage of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate for their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity in 0.05-1 FPs on normals on the free-response receiver operating characteristic curve (FROC), incorporated into a tenfold cross validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible outperforming other state-of-the-art approaches for breast mass CADe.
Classification of Sporting Activities Using Smartphone Accelerometers
Mitchell, Edmond; Monaghan, David; O'Connor, Noel E.
2013-01-01
In this paper we present a framework that allows for the automatic identification of sporting activities using commonly available smartphones. We extract discriminative informational features from smartphone accelerometers using the Discrete Wavelet Transform (DWT). Despite the poor quality of their accelerometers, smartphones were used as capture devices due to their prevalence in today's society. Successful classification on this basis potentially makes the technology accessible to both elite and non-elite athletes. Extracted features are used to train different categories of classifiers. No one classifier family has a reportable direct advantage in activity classification problems to date; thus we examine classifiers from each of the most widely used classifier families. We investigate three classification approaches; a commonly used SVM-based approach, an optimized classification model and a fusion of classifiers. We also investigate the effect of changing several of the DWT input parameters, including mother wavelets, window lengths and DWT decomposition levels. During the course of this work we created a challenging sports activity analysis dataset, comprised of soccer and field-hockey activities. The average maximum F-measure accuracy of 87% was achieved using a fusion of classifiers, which was 6% better than a single classifier model and 23% better than a standard SVM approach. PMID:23604031
Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui
2016-06-01
Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
Classification of hadith into positive suggestion, negative suggestion, and information
NASA Astrophysics Data System (ADS)
Faraby, Said Al; Riviera Rachmawati Jasin, Eliza; Kusumaningrum, Andina; Adiwijaya
2018-03-01
As one of the Muslim life guidelines, based on the meaning of its sentence(s), a hadith can be viewed as a suggestion for doing something, or a suggestion for not doing something, or just information without any suggestion. In this paper, we tried to classify the Bahasa translation of hadith into the three categories using machine learning approach. We tried stemming and stopword removal in preprocessing, and TF-IDF of unigram, bigram, and trigram as the extracted features. As the classifier, we compared between SVM and Neural Network. Since the categories are new, so in order to compare the results of the previous pipelines, we created a baseline classifier using simple rule-based string matching technique. The rule-based algorithm conditions on the occurrence of words such as “janganlah, sholatlah, and so on” to determine the category. The baseline method achieved F1-Score of 0.69, while the best F1-Score from the machine learning approach was 0.88, and it was produced by SVM model with the linear kernel.
Extraction and classification of 3D objects from volumetric CT data
NASA Astrophysics Data System (ADS)
Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.
2016-05-01
We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection System (EDS) using our multistage Segmentation Carving (SC) followed by Support Vector Machine (SVM) classifier. The multi-stage Segmentation and Carving (SC) step extracts all suspect 3-D objects. The feature vector is then constructed for all extracted objects and the feature vector is classified by the Support Vector Machine (SVM) previously learned using a set of ground truth threat and benign objects. The learned SVM classifier has shown to be effective in classification of different types of threat materials. The proposed ATD algorithm robustly deals with CT data that are prone to artifacts due to scatter, beam hardening as well as other systematic idiosyncrasies of the CT data. Furthermore, the proposed ATD algorithm is amenable for including newly emerging threat materials as well as for accommodating data from newly developing sensor technologies. Efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristics (ROC) curve that relates Probability of Detection (PD) as a function of Probability of False Alarm (PFA). The tests performed using CT data of passenger bags shows excellent performance characteristics.
Dates fruits classification using SVM
NASA Astrophysics Data System (ADS)
Alzu'bi, Reem; Anushya, A.; Hamed, Ebtisam; Al Sha'ar, Eng. Abdelnour; Vincy, B. S. Angela
2018-04-01
In this paper, we used SVM in classifying various types of dates using their images. Dates have interesting different characteristics that can be valuable to distinguish and determine a particular date type. These characteristics include shape, texture, and color. A system that achieves 100% accuracy was built to classify the dates which can be eatable and cannot be eatable. The built system helps the food industry and customer in classifying dates depending on specific quality measures giving best performance with specific type of dates.
Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.
Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane
2016-12-01
In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used an EWMA to identify features that correspond with a particular type of fall allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. A benefit of using a subset of the original data to design classification models minimizes training time and simplifies models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the university of Rzeszow's. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results using the EWMA-based SVM classifier method with those achieved using three commonly used machine learning classifiers, neural network, K-nearest neighbor and naïve Bayes, proved our model superior.
Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-01-01
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629
Online least squares one-class support vector machines-based abnormal visual event detection.
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-12-12
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method.
Chao, Pei-Kuang; Wang, Chun-Li; Chan, Hsiao-Lung
2012-03-01
Predicting response after cardiac resynchronization therapy (CRT) has been a challenge of cardiologists. About 30% of selected patients based on the standard selection criteria for CRT do not show response after receiving the treatment. This study is aimed to build an intelligent classifier to assist in identifying potential CRT responders by speckle-tracking radial strain based on echocardiograms. The echocardiograms analyzed were acquired before CRT from 26 patients who have received CRT. Sequential forward selection was performed on the parameters obtained by peak-strain timing and phase space reconstruction on speckle-tracking radial strain to find an optimal set of features for creating intelligent classifiers. Support vector machine (SVM) with a linear, quadratic, and polynominal kernel were tested to build classifiers to identify potential responders and non-responders for CRT by selected features. Based on random sub-sampling validation, the best classification performance is correct rate about 95% with 96-97% sensitivity and 93-94% specificity achieved by applying SVM with a quadratic kernel on a set of 3 parameters. The selected 3 parameters contain both indexes extracted by peak-strain timing and phase space reconstruction. An intelligent classifier with an averaged correct rate, sensitivity and specificity above 90% for assisting in identifying CRT responders is built by speckle-tracking radial strain. The classifier can be applied to provide objective suggestion for patient selection of CRT. Copyright © 2011 Elsevier B.V. All rights reserved.
Structural analysis of online handwritten mathematical symbols based on support vector machines
NASA Astrophysics Data System (ADS)
Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George
2013-01-01
Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs): one based on the "one-against-one" technique and one based on the "one-against-all", in identifying the relation between a pair of symbols (i.e. subscript, numerator, etc). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique and 12.31% error rate for the ILSP-1 classifier.
Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method.
Klöppel, Stefan; Stonnington, Cynthia M; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D; Mader, Irina; Mitchell, L Anne; Patel, Ameet C; Roberts, Catherine C; Fox, Nick C; Jack, Clifford R; Ashburner, John; Frackowiak, Richard S J
2008-11-01
There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65-95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice.
Accuracy of dementia diagnosis—a direct comparison between radiologists and a computerized method
Stonnington, Cynthia M.; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D.; Mader, Irina; Mitchell, L. Anne; Patel, Ameet C.; Roberts, Catherine C.; Fox, Nick C.; Jack, Clifford R.; Ashburner, John; Frackowiak, Richard S. J.
2008-01-01
There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65–95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice. PMID:18835868
NASA Astrophysics Data System (ADS)
Sehad, Mounir; Lazri, Mourad; Ameur, Soltane
2017-03-01
In this work, a new rainfall estimation technique based on the high spatial and temporal resolution of the Spinning Enhanced Visible and Infra Red Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) is presented. This work proposes efficient scheme rainfall estimation based on two multiclass support vector machine (SVM) algorithms: SVM_D for daytime and SVM_N for night time rainfall estimations. Both SVM models are trained using relevant rainfall parameters based on optical, microphysical and textural cloud proprieties. The cloud parameters are derived from the Spectral channels of the SEVIRI MSG radiometer. The 3-hourly and daily accumulated rainfall are derived from the 15 min-rainfall estimation given by the SVM classifiers for each MSG observation image pixel. The SVMs were trained with ground meteorological radar precipitation scenes recorded from November 2006 to March 2007 over the north of Algeria located in the Mediterranean region. Further, the SVM_D and SVM_N models were used to estimate 3-hourly and daily rainfall using data set gathered from November 2010 to March 2011 over north Algeria. The results were validated against collocated rainfall observed by rain gauge network. Indeed, the statistical scores given by correlation coefficient, bias, root mean square error and mean absolute error, showed good accuracy of rainfall estimates by the present technique. Moreover, rainfall estimates of our technique were compared with two high accuracy rainfall estimates methods based on MSG SEVIRI imagery namely: random forests (RF) based approach and an artificial neural network (ANN) based technique. The findings of the present technique indicate higher correlation coefficient (3-hourly: 0.78; daily: 0.94), and lower mean absolute error and root mean square error values. The results show that the new technique assign 3-hourly and daily rainfall with good and better accuracy than ANN technique and (RF) model.
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair
2018-04-01
The present study employs a machine learning algorithm namely support vector machine (SVM) to classify high and low potential archers from a collection of bio-physiological variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. The bio-physiological variables namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calories intake, were measured prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means clustered the archers into high (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy with a classification accuracy of 94% in comparison the other tested models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected bio-physiological variables examined.
gkmSVM: an R package for gapped-kmer SVM
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A.
2016-01-01
Summary: We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. Availability and Implementation: gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C ++ implementation is available at www.beerlab.org/gkmsvm Contact: mghandi@gmail.com or mbeer@jhu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153639
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.
Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results make separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptidemore » identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptides is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample« less
Fault diagnosis method based on FFT-RPCA-SVM for Cascaded-Multilevel Inverter.
Wang, Tianzhen; Qi, Jie; Xu, Hao; Wang, Yide; Liu, Lei; Gao, Diju
2016-01-01
Thanks to reduced switch stress, high quality of load wave, easy packaging and good extensibility, the cascaded H-bridge multilevel inverter is widely used in wind power system. To guarantee stable operation of system, a new fault diagnosis method, based on Fast Fourier Transform (FFT), Relative Principle Component Analysis (RPCA) and Support Vector Machine (SVM), is proposed for H-bridge multilevel inverter. To avoid the influence of load variation on fault diagnosis, the output voltages of the inverter is chosen as the fault characteristic signals. To shorten the time of diagnosis and improve the diagnostic accuracy, the main features of the fault characteristic signals are extracted by FFT. To further reduce the training time of SVM, the feature vector is reduced based on RPCA that can get a lower dimensional feature space. The fault classifier is constructed via SVM. An experimental prototype of the inverter is built to test the proposed method. Compared to other fault diagnosis methods, the experimental results demonstrate the high accuracy and efficiency of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Dementev, A. O.; Dmitriev, E. V.; Kozoderov, V. V.; Egorov, V. D.
2017-10-01
Hyperspectral imaging is up-to-date promising technology widely applied for the accurate thematic mapping. The presence of a large number of narrow survey channels allows us to use subtle differences in spectral characteristics of objects and to make a more detailed classification than in the case of using standard multispectral data. The difficulties encountered in the processing of hyperspectral images are usually associated with the redundancy of spectral information which leads to the problem of the curse of dimensionality. Methods currently used for recognizing objects on multispectral and hyperspectral images are usually based on standard base supervised classification algorithms of various complexity. Accuracy of these algorithms can be significantly different depending on considered classification tasks. In this paper we study the performance of ensemble classification methods for the problem of classification of the forest vegetation. Error correcting output codes and boosting are tested on artificial data and real hyperspectral images. It is demonstrates, that boosting gives more significant improvement when used with simple base classifiers. The accuracy in this case in comparable the error correcting output code (ECOC) classifier with Gaussian kernel SVM base algorithm. However the necessity of boosting ECOC with Gaussian kernel SVM is questionable. It is demonstrated, that selected ensemble classifiers allow us to recognize forest species with high enough accuracy which can be compared with ground-based forest inventory data.
Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom
2016-01-01
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex. PMID:27500640
Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom
2016-01-01
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex.
Recognition of medication information from discharge summaries using ensembles of classifiers.
Doan, Son; Collier, Nigel; Xu, Hua; Pham, Hoang Duy; Tu, Minh Phuong
2012-05-07
Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. It suggests that simple strategies that can be easily implemented such as majority voting could have the potential to significantly improve clinical entity recognition.
Reconstruction of hyperspectral image using matting model for classification
NASA Astrophysics Data System (ADS)
Xie, Weiying; Li, Yunsong; Ge, Chiru
2016-05-01
Although hyperspectral images (HSIs) captured by satellites provide much information in spectral regions, some bands are redundant or have large amounts of noise, which are not suitable for image analysis. To address this problem, we introduce a method for reconstructing the HSI with noise reduction and contrast enhancement using a matting model for the first time. The matting model refers to each spectral band of an HSI that can be decomposed into three components, i.e., alpha channel, spectral foreground, and spectral background. First, one spectral band of an HSI with more refined information than most other bands is selected, and is referred to as an alpha channel of the HSI to estimate the hyperspectral foreground and hyperspectral background. Finally, a combination operation is applied to reconstruct the HSI. In addition, the support vector machine (SVM) classifier and three sparsity-based classifiers, i.e., orthogonal matching pursuit (OMP), simultaneous OMP, and OMP based on first-order neighborhood system weighted classifiers, are utilized on the reconstructed HSI and the original HSI to verify the effectiveness of the proposed method. Specifically, using the reconstructed HSI, the average accuracy of the SVM classifier can be improved by as much as 19%.
Möller, Christiane; Pijnenburg, Yolande A L; van der Flier, Wiesje M; Versteeg, Adriaan; Tijms, Betty; de Munck, Jan C; Hafkemeijer, Anne; Rombouts, Serge A R B; van der Grond, Jeroen; van Swieten, John; Dopper, Elise; Scheltens, Philip; Barkhof, Frederik; Vrenken, Hugo; Wink, Alle Meije
2016-06-01
Purpose To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis. The SVM can be used to distinguish disease-specific GM patterns in patients with AD and those with bvFTD as compared with normal aging by using common T1-weighted structural MR imaging. (©) RSNA, 2015.
Mao, Xue Gang; Du, Zi Han; Liu, Jia Qian; Chen, Shu Xin; Hou, Ji Yu
2018-01-01
Traditional field investigation and artificial interpretation could not satisfy the need of forest gaps extraction at regional scale. High spatial resolution remote sensing image provides the possibility for regional forest gaps extraction. In this study, we used object-oriented classification method to segment and classify forest gaps based on QuickBird high resolution optical remote sensing image in Jiangle National Forestry Farm of Fujian Province. In the process of object-oriented classification, 10 scales (10-100, with a step length of 10) were adopted to segment QuickBird remote sensing image; and the intersection area of reference object (RA or ) and intersection area of segmented object (RA os ) were adopted to evaluate the segmentation result at each scale. For segmentation result at each scale, 16 spectral characteristics and support vector machine classifier (SVM) were further used to classify forest gaps, non-forest gaps and others. The results showed that the optimal segmentation scale was 40 when RA or was equal to RA os . The accuracy difference between the maximum and minimum at different segmentation scales was 22%. At optimal scale, the overall classification accuracy was 88% (Kappa=0.82) based on SVM classifier. Combining high resolution remote sensing image data with object-oriented classification method could replace the traditional field investigation and artificial interpretation method to identify and classify forest gaps at regional scale.
Classification of burst and suppression in the neonatal electroencephalogram
NASA Astrophysics Data System (ADS)
Löfhede, J.; Löfgren, N.; Thordstein, M.; Flisberg, A.; Kjellmer, I.; Lindecrantz, K.
2008-12-01
Fisher's linear discriminant (FLD), a feed-forward artificial neural network (ANN) and a support vector machine (SVM) were compared with respect to their ability to distinguish bursts from suppressions in electroencephalograms (EEG) displaying a burst-suppression pattern. Five features extracted from the EEG were used as inputs. The study was based on EEG signals from six full-term infants who had suffered from perinatal asphyxia, and the methods have been trained with reference data classified by an experienced electroencephalographer. The results are summarized as the area under the curve (AUC), derived from receiver operating characteristic (ROC) curves for the three methods. Based on this, the SVM performs slightly better than the others. Testing the three methods with combinations of increasing numbers of the five features shows that the SVM handles the increasing amount of information better than the other methods.
Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach
NASA Astrophysics Data System (ADS)
Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios
A software package Unresolved Galaxy Classifier (UGC) is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide an automated taxonomic classification and specific parameters estimation analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM-models training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of galaxy models synthetic spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate and oxygen saturation (S pO2) as input parameters were developed to monitor the condition of patients, with minimum human resource utilization. The Support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression is used for the realization of MPMs. For making MPMs cost effective, we experiment on the hardware implementation of the MPM using support vector machine classifier. The training of the system is done using the matlab environment and the detection of the alarm/noalarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM has outperformed the best known MPM using radial basis function kernel by an absoute improvement of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model was developed based on the improved performance system using Verilog Hardware Description Language and was implemented on Altera cyclone-II development board.
NASA Astrophysics Data System (ADS)
Li, Hai; Kumavor, Patrick D.; Alqasemi, Umar; Zhu, Quing
2014-03-01
Human ovarian tissue features extracted from photoacoustic spectra data, beam envelopes and co-registered ultrasound and photoacoustic images are used to characterize cancerous vs. normal processes using a support vector machine (SVM) classifier. The centers of suspicious tumor areas are estimated from the Gaussian fitting of the mean Radon transforms of the photoacoustic image along 0 and 90 degrees. Normalized power spectra are calculated using the Fourier transform of the photoacoustic beamformed data across these suspicious areas, where the spectral slope and 0-MHz intercepts are extracted. Image statistics, envelope histogram fitting and maximum output of 6 composite filters of cancerous or normal patterns along with other previously used features are calculated to compose a total of 17 features. These features are extracted from 169 datasets of 19 ex vivo ovaries. Half of the cancerous and normal datasets are randomly chosen to train a SVM classifier with polynomial kernel and the remainder is used for testing. With 50 times data resampling, the SVM classifier, for the training group, gives 100% sensitivity and 100% specificity. For the testing group, it gives 89.68+/- 6.37% sensitivity and 93.16+/- 3.70% specificity. These results are superior to those obtained earlier by our group using features extracted from photoacoustic raw data or image statistics only.
NASA Astrophysics Data System (ADS)
Teodoro, Ana C.; Araujo, Ricardo
2016-01-01
The use of unmanned aerial vehicles (UAVs) for remote sensing applications is becoming more frequent. However, this type of information can result in several software problems related to the huge amount of data available. Object-based image analysis (OBIA) has proven to be superior to pixel-based analysis for very high-resolution images. The main objective of this work was to explore the potentialities of the OBIA methods available in two different open source software applications, Spring and OTB/Monteverdi, in order to generate an urban land cover map. An orthomosaic derived from UAVs was considered, 10 different regions of interest were selected, and two different approaches were followed. The first one (Spring) uses the region growing segmentation algorithm followed by the Bhattacharya classifier. The second approach (OTB/Monteverdi) uses the mean shift segmentation algorithm followed by the support vector machine (SVM) classifier. Two strategies were followed: four classes were considered using Spring and thereafter seven classes were considered for OTB/Monteverdi. The SVM classifier produces slightly better results and presents a shorter processing time. However, the poor spectral resolution of the data (only RGB bands) is an important factor that limits the performance of the classifiers applied.
Watson, Robert A
2014-08-01
To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers
NASA Astrophysics Data System (ADS)
Sanal Kumar, K. P.; Bhavani, R., Dr.
2017-08-01
Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.
gkmSVM: an R package for gapped-kmer SVM.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A
2016-07-15
We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C ++ implementation is available at www.beerlab.org/gkmsvm mghandi@gmail.com or mbeer@jhu.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2011-09-30
Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR Systems Center Pacific...APPROACH Odontocete click detection and classification. A multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier
NASA Astrophysics Data System (ADS)
Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing
2012-12-01
Unique features and the underlining hypotheses of how these features may relate to the tumor physiology in coregistered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced. The images were first compressed with wavelet transform. The mean Radon transform of photoacoustic images was then computed and fitted with a Gaussian function to find the centroid of a suspicious area for shift-invariant recognition process. Twenty-four features were extracted from a training set by several methods, including Fourier transform, image statistics, and different composite filters. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train three classifiers, including generalized linear model, neural network, and support vector machine (SVM). The SVM achieved the best training performance and was able to exclusively separate cancerous from non-cancerous cases with 100% sensitivity and specificity. At the end, the classifiers were used to test 95 new images obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancer is sufficient to consider an ovary as malignant, the SVM classifier achieves 100% sensitivity and 87.88% specificity.
NASA Astrophysics Data System (ADS)
Rama Krishna, K.; Ramachandran, K. I.
2018-02-01
Crack propagation is a major cause of failure in rotating machines. It adversely affects the productivity, safety, and the machining quality. Hence, detecting the crack’s severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established concept in identifying the faults, for observing the non-linear behaviour of the vibration signals at various operating conditions. In this work, we find the classification efficiencies for both original and the reconstructed vibrational signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. All the statistical features from the original signals and reconstructed signals are found out in feature extraction process individually. A few statistical parameters are selected in feature selection process and are classified using the SVM classifier. The obtained results show the best parameters and appropriate kernel in SVM classifier for detecting the faults in bearings. Hence, we conclude that better results were obtained by VMD and SVM process over normal process using SVM. This is owing to denoising and filtering the raw vibrational signals.
Fuzzy support vector machine for microarray imbalanced data classification
NASA Astrophysics Data System (ADS)
Ladayya, Faroh; Purnami, Santi Wulan; Irhamah
2017-11-01
DNA microarrays are data containing gene expression with small sample sizes and high number of features. Furthermore, imbalanced classes is a common problem in microarray data. This occurs when a dataset is dominated by a class which have significantly more instances than the other minority classes. Therefore, it is needed a classification method that solve the problem of high dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples in the same importance thus the results is bias for minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method apply a fuzzy membership to each input point and reformulate the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy membership so FSVM can pay more attention to the samples with larger fuzzy membership. Given DNA microarray data is a high dimensional data with a very large number of features, it is necessary to do feature selection first using Fast Correlation based Filter (FCBF). In this study will be analyzed by SVM, FSVM and both methods by applying FCBF and get the classification performance of them. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
Terminator Detection by Support Vector Machine Utilizing aStochastic Context-Free Grammar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Francis-Lyon, Patricia; Cristianini, Nello; Holbrook, Stephen
2006-12-30
A 2-stage detector was designed to find rho-independent transcription terminators in the Escherichia coli genome. The detector includes a Stochastic Context Free Grammar (SCFG) component and a Support Vector Machine (SVM) component. To find terminators, the SCFG searches the intergenic regions of nucleotide sequence for local matches to a terminator grammar that was designed and trained utilizing examples of known terminators. The grammar selects sequences that are the best candidates for terminators and assigns them a prefix, stem-loop, suffix structure using the Cocke-Younger-Kasaami (CYK) algorithm, modified to incorporate energy affects of base pairing. The parameters from this inferred structure aremore » passed to the SVM classifier, which distinguishes terminators from non-terminators that score high according to the terminator grammar. The SVM was trained with negative examples drawn from intergenic sequences that include both featureless and RNA gene regions (which were assigned prefix, stem-loop, suffix structure by the SCFG), so that it successfully distinguishes terminators from either of these. The classifier was found to be 96.4% successful during testing.« less
Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-01-01
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375
Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-03-29
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-01-01
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles’ in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of the vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive compared with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need of image registration or additional road database, it has great potentials of field applications. Future research will be focusing on expanding the current method for detecting other transportation modes such as buses, trucks, motors, bicycles, and pedestrians. PMID:27548179
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images.
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-08-19
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles' in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which sophistically integrates V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of the vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive compared with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need of image registration or additional road database, it has great potentials of field applications. Future research will be focusing on expanding the current method for detecting other transportation modes such as buses, trucks, motors, bicycles, and pedestrians.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klement, Rainer J., E-mail: rainer_klement@gmx.de; Department of Radiotherapy and Radiation Oncology, Leopoldina Hospital, Schweinfurt; Allgäuer, Michael
2014-03-01
Background: Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. Methods and Materials: We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performancemore » by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Results: Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED{sub ISO}) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED{sub ISO} and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED{sub ISO}, age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. Conclusions: These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.« less
Klement, Rainer J; Allgäuer, Michael; Appold, Steffen; Dieckmann, Karin; Ernst, Iris; Ganswindt, Ute; Holy, Richard; Nestle, Ursula; Nevinny-Stickel, Meinhard; Semrau, Sabine; Sterzing, Florian; Wittig, Andrea; Andratschke, Nicolaus; Guckenberger, Matthias
2014-03-01
Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning. Copyright © 2014 Elsevier Inc. All rights reserved.
Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil
2014-09-07
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
Multivariate pattern analysis of obsessive-compulsive disorder using structural neuroanatomy.
Hu, Xinyu; Liu, Qi; Li, Bin; Tang, Wanjie; Sun, Huaiqiang; Li, Fei; Yang, Yanchun; Gong, Qiyong; Huang, Xiaoqi
2016-02-01
Magnetic resonance imaging (MRI) studies have revealed brain structural abnormalities in obsessive-compulsive disorder (OCD) patients, involving both gray matter (GM) and white matter (WM). However, the results of previous publications were based on average differences between groups, which limited their usages in clinical practice. Therefore, the aim of this study was to examine whether the application of multivariate pattern analysis (MVPA) to high-dimensional structural images would allow accurate discrimination between OCD patients and healthy control subjects (HCS). High-resolution T1-weighted images were acquired from 33 OCD patients and 33 demographically matched HCS in a 3.0 T scanner. Differences in GM and WM volume between OCD and HCS were examined using two types of well-established MVPA techniques: support vector machine (SVM) and Gaussian process classifier (GPC). We also drew a receiver operating characteristic (ROC) curve to evaluate the performance of each classifier. The classification accuracies for both classifiers using GM and WM anatomy were all above 75%. The highest classification accuracy (81.82%, P<0.001) was achieved with the SVM classifier using WM information. Regional brain anomalies with high discriminative power were based on three distributed networks including the fronto-striatal circuit, the temporo-parieto-occipital junction and the cerebellum. Our study illustrated that both GM and WM anatomical features may be useful in differentiating OCD patients from HCS. WM volume using the SVM approach showed the highest accuracy in our population for revealing group differences, which suggested its potential diagnostic role in detecting highly enriched OCD patients at the level of the individual. Copyright © 2015 Elsevier B.V. and ECNP. All rights reserved.
On the use of feature selection to improve the detection of sea oil spills in SAR images
NASA Astrophysics Data System (ADS)
Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo
2017-03-01
Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to this category.
Yeh, Shih-Ching; Huang, Ming-Chun; Wang, Pa-Chun; Fang, Te-Yung; Su, Mu-Chun; Tsai, Po-Yi; Rizzo, Albert
2014-10-01
Dizziness is a major consequence of imbalance and vestibular dysfunction. Compared to surgery and drug treatments, balance training is non-invasive and more desired. However, training exercises are usually tedious and the assessment tool is insufficient to diagnose patient's severity rapidly. An interactive virtual reality (VR) game-based rehabilitation program that adopted Cawthorne-Cooksey exercises, and a sensor-based measuring system were introduced. To verify the therapeutic effect, a clinical experiment with 48 patients and 36 normal subjects was conducted. Quantified balance indices were measured and analyzed by statistical tools and a Support Vector Machine (SVM) classifier. In terms of balance indices, patients who completed the training process are progressed and the difference between normal subjects and patients is obvious. Further analysis by SVM classifier show that the accuracy of recognizing the differences between patients and normal subject is feasible, and these results can be used to evaluate patients' severity and make rapid assessment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S
2016-01-01
High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel
2011-05-09
Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689
Data mining for the analysis of hippocampal zones in Alzheimer's disease
NASA Astrophysics Data System (ADS)
Ovando Vázquez, Cesaré M.
2012-02-01
In this work, a methodology to classify people with Alzheimer's Disease (AD), Healthy Controls (HC) and people with Mild Cognitive Impairment (MCI) is presented. This methodology consists of an ensemble of Support Vector Machines (SVM) with the hippocampal boxes (HB) as input data, these hippocampal zones are taken from Magnetic Resonance (MRI) and Positron Emission Tomography (PET) images. Two ways of constructing this ensemble are presented, the first consists of linear SVM models and the second of non-linear SVM models. Results demonstrate that the linear models classify HBs more accurately than the non-linear models between HC and MCI and that there are no differences between HC and AD.
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File
NASA Astrophysics Data System (ADS)
Jing, Ya-Bing; Liu, Chang-Wen; Bi, Feng-Rong; Bi, Xiao-Yang; Wang, Xia; Shao, Kang
2017-07-01
Numerous vibration-based techniques are rarely used in diesel engines fault diagnosis in a direct way, due to the surface vibration signals of diesel engines with the complex non-stationary and nonlinear time-varying features. To investigate the fault diagnosis of diesel engines, fractal correlation dimension, wavelet energy and entropy as features reflecting the diesel engine fault fractal and energy characteristics are extracted from the decomposed signals through analyzing vibration acceleration signals derived from the cylinder head in seven different states of valve train. An intelligent fault detector FastICA-SVM is applied for diesel engine fault diagnosis and classification. The results demonstrate that FastICA-SVM achieves higher classification accuracy and makes better generalization performance in small samples recognition. Besides, the fractal correlation dimension and wavelet energy and entropy as the special features of diesel engine vibration signal are considered as input vectors of classifier FastICA-SVM and could produce the excellent classification results. The proposed methodology improves the accuracy of feature extraction and the fault diagnosis of diesel engines.
A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease
NASA Astrophysics Data System (ADS)
Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas
2017-08-01
The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps expert to diagnose precisely, accurately, and inexpensively. In this research, we use data mining technique to developed a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages from filter and wrapper methods to select the optimal feature subset from original feature. Chi square used as filter method to remove redundant features and GA as wrapper method to select the ideal feature subset with SVM used as classifier. Experiment performed with 10 fold cross validation on erythemato-squamous diseases dataset taken from University of California Irvine (UCI) machine learning database. The experimental result shows that the proposed model based multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.
NASA Astrophysics Data System (ADS)
Pinar, Anthony; Masarik, Matthew; Havens, Timothy C.; Burns, Joseph; Thelen, Brian; Becker, John
2015-05-01
This paper explores the effectiveness of an anomaly detection algorithm for downward-looking ground penetrating radar (GPR) and electromagnetic inductance (EMI) data. Threat detection with GPR is challenged by high responses to non-target/clutter objects, leading to a large number of false alarms (FAs), and since the responses of target and clutter signatures are so similar, classifier design is not trivial. We suggest a method based on a Run Packing (RP) algorithm to fuse GPR and EMI data into a composite confidence map to improve detection as measured by the area-under-ROC (NAUC) metric. We examine the value of a multiple kernel learning (MKL) support vector machine (SVM) classifier using image features such as histogram of oriented gradients (HOG), local binary patterns (LBP), and local statistics. Experimental results on government furnished data show that use of our proposed fusion and classification methods improves the NAUC when compared with the results from individual sensors and a single kernel SVM classifier.
Drug-related webpages classification based on multi-modal local decision fusion
NASA Astrophysics Data System (ADS)
Hu, Ruiguang; Su, Xiaojing; Liu, Yanxin
2018-03-01
In this paper, multi-modal local decision fusion is used for drug-related webpages classification. First, meaningful text are extracted through HTML parsing, and effective images are chosen by the FOCARSS algorithm. Second, six SVM classifiers are trained for six kinds of drug-taking instruments, which are represented by PHOG. One SVM classifier is trained for the cannabis, which is represented by the mid-feature of BOW model. For each instance in a webpage, seven SVMs give seven labels for its image, and other seven labels are given by searching the names of drug-taking instruments and cannabis in its related text. Concatenating seven labels of image and seven labels of text, the representation of those instances in webpages are generated. Last, Multi-Instance Learning is used to classify those drugrelated webpages. Experimental results demonstrate that the classification accuracy of multi-instance learning with multi-modal local decision fusion is much higher than those of single-modal classification.
An ensemble of dissimilarity based classifiers for Mackerel gender determination
NASA Astrophysics Data System (ADS)
Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.
2014-03-01
Mackerel is an infravalored fish captured by European fishing vessels. A manner to add value to this specie can be achieved by trying to classify it attending to its sex. Colour measurements were performed on Mackerel females and males (fresh and defrozen) extracted gonads to obtain differences between sexes. Several linear and non linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can been applied to this problem. However, theyare usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kind of dissimilarity based classifiers. The diversity is induced considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
Wire connector classification with machine vision and a novel hybrid SVM
NASA Astrophysics Data System (ADS)
Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.
2018-04-01
A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.
Efficient HIK SVM learning for image classification.
Wu, Jianxin
2012-10-01
Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.
Zhang, Jian; Lockhart, Thurmon E.; Soangra, Rahul
2013-01-01
Fatigue in lower extremity musculature is associated with decline in postural stability, motor performance and alters normal walking patterns in human subjects. Automated recognition of lower extremity muscle fatigue condition may be advantageous in early detection of fall and injury risks. Supervised machine learning methods such as Support Vector Machines (SVM) have been previously used for classifying healthy and pathological gait patterns and also for separating old and young gait patterns. In this study we explore the classification potential of SVM in recognition of gait patterns utilizing an inertial measurement unit associated with lower extremity muscular fatigue. Both kinematic and kinetic gait patterns of 17 participants (29±11 years) were recorded and analyzed in normal and fatigued state of walking. Lower extremities were fatigued by performance of a squatting exercise until the participants reached 60% of their baseline maximal voluntary exertion level. Feature selection methods were used to classify fatigue and no-fatigue conditions based on temporal and frequency information of the signals. Additionally, influences of three different kernel schemes (i.e., linear, polynomial, and radial basis function) were investigated for SVM classification. The results indicated that lower extremity muscle fatigue condition influenced gait and loading responses. In terms of the SVM classification results, an accuracy of 96% was reached in distinguishing the two gait patterns (fatigue and no-fatigue) within the same subject using the kinematic, time and frequency domain features. It is also found that linear kernel and RBF kernel were equally good to identify intra-individual fatigue characteristics. These results suggest that intra-subject fatigue classification using gait patterns from an inertial sensor holds considerable potential in identifying “at-risk” gait due to muscle fatigue. PMID:24081829
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Document page structure learning for fixed-layout e-books using conditional random fields
NASA Astrophysics Data System (ADS)
Tao, Xin; Tang, Zhi; Xu, Canhui
2013-12-01
In this paper, a model is proposed to learn logical structure of fixed-layout document pages by combining support vector machine (SVM) and conditional random fields (CRF). Features related to each logical label and their dependencies are extracted from various original Portable Document Format (PDF) attributes. Both local evidence and contextual dependencies are integrated in the proposed model so as to achieve better logical labeling performance. With the merits of SVM as local discriminative classifier and CRF modeling contextual correlations of adjacent fragments, it is capable of resolving the ambiguities of semantic labels. The experimental results show that CRF based models with both tree and chain graph structures outperform the SVM model with an increase of macro-averaged F1 by about 10%.
Cho, Yongwon; Lee, Areum; Park, Jongha; Ko, Bemseok; Kim, Namkug
2018-07-01
Contactless operating room (OR) interfaces are important for computer-aided surgery, and have been developed to decrease the risk of contamination during surgical procedures. In this study, we used Leap Motion™, with a personalized automated classifier, to enhance the accuracy of gesture recognition for contactless interfaces. This software was trained and tested on a personal basis that means the training of gesture per a user. We used 30 features including finger and hand data, which were computed, selected, and fed into a multiclass support vector machine (SVM), and Naïve Bayes classifiers and to predict and train five types of gestures including hover, grab, click, one peak, and two peaks. Overall accuracy of the five gestures was 99.58% ± 0.06, and 98.74% ± 3.64 on a personal basis using SVM and Naïve Bayes classifiers, respectively. We compared gesture accuracy across the entire dataset and used SVM and Naïve Bayes classifiers to examine the strength of personal basis training. We developed and enhanced non-contact interfaces with gesture recognition to enhance OR control systems. Copyright © 2018 Elsevier B.V. All rights reserved.
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.
Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan
2017-01-01
Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition.
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition
Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan
2017-01-01
Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987
Recognizing ovarian cancer from co-registered ultrasound and photoacoustic images
NASA Astrophysics Data System (ADS)
Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing
2013-03-01
Unique features in co-registered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced, along with the hypotheses of how these features may relate to the physiology of tumors. The images are compressed with wavelet transform, after which the mean Radon transform of the photoacoustic image is computed and fitted with a Gaussian function to find the centroid of the suspicious area for shift-invariant recognition process. In the next step, 24 features are extracted from a training set of images by several methods; including features from the Fourier domain, image statistics, and the outputs of different composite filters constructed from the joint frequency response of different cancerous images. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train a support vector machine (SVM) structure. The SVM classifier was able to exclusively separate the cancerous from the non-cancerous cases with 100% sensitivity and specificity. At the end, the classifier was used to test 95 new images, obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancerous case is sufficient to consider the ovary as malignant, then the SVM classifier achieves 100% sensitivity and 87.88% specificity.
Mourao-Miranda, J; Reinders, A A T S; Rocha-Rego, V; Lappin, J; Rondina, J; Morgan, C; Morgan, K D; Fearon, P; Jones, P B; Doody, G A; Murray, R M; Kapur, S; Dazzan, P
2012-05-01
To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data.
Mourao-Miranda, J.; Reinders, A. A. T. S.; Rocha-Rego, V.; Lappin, J.; Rondina, J.; Morgan, C.; Morgan, K. D.; Fearon, P.; Jones, P. B.; Doody, G. A.; Murray, R. M.; Kapur, S.; Dazzan, P.
2012-01-01
Background To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. Method One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. Results At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). Conclusions We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data. PMID:22059690
Marafino, Ben J; Davies, Jason M; Bardach, Naomi S; Dean, Mitzi L; Dudley, R Adams
2014-01-01
Existing risk adjustment models for intensive care unit (ICU) outcomes rely on manual abstraction of patient-level predictors from medical charts. Developing an automated method for abstracting these data from free text might reduce cost and data collection times. To develop a support vector machine (SVM) classifier capable of identifying a range of procedures and diagnoses in ICU clinical notes for use in risk adjustment. We selected notes from 2001-2008 for 4191 neonatal ICU (NICU) and 2198 adult ICU patients from the MIMIC-II database from the Beth Israel Deaconess Medical Center. Using these notes, we developed an implementation of the SVM classifier to identify procedures (mechanical ventilation and phototherapy in NICU notes) and diagnoses (jaundice in NICU and intracranial hemorrhage (ICH) in adult ICU). On the jaundice classification task, we also compared classifier performance using n-gram features to unigrams with application of a negation algorithm (NegEx). Our classifier accurately identified mechanical ventilation (accuracy=0.982, F1=0.954) and phototherapy use (accuracy=0.940, F1=0.912), as well as jaundice (accuracy=0.898, F1=0.884) and ICH diagnoses (accuracy=0.938, F1=0.943). Including bigram features improved performance on the jaundice (accuracy=0.898 vs 0.865) and ICH (0.938 vs 0.927) tasks, and outperformed NegEx-derived unigram features (accuracy=0.898 vs 0.863) on the jaundice task. Overall, a classifier using n-gram support vectors displayed excellent performance characteristics. The classifier generalizes to diverse patient populations, diagnoses, and procedures. SVM-based classifiers can accurately identify procedure status and diagnoses among ICU patients, and including n-gram features improves performance, compared to existing methods. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
NASA Astrophysics Data System (ADS)
Salehi, Hassan S.; Li, Hai; Merkulov, Alex; Kumavor, Patrick D.; Vavadi, Hamed; Sanders, Melinda; Kueck, Angela; Brewer, Molly A.; Zhu, Quing
2016-04-01
Most ovarian cancers are diagnosed at advanced stages due to the lack of efficacious screening techniques. Photoacoustic tomography (PAT) has a potential to image tumor angiogenesis and detect early neovascular changes of the ovary. We have developed a coregistered PAT and ultrasound (US) prototype system for real-time assessment of ovarian masses. Features extracted from PAT and US angular beams, envelopes, and images were input to a logistic classifier and a support vector machine (SVM) classifier to diagnose ovaries as benign or malignant. A total of 25 excised ovaries of 15 patients were studied and the logistic and SVM classifiers achieved sensitivities of 70.4 and 87.7%, and specificities of 95.6 and 97.9%, respectively. Furthermore, the ovaries of two patients were noninvasively imaged using the PAT/US system before surgical excision. By using five significant features and the logistic classifier, 12 out of 14 images (86% sensitivity) from a malignant ovarian mass and all 17 images (100% specificity) from a benign mass were accurately classified; the SVM correctly classified 10 out of 14 malignant images (71% sensitivity) and all 17 benign images (100% specificity). These initial results demonstrate the clinical potential of the PAT/US technique for ovarian cancer diagnosis.
Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios
2017-09-01
In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.
NASA Astrophysics Data System (ADS)
Yekkehkhany, B.; Safari, A.; Homayouni, S.; Hasanlou, M.
2014-10-01
In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imageries. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates with respect to single-date data. Several kernel functions are employed and compared in this study for mapping the input space to higher Hilbert dimension space. These kernel functions include linear, polynomials and Radial Based Function (RBF). The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of H/A/α decomposition method are used in classification. The experimental tests show an SVM classifier with RBF kernel for three dates of data increases the Overall Accuracy (OA) to up to 3% in comparison to using linear kernel function, and up to 1% in comparison to a 3rd degree polynomial kernel function.
2012-01-01
Background Myocardial ischemia can be developed into more serious diseases. Early Detection of the ischemic syndrome in electrocardiogram (ECG) more accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values for successive five beats to reduce effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features were sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter. PMID:22703641
DisArticle: a web server for SVM-based discrimination of articles on traditional medicine.
Kim, Sang-Kyun; Nam, SeJin; Kim, SangHyun
2017-01-28
Much research has been done in Northeast Asia to show the efficacy of traditional medicine. While MEDLINE contains many biomedical articles including those on traditional medicine, it does not categorize those articles by specific research area. The aim of this study was to provide a method that searches for articles only on traditional medicine in Northeast Asia, including traditional Chinese medicine, from among the articles in MEDLINE. This research established an SVM-based classifier model to identify articles on traditional medicine. The TAK + HM classifier, trained with the features of title, abstract, keywords, herbal data, and MeSH, has a precision of 0.954 and a recall of 0.902. In particular, the feature of herbal data significantly increased the performance of the classifier. By using the TAK + HM classifier, a total of about 108,000 articles were discriminated as articles on traditional medicine from among all articles in MEDLINE. We also built a web server called DisArticle ( http://informatics.kiom.re.kr/disarticle ), in which users can search for the articles and obtain statistical data. Because much evidence-based research on traditional medicine has been published in recent years, it has become necessary to search for articles on traditional medicine exclusively in literature databases. DisArticle can help users to search for and analyze the research trends in traditional medicine.
NASA Astrophysics Data System (ADS)
Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan
2017-03-01
Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.
Signal peptide discrimination and cleavage site identification using SVM and NN.
Kazemian, H B; Yusuf, S A; White, K
2014-02-01
About 15% of all proteins in a genome contain a signal peptide (SP) sequence, at the N-terminus, that targets the protein to intracellular secretory pathways. Once the protein is targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains due to similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method utilises a dual phase classification approach using SVM as a primary classifier to discriminate SP sequences from Non-SP. The methodology further employs NNs to predict the most suitable cleavage site candidates. In phase one, a SVM classification utilises hydrophobic propensities as a primary feature vector extraction using symmetric sliding window amino-acid sequence analysis for discrimination of SP and Non-SP. In phase two, a NN classification uses asymmetric sliding window sequence analysis for prediction of cleavage site identification. The proposed SVM-NN method was tested using Uni-Prot non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model. © 2013 Published by Elsevier Ltd.
Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM.
Hojjati, Seyed Hani; Ebrahimzadeh, Ata; Khazaee, Ali; Babajani-Feremi, Abbas
2017-04-15
We investigated identifying patients with mild cognitive impairment (MCI) who progress to Alzheimer's disease (AD), MCI converter (MCI-C), from those with MCI who do not progress to AD, MCI non-converter (MCI-NC), based on resting-state fMRI (rs-fMRI). Graph theory and machine learning approach were utilized to predict progress of patients with MCI to AD using rs-fMRI. Eighteen MCI converts (average age 73.6 years; 11 male) and 62 age-matched MCI non-converters (average age 73.0 years, 28 male) were included in this study. We trained and tested a support vector machine (SVM) to classify MCI-C from MCI-NC using features constructed based on the local and global graph measures. A novel feature selection algorithm was developed and utilized to select an optimal subset of features. Using subset of optimal features in SVM, we classified MCI-C from MCI-NC with an accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve of 91.4%, 83.24%, 90.1%, and 0.95, respectively. Furthermore, results of our statistical analyses were used to identify the affected brain regions in AD. To the best of our knowledge, this is the first study that combines the graph measures (constructed based on rs-fMRI) with machine learning approach and accurately classify MCI-C from MCI-NC. Results of this study demonstrate potential of the proposed approach for early AD diagnosis and demonstrate capability of rs-fMRI to predict conversion from MCI to AD by identifying affected brain regions underlying this conversion. Copyright © 2017 Elsevier B.V. All rights reserved.
Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.
2014-01-01
Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifier. We referenced a database of disease outbreak reports to confirm that this evaluation data set resulted from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784
Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing
2018-01-18
Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33-100%, and ELM, with an accuracy rate of 98.01-100%. For level assessment, the R² related to the training set was above 0.97 and the R² related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the error of 0.5-1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.
Lin, Dongyun; Sun, Lei; Toh, Kar-Ann; Zhang, Jing Bo; Lin, Zhiping
2018-05-01
Automated biomedical image classification could confront the challenges of high level noise, image blur, illumination variation and complicated geometric correspondence among various categorical biomedical patterns in practice. To handle these challenges, we propose a cascade method consisting of two stages for biomedical image classification. At stage 1, we propose a confidence score based classification rule with a reject option for a preliminary decision using the support vector machine (SVM). The testing images going through stage 1 are separated into two groups based on their confidence scores. Those testing images with sufficiently high confidence scores are classified at stage 1 while the others with low confidence scores are rejected and fed to stage 2. At stage 2, the rejected images from stage 1 are first processed by a subspace analysis technique called eigenfeature regularization and extraction (ERE), and then classified by another SVM trained in the transformed subspace learned by ERE. At both stages, images are represented based on two types of local features, i.e., SIFT and SURF, respectively. They are encoded using various bag-of-words (BoW) models to handle biomedical patterns with and without geometric correspondence, respectively. Extensive experiments are implemented to evaluate the proposed method on three benchmark real-world biomedical image datasets. The proposed method significantly outperforms several competing state-of-the-art methods in terms of classification accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Ensemble of classifiers for ontology enrichment
NASA Astrophysics Data System (ADS)
Semenova, A. V.; Kureichik, V. M.
2018-05-01
A classifier is a basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, definition of spam. A new ensemble of classifiers based on SVM (a method of support vectors), LSTM (neural network) and word embedding are suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The implementation of the proposed classifier is performed in the Matlab using the functions of the Text Analytics Toolbox. The principal difference between the proposed ensembles of classifiers is the high quality of classification of data at acceptable time costs.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms
NASA Astrophysics Data System (ADS)
Wang, Danshi; Zhang, Min; Cai, Zhongle; Cui, Yue; Li, Ze; Han, Huanhuan; Fu, Meixia; Luo, Bin
2016-06-01
An effective machine learning algorithm, the support vector machine (SVM), is presented in the context of a coherent optical transmission system. As a classifier, the SVM can create nonlinear decision boundaries to mitigate the distortions caused by nonlinear phase noise (NLPN). Without any prior information or heuristic assumptions, the SVM can learn and capture the link properties from only a few training data. Compared with the maximum likelihood estimation (MLE) algorithm, a lower bit-error rate (BER) is achieved by the SVM for a given launch power; moreover, the launch power dynamic range (LPDR) is increased by 3.3 dBm for 8 phase-shift keying (8 PSK), 1.2 dBm for QPSK, and 0.3 dBm for BPSK. The maximum transmission distance corresponding to a BER of 1 ×10-3 is increased by 480 km for the case of 8 PSK. The larger launch power range and longer transmission distance improve the tolerance to amplitude and phase noise, which demonstrates the feasibility of the SVM in digital signal processing for M-PSK formats. Meanwhile, in order to apply the SVM method to 16 quadratic amplitude modulation (16 QAM) detection, we propose a parameter optimization scheme. By utilizing a cross-validation and grid-search techniques, the optimal parameters of SVM can be selected, thus leading to the LPDR improvement by 2.8 dBm. Additionally, we demonstrate that the SVM is also effective in combating the laser phase noise combined with the inphase and quadrature (I/Q) modulator imperfections, but the improvement is insignificant for the linear noise and separate I/Q imbalance. The computational complexity of SVM is also discussed. The relatively low complexity makes it possible for SVM to implement the real-time processing.
Texture- and deformability-based surface recognition by tactile image analysis.
Khasnobish, Anwesha; Pal, Monalisa; Tibarewala, D N; Konar, Amit; Pal, Kunal
2016-08-01
Deformability and texture are two unique object characteristics which are essential for appropriate surface recognition by tactile exploration. Tactile sensation is required to be incorporated in artificial arms for rehabilitative and other human-computer interface applications to achieve efficient and human-like manoeuvring. To accomplish the same, surface recognition by tactile data analysis is one of the prerequisites. The aim of this work is to develop effective technique for identification of various surfaces based on deformability and texture by analysing tactile images which are obtained during dynamic exploration of the item by artificial arms whose gripper is fitted with tactile sensors. Tactile data have been acquired, while human beings as well as a robot hand fitted with tactile sensors explored the objects. The tactile images are pre-processed, and relevant features are extracted from the tactile images. These features are provided as input to the variants of support vector machine (SVM), linear discriminant analysis and k-nearest neighbour (kNN) for classification. Based on deformability, six household surfaces are recognized from their corresponding tactile images. Moreover, based on texture five surfaces of daily use are classified. The method adopted in the former two cases has also been applied for deformability- and texture-based recognition of four biomembranes, i.e. membranes prepared from biomaterials which can be used for various applications such as drug delivery and implants. Linear SVM performed best for recognizing surface deformability with an accuracy of 83 % in 82.60 ms, whereas kNN classifier recognizes surfaces of daily use having different textures with an accuracy of 89 % in 54.25 ms and SVM with radial basis function kernel recognizes biomembranes with an accuracy of 78 % in 53.35 ms. The classifiers are observed to generalize well on the unseen test datasets with very high performance to achieve efficient material recognition based on its deformability and texture.
Effect of finite sample size on feature selection and classification: a simulation study.
Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping
2010-02-01
The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
Thanh Noi, Phan; Kappas, Martin
2017-01-01
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Thanh Noi, Phan; Kappas, Martin
2017-12-22
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Diagnosis of multiple sclerosis from EEG signals using nonlinear methods.
Torabi, Ali; Daliri, Mohammad Reza; Sabzposhan, Seyyed Hojjat
2017-12-01
EEG signals have essential and important information about the brain and neural diseases. The main purpose of this study is classifying two groups of healthy volunteers and Multiple Sclerosis (MS) patients using nonlinear features of EEG signals while performing cognitive tasks. EEG signals were recorded when users were doing two different attentional tasks. One of the tasks was based on detecting a desired change in color luminance and the other task was based on detecting a desired change in direction of motion. EEG signals were analyzed in two ways: EEG signals analysis without rhythms decomposition and EEG sub-bands analysis. After recording and preprocessing, time delay embedding method was used for state space reconstruction; embedding parameters were determined for original signals and their sub-bands. Afterwards nonlinear methods were used in feature extraction phase. To reduce the feature dimension, scalar feature selections were done by using T-test and Bhattacharyya criteria. Then, the data were classified using linear support vector machines (SVM) and k-nearest neighbor (KNN) method. The best combination of the criteria and classifiers was determined for each task by comparing performances. For both tasks, the best results were achieved by using T-test criterion and SVM classifier. For the direction-based and the color-luminance-based tasks, maximum classification performances were 93.08 and 79.79% respectively which were reached by using optimal set of features. Our results show that the nonlinear dynamic features of EEG signals seem to be useful and effective in MS diseases diagnosis.
Sequential Adaptive Multi-Modality Target Detection and Classification Using Physics Based Models
2006-09-01
estimation," R. Raghuram, R. Raich and A.O. Hero, IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing, Toulouse France, June 2006, <http...can then be solved using off-the-shelf classifiers such as radial basis functions, SVM, or kNN classifier structures. When applied to mine detection we...stage waveform selection for adaptive resource constrained state estimation," 2006 IEEE Intl. Conf. on Acoustics, Speech , and Signal Processing
Semi-supervised SVM for individual tree crown species classification
NASA Astrophysics Data System (ADS)
Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik
2015-12-01
In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
Karan, Shivesh Kishore; Samadder, Sukha Ranjan
2016-08-01
One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.
Maximum margin semi-supervised learning with irrelevant data.
Yang, Haiqin; Huang, Kaizhu; King, Irwin; Lyu, Michael R
2015-10-01
Semi-supervised learning (SSL) is a typical learning paradigms training a model from both labeled and unlabeled data. The traditional SSL models usually assume unlabeled data are relevant to the labeled data, i.e., following the same distributions of the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain irrelevant data to the labeled data. To tackle this problem, we develop a maximum margin model, named tri-class support vector machine (3C-SVM), to utilize the available training data, while seeking a hyperplane for separating the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge and explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and maximum entropy principle. That is, 3C-SVM approaches an ideal classifier. This classifier relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hardly distinguished. Second, theoretical analysis is provided to prove that in what condition, the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, Semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs) as its special cases. More importantly, we deploy a concave-convex produce to solve the proposed 3C-SVM, transforming the original mixed integer programming, to a semi-definite programming relaxation, and finally to a sequence of quadratic programming subproblems, which yields the same worst case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematical experimental comparisons. Copyright © 2015 Elsevier Ltd. All rights reserved.
Comparison of ANN and SVM for classification of eye movements in EOG signals
NASA Astrophysics Data System (ADS)
Qi, Lim Jia; Alias, Norma
2018-03-01
Nowadays, electrooculogram is regarded as one of the most important biomedical signal in measuring and analyzing eye movement patterns. Thus, it is helpful in designing EOG-based Human Computer Interface (HCI). In this research, electrooculography (EOG) data was obtained from five volunteers. The (EOG) data was then preprocessed before feature extraction methods were employed to further reduce the dimensionality of data. Three feature extraction approaches were put forward, namely statistical parameters, autoregressive (AR) coefficients using Burg method, and power spectral density (PSD) using Yule-Walker method. These features would then become input to both artificial neural network (ANN) and support vector machine (SVM). The performance of the combination of different feature extraction methods and classifiers was presented and analyzed. It was found that statistical parameters + SVM achieved the highest classification accuracy of 69.75%.
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.
Prediction of Flood Warning in Taiwan Using Nonlinear SVM with Simulated Annealing Algorithm
NASA Astrophysics Data System (ADS)
Lee, C.
2013-12-01
The issue of the floods is important in Taiwan. It is because the narrow and high topography of the island make lots of rivers steep in Taiwan. The tropical depression likes typhoon always causes rivers to flood. Prediction of river flow under the extreme rainfall circumstances is important for government to announce the warning of flood. Every time typhoon passed through Taiwan, there were always floods along some rivers. The warning is classified to three levels according to the warning water levels in Taiwan. The propose of this study is to predict the level of floods warning from the information of precipitation, rainfall duration and slope of riverbed. To classify the level of floods warning by the above-mentioned information and modeling the problems, a machine learning model, nonlinear Support vector machine (SVM), is formulated to classify the level of floods warning. In addition, simulated annealing (SA), a probabilistic heuristic algorithm, is used to determine the optimal parameter of the SVM model. A case study of flooding-trend rivers of different gradients in Taiwan is conducted. The contribution of this SVM model with simulated annealing is capable of making efficient announcement for flood warning and keeping the danger of flood from residents along the rivers.
NASA Astrophysics Data System (ADS)
Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.
2011-02-01
In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.
A Classification Method for Seed Viability Assessment with Infrared Thermography.
Men, Sen; Yan, Lei; Liu, Jiaxin; Qian, Hua; Luo, Qinjuan
2017-04-12
This paper presents a viability assessment method for Pisum sativum L. seeds based on the infrared thermography technique. In this work, different artificial treatments were conducted to prepare seeds samples with different viability. Thermal images and visible images were recorded every five minutes during the standard five day germination test. After the test, the root length of each sample was measured, which can be used as the viability index of that seed. Each individual seed area in the visible images was segmented with an edge detection method, and the average temperature of the corresponding area in the infrared images was calculated as the representative temperature for this seed at that time. The temperature curve of each seed during germination was plotted. Thirteen characteristic parameters extracted from the temperature curve were analyzed to show the difference of the temperature fluctuations between the seeds samples with different viability. With above parameters, support vector machine (SVM) was used to classify the seed samples into three categories: viable, aged and dead according to the root length, the classification accuracy rate was 95%. On this basis, with the temperature data of only the first three hours during the germination, another SVM model was proposed to classify the seed samples, and the accuracy rate was about 91.67%. From these experimental results, it can be seen that infrared thermography can be applied for the prediction of seed viability, based on the SVM algorithm.
Kalatzis, I; Piliouras, N; Ventouras, E; Papageorgiou, C C; Rabavilas, A D; Cavouras, D
2004-07-01
A computer-based classification system has been designed capable of distinguishing patients with depression from normal controls by event-related potential (ERP) signals using the P600 component. Clinical material comprised 25 patients with depression and an equal number of gender and aged-matched healthy controls. All subjects were evaluated by a computerized version of the digit span Wechsler test. EEG activity was recorded and digitized from 15 scalp electrodes (leads). Seventeen features related to the shape of the waveform were generated and were employed in the design of an optimum support vector machine (SVM) classifier at each lead. The outcomes of those SVM classifiers were selected by a majority-vote engine (MVE), which assigned each subject to either the normal or depressive classes. MVE classification accuracy was 94% when using all leads and 92% or 82% when using only the right or left scalp leads, respectively. These findings support the hypothesis that depression is associated with dysfunction of right hemisphere mechanisms mediating the processing of information that assigns a specific response to a specific stimulus, as those mechanisms are reflected by the P600 component of ERPs. Our method may aid the further understanding of the neurophysiology underlying depression, due to its potentiality to integrate theories of depression and psychophysiology.
Dey, Susmita; Sarkar, Ripon; Chatterjee, Kabita; Datta, Pallab; Barui, Ananya; Maity, Santi P
2017-04-01
Habitual smokers are known to be at higher risk for developing oral cancer, which is increasing at an alarming rate globally. Conventionally, oral cancer is associated with high mortality rates, although recent reports show the improved survival outcomes by early diagnosis of disease. An effective prediction system which will enable to identify the probability of cancer development amongst the habitual smokers, is thus expected to benefit sizable number of populations. Present work describes a non-invasive, integrated method for early detection of cellular abnormalities based on analysis of different cyto-morphological features of exfoliative oral epithelial cells. Differential interference contrast (DIC) microscopy provides a potential optical tool as this mode provides a pseudo three dimensional (3-D) image with detailed morphological and textural features obtained from noninvasive, label free epithelial cells. For segmentation of DIC images, gradient vector flow snake model active contour process has been adopted. To evaluate cellular abnormalities amongst habitual smokers, the selected morphological and textural features of epithelial cells are compared with the non-smoker (-ve control group) group and clinically diagnosed pre-cancer patients (+ve control group) using support vector machine (SVM) classifier. Accuracy of the developed SVM based classification has been found to be 86% with 80% sensitivity and 89% specificity in classifying the features from the volunteers having smoking habit. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi
2014-09-01
Interaction is one of the key techniques of augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages which are gesture segmentation, gesture characteristic feature modeling and trick recognition. In segmentation stage, for solving the misrecognition of skin-like region, a segmentation algorithm combing background mode and skin color to preclude some skin-like regions is adopted. In gesture characteristic feature modeling of image attributes stage, plenty of characteristic features are analyzed and acquired, such as structure characteristics, Hu invariant moments features and Fourier descriptor. In trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a novel learning method based on statistical learning theory, processing academic foundation and excellent learning ability, having a lot of issues in machine learning area and special advantages in dealing with small samples, non-linear pattern recognition at high dimension. The gesture recognition of augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of AR maintenance guiding system can be greatly enhanced by improved SVM.
Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan
2016-08-01
This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
Sun, X; Chen, K J; Berg, E P; Newman, D J; Schwartz, C A; Keller, W L; Maddock Carlin, K R
2014-02-01
The objective was to use digital color image texture features to predict troponin-T degradation in beef. Image texture features, including 88 gray level co-occurrence texture features, 81 two-dimension fast Fourier transformation texture features, and 48 Gabor wavelet filter texture features, were extracted from color images of beef strip steaks (longissimus dorsi, n = 102) aged for 10d obtained using a digital camera and additional lighting. Steaks were designated degraded or not-degraded based on troponin-T degradation determined on d 3 and d 10 postmortem by immunoblotting. Statistical analysis (STEPWISE regression model) and artificial neural network (support vector machine model, SVM) methods were designed to classify protein degradation. The d 3 and d 10 STEPWISE models were 94% and 86% accurate, respectively, while the d 3 and d 10 SVM models were 63% and 71%, respectively, in predicting protein degradation in aged meat. STEPWISE and SVM models based on image texture features show potential to predict troponin-T degradation in meat. © 2013.
"When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.
Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit
2016-10-24
To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
Pattern Recognition Approaches for Breast Cancer DCE-MRI Classification: A Systematic Review.
Fusco, Roberta; Sansone, Mario; Filice, Salvatore; Carone, Guglielmo; Amato, Daniela Maria; Sansone, Carlo; Petrillo, Antonella
2016-01-01
We performed a systematic review of several pattern analysis approaches for classifying breast lesions using dynamic, morphological, and textural features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Several machine learning approaches, namely artificial neural networks (ANN), support vector machines (SVM), linear discriminant analysis (LDA), tree-based classifiers (TC), and Bayesian classifiers (BC), and features used for classification are described. The findings of a systematic review of 26 studies are presented. The sensitivity and specificity are respectively 91 and 83 % for ANN, 85 and 82 % for SVM, 96 and 85 % for LDA, 92 and 87 % for TC, and 82 and 85 % for BC. The sensitivity and specificity are respectively 82 and 74 % for dynamic features, 93 and 60 % for morphological features, 88 and 81 % for textural features, 95 and 86 % for a combination of dynamic and morphological features, and 88 and 84 % for a combination of dynamic, morphological, and other features. LDA and TC have the best performance. A combination of dynamic and morphological features gives the best performance.
NASA Astrophysics Data System (ADS)
Krell, Mario Michael; Wilshusen, Nils; Seeland, Anett; Kim, Su Kyoung
2017-04-01
Objective. Classifier transfers usually come with dataset shifts. To overcome dataset shifts in practical applications, we consider the limitations in computational resources in this paper for the adaptation of batch learning algorithms, like the support vector machine (SVM). Approach. We focus on data selection strategies which limit the size of the stored training data by different inclusion, exclusion, and further dataset manipulation criteria like handling class imbalance with two new approaches. We provide a comparison of the strategies with linear SVMs on several synthetic datasets with different data shifts as well as on different transfer settings with electroencephalographic (EEG) data. Main results. For the synthetic data, adding only misclassified samples performed astoundingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. Adding all and removing the oldest samples results in the best performance, whereas for smaller drifts, it can be sufficient to only add samples near the decision boundary of the SVM which reduces processing resources. Significance. For brain-computer interfaces based on EEG data, models trained on data from a calibration session, a previous recording session, or even from a recording session with another subject are used. We show, that by using the right combination of data selection criteria, it is possible to adapt the SVM classifier to overcome the performance drop from the transfer.
Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K
2012-04-01
The objective of this paper is to provide an improved technique, which can assist oncopathologists in correct screening of oral precancerous conditions specially oral submucous fibrosis (OSF) with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibres segmentation, its textural feature extraction and selection, screening perfomance enhancement under Gaussian transformation and finally classification. In this study, collagen fibres are segmented on R,G,B color channels using back-probagation neural network from 60 normal and 59 OSF histological images followed by histogram specification for reducing the stain intensity variation. Henceforth, textural features of collgen area are extracted using fractal approaches viz., differential box counting and brownian motion curve . Feature selection is done using Kullback-Leibler (KL) divergence criterion and the screening performance is evaluated based on various statistical tests to conform Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers viz., Bayesian classification and support vector machines (SVM) to classify normal and OSF. It is observed that SVM with linear kernel function provides better classification accuracy (91.64%) as compared to Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves Bayesian classifier's performance from 80.69% to 90.75%. Results are here studied and discussed.
Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira; Heo, Min-Suk
2013-09-01
To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.
NASA Astrophysics Data System (ADS)
Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin
2012-04-01
Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.
Target attribute-based false alarm rejection in small infrared target detection
NASA Astrophysics Data System (ADS)
Kim, Sungho
2012-11-01
Infrared search and track is an important research area in military applications. Although there are a lot of works on small infrared target detection methods, we cannot apply them in real field due to high false alarm rate caused by clutters. This paper presents a novel target attribute extraction and machine learning-based target discrimination method. Eight kinds of target features are extracted and analyzed statistically. Learning-based classifiers such as SVM and Adaboost are developed and compared with conventional classifiers for real infrared images. In addition, the generalization capability is also inspected for various infrared clutters.
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Cubillas, J. E.; Japitana, M.
2016-06-01
This study demonstrates the application of CIELAB, Color intensity, and One Dimensional Scalar Constancy as features for image recognition and classifying benthic habitats in an image with the coastal areas of Hinatuan, Surigao Del Sur, Philippines as the study area. The study area is composed of four datasets, namely: (a) Blk66L005, (b) Blk66L021, (c) Blk66L024, and (d) Blk66L0114. SVM optimization was performed in Matlab® software with the help of Parallel Computing Toolbox to hasten the SVM computing speed. The image used for collecting samples for SVM procedure was Blk66L0114 in which a total of 134,516 sample objects of mangrove, possible coral existence with rocks, sand, sea, fish pens and sea grasses were collected and processed. The collected samples were then used as training sets for the supervised learning algorithm and for the creation of class definitions. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as a super feature which will then be used in developing the C (classifier) rule set in eCognition® software. The classification results of the sampling site yielded an accuracy of 98.85% which confirms the reliability of remote sensing techniques and analysis employed to orthophotos like the CIELAB, Color Intensity and One dimensional scalar constancy and the use of SVM classification algorithm in classifying benthic habitats.
Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk
2017-04-18
The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.
Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection
NASA Astrophysics Data System (ADS)
Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu
Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.
Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan
2018-01-01
Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33–100%, and ELM, with an accuracy rate of 98.01–100%. For level assessment, the R2 related to the training set was above 0.97 and the R2 related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016–0.3494, lower than the error of 0.5–1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level. PMID:29346328
Weighted Feature Gaussian Kernel SVM for Emotion Recognition
Jia, Qingxuan
2016-01-01
Emotion recognition with weighted feature based on facial expression is a challenging research topic and has attracted great attention in the past few years. This paper presents a novel method, utilizing subregion recognition rate to weight kernel function. First, we divide the facial expression image into some uniform subregions and calculate corresponding recognition rate and weight. Then, we get a weighted feature Gaussian kernel function and construct a classifier based on Support Vector Machine (SVM). At last, the experimental results suggest that the approach based on weighted feature Gaussian kernel function has good performance on the correct rate in emotion recognition. The experiments on the extended Cohn-Kanade (CK+) dataset show that our method has achieved encouraging recognition results compared to the state-of-the-art methods. PMID:27807443
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Sabyasachi; Das, Nandan K.; Kurmi, Indrajit; Pradhan, Asima; Ghosh, Nirmalya; Panigrahi, Prasanta K.
2017-10-01
We report the application of a hidden Markov model (HMM) on multifractal tissue optical properties derived via the Born approximation-based inverse light scattering method for effective discrimination of precancerous human cervical tissue sites from the normal ones. Two global fractal parameters, generalized Hurst exponent and the corresponding singularity spectrum width, computed by multifractal detrended fluctuation analysis (MFDFA), are used here as potential biomarkers. We develop a methodology that makes use of these multifractal parameters by integrating with different statistical classifiers like the HMM and support vector machine (SVM). It is shown that the MFDFA-HMM integrated model achieves significantly better discrimination between normal and different grades of cancer as compared to the MFDFA-SVM integrated model.
Biometric identification based on feature fusion with PCA and SVM
NASA Astrophysics Data System (ADS)
Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina
2018-04-01
Biometric identification is gaining ground compared to traditional identification methods. Many biometric measurements may be used for secure human identification. The most reliable among them is the iris pattern because of its uniqueness, stability, unforgeability and inalterability over time. The approach presented in this paper is a fusion of different feature descriptor methods such as HOG, LIOP, LBP, used for extracting iris texture information. The classifiers obtained through the SVM and PCA methods demonstrate the effectiveness of our system applied to one and both irises. The performances measured are highly accurate and foreshadow a fusion system with a rate of identification approaching 100% on the UPOL database.
E-Nose Vapor Identification Based on Dempster-Shafer Fusion of Multiple Classifiers
NASA Technical Reports Server (NTRS)
Li, Winston; Leung, Henry; Kwan, Chiman; Linnell, Bruce R.
2005-01-01
Electronic nose (e-nose) vapor identification is an efficient approach to monitor air contaminants in space stations and shuttles in order to ensure the health and safety of astronauts. Data preprocessing (measurement denoising and feature extraction) and pattern classification are important components of an e-nose system. In this paper, a wavelet-based denoising method is applied to filter the noisy sensor measurements. Transient-state features are then extracted from the denoised sensor measurements, and are used to train multiple classifiers such as multi-layer perceptions (MLP), support vector machines (SVM), k nearest neighbor (KNN), and Parzen classifier. The Dempster-Shafer (DS) technique is used at the end to fuse the results of the multiple classifiers to get the final classification. Experimental analysis based on real vapor data shows that the wavelet denoising method can remove both random noise and outliers successfully, and the classification rate can be improved by using classifier fusion.
A Novel Mittag-Leffler Kernel Based Hybrid Fault Diagnosis Method for Wheeled Robot Driving System.
Yuan, Xianfeng; Song, Mumin; Zhou, Fengyu; Chen, Zhumin; Li, Yan
2015-01-01
The wheeled robots have been successfully applied in many aspects, such as industrial handling vehicles, and wheeled service robots. To improve the safety and reliability of wheeled robots, this paper presents a novel hybrid fault diagnosis framework based on Mittag-Leffler kernel (ML-kernel) support vector machine (SVM) and Dempster-Shafer (D-S) fusion. Using sensor data sampled under different running conditions, the proposed approach initially establishes multiple principal component analysis (PCA) models for fault feature extraction. The fault feature vectors are then applied to train the probabilistic SVM (PSVM) classifiers that arrive at a preliminary fault diagnosis. To improve the accuracy of preliminary results, a novel ML-kernel based PSVM classifier is proposed in this paper, and the positive definiteness of the ML-kernel is proved as well. The basic probability assignments (BPAs) are defined based on the preliminary fault diagnosis results and their confidence values. Eventually, the final fault diagnosis result is archived by the fusion of the BPAs. Experimental results show that the proposed framework not only is capable of detecting and identifying the faults in the robot driving system, but also has better performance in stability and diagnosis accuracy compared with the traditional methods.
A Novel Mittag-Leffler Kernel Based Hybrid Fault Diagnosis Method for Wheeled Robot Driving System
Yuan, Xianfeng; Song, Mumin; Chen, Zhumin; Li, Yan
2015-01-01
The wheeled robots have been successfully applied in many aspects, such as industrial handling vehicles, and wheeled service robots. To improve the safety and reliability of wheeled robots, this paper presents a novel hybrid fault diagnosis framework based on Mittag-Leffler kernel (ML-kernel) support vector machine (SVM) and Dempster-Shafer (D-S) fusion. Using sensor data sampled under different running conditions, the proposed approach initially establishes multiple principal component analysis (PCA) models for fault feature extraction. The fault feature vectors are then applied to train the probabilistic SVM (PSVM) classifiers that arrive at a preliminary fault diagnosis. To improve the accuracy of preliminary results, a novel ML-kernel based PSVM classifier is proposed in this paper, and the positive definiteness of the ML-kernel is proved as well. The basic probability assignments (BPAs) are defined based on the preliminary fault diagnosis results and their confidence values. Eventually, the final fault diagnosis result is archived by the fusion of the BPAs. Experimental results show that the proposed framework not only is capable of detecting and identifying the faults in the robot driving system, but also has better performance in stability and diagnosis accuracy compared with the traditional methods. PMID:26229526
Pedestrian detection in crowded scenes with the histogram of gradients principle
NASA Astrophysics Data System (ADS)
Sidla, O.; Rosner, M.; Lypetskyy, Y.
2006-10-01
This paper describes a close to real-time scale invariant implementation of a pedestrian detector system which is based on the Histogram of Oriented Gradients (HOG) principle. Salient HOG features are first selected from a manually created very large database of samples with an evolutionary optimization procedure that directly trains a polynomial Support Vector Machine (SVM). Real-time operation is achieved by a cascaded 2-step classifier which uses first a very fast linear SVM (with the same features as the polynomial SVM) to reject most of the irrelevant detections and then computes the decision function with a polynomial SVM on the remaining set of candidate detections. Scale invariance is achieved by running the detector of constant size on scaled versions of the original input images and by clustering the results over all resolutions. The pedestrian detection system has been implemented in two versions: i) fully body detection, and ii) upper body only detection. The latter is especially suited for very busy and crowded scenarios. On a state-of-the-art PC it is able to run at a frequency of 8 - 20 frames/sec.
A mechatronics platform to study prosthetic hand control using EMG signals.
Geethanjali, P
2016-09-01
In this paper, a low-cost mechatronics platform for the design and development of robotic hands as well as a surface electromyogram (EMG) pattern recognition system is proposed. This paper also explores various EMG classification techniques using a low-cost electronics system in prosthetic hand applications. The proposed platform involves the development of a four channel EMG signal acquisition system; pattern recognition of acquired EMG signals; and development of a digital controller for a robotic hand. Four-channel surface EMG signals, acquired from ten healthy subjects for six different movements of the hand, were used to analyse pattern recognition in prosthetic hand control. Various time domain features were extracted and grouped into five ensembles to compare the influence of features in feature-selective classifiers (SLR) with widely considered non-feature-selective classifiers, such as neural networks (NN), linear discriminant analysis (LDA) and support vector machines (SVM) applied with different kernels. The results divulged that the average classification accuracy of the SVM, with a linear kernel function, outperforms other classifiers with feature ensembles, Hudgin's feature set and auto regression (AR) coefficients. However, the slight improvement in classification accuracy of SVM incurs more processing time and memory space in the low-level controller. The Kruskal-Wallis (KW) test also shows that there is no significant difference in the classification performance of SLR with Hudgin's feature set to that of SVM with Hudgin's features along with AR coefficients. In addition, the KW test shows that SLR was found to be better in respect to computation time and memory space, which is vital in a low-level controller. Similar to SVM, with a linear kernel function, other non-feature selective LDA and NN classifiers also show a slight improvement in performance using twice the features but with the drawback of increased memory space requirement and time. This prototype facilitated the study of various issues of pattern recognition and identified an efficient classifier, along with a feature ensemble, in the implementation of EMG controlled prosthetic hands in a laboratory setting at low-cost. This platform may help to motivate and facilitate prosthetic hand research in developing countries.
Bias in error estimation when using cross-validation for model selection.
Varma, Sudhir; Simon, Richard
2006-02-23
Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
NASA Astrophysics Data System (ADS)
Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong
2013-01-01
Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
Nanthagopal, A Padma; Rajamony, R Sukanesh
2012-07-01
The proposed system provides new textural information for segmenting tumours, efficiently and accurately and with less computational time, from benign and malignant tumour images, especially in smaller dimensions of tumour regions of computed tomography (CT) images. Region-based segmentation of tumour from brain CT image data is an important but time-consuming task performed manually by medical experts. The objective of this work is to segment brain tumour from CT images using combined grey and texture features with new edge features and nonlinear support vector machine (SVM) classifier. The selected optimal features are used to model and train the nonlinear SVM classifier to segment the tumour from computed tomography images and the segmentation accuracies are evaluated for each slice of the tumour image. The method is applied on real data of 80 benign, malignant tumour images. The results are compared with the radiologist labelled ground truth. Quantitative analysis between ground truth and the segmented tumour is presented in terms of segmentation accuracy and the overlap similarity measure dice metric. From the analysis and performance measures such as segmentation accuracy and dice metric, it is inferred that better segmentation accuracy and higher dice metric are achieved with the normalized cut segmentation method than with the fuzzy c-means clustering method.
NASA Astrophysics Data System (ADS)
Gatos, I.; Tsantis, S.; Karamesini, M.; Skouroliakou, A.; Kagadis, G.
2015-09-01
Purpose: The design and implementation of a computer-based image analysis system employing the support vector machine (SVM) classifier system for the classification of Focal Liver Lesions (FLLs) on routine non-enhanced, T2-weighted Magnetic Resonance (MR) images. Materials and Methods: The study comprised 92 patients; each one of them has undergone MRI performed on a Magnetom Concerto (Siemens). Typical signs on dynamic contrast-enhanced MRI and biopsies were employed towards a three class categorization of the 92 cases: 40-benign FLLs, 25-Hepatocellular Carcinomas (HCC) within Cirrhotic liver parenchyma and 27-liver metastases from Non-Cirrhotic liver. Prior to FLLs classification an automated lesion segmentation algorithm based on Marcov Random Fields was employed in order to acquire each FLL Region of Interest. 42 texture features derived from the gray-level histogram, co-occurrence and run-length matrices and 12 morphological features were obtained from each lesion. Stepwise multi-linear regression analysis was utilized to avoid feature redundancy leading to a feature subset that fed the multiclass SVM classifier designed for lesion classification. SVM System evaluation was performed by means of leave-one-out method and ROC analysis. Results: Maximum accuracy for all three classes (90.0%) was obtained by means of the Radial Basis Kernel Function and three textural features (Inverse- Different-Moment, Sum-Variance and Long-Run-Emphasis) that describe lesion's contrast, variability and shape complexity. Sensitivity values for the three classes were 92.5%, 81.5% and 96.2% respectively, whereas specificity values were 94.2%, 95.3% and 95.5%. The AUC value achieved for the selected subset was 0.89 with 0.81 - 0.94 confidence interval. Conclusion: The proposed SVM system exhibit promising results that could be utilized as a second opinion tool to the radiologist in order to decrease the time/cost of diagnosis and the need for patients to undergo invasive examination.
Nanthini, B. Suguna; Santhi, B.
2017-01-01
Background: Epilepsy causes when the repeated seizure occurs in the brain. Electroencephalogram (EEG) test provides valuable information about the brain functions and can be useful to detect brain disorder, especially for epilepsy. In this study, application for an automated seizure detection model has been introduced successfully. Materials and Methods: The EEG signals are decomposed into sub-bands by discrete wavelet transform using db2 (daubechies) wavelet. The eight statistical features, the four gray level co-occurrence matrix and Renyi entropy estimation with four different degrees of order, are extracted from the raw EEG and its sub-bands. Genetic algorithm (GA) is used to select eight relevant features from the 16 dimension features. The model has been trained and tested using support vector machine (SVM) classifier successfully for EEG signals. The performance of the SVM classifier is evaluated for two different databases. Results: The study has been experimented through two different analyses and achieved satisfactory performance for automated seizure detection using relevant features as the input to the SVM classifier. Conclusion: Relevant features using GA give better accuracy performance for seizure detection. PMID:28781480
Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar
2016-01-01
Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming
2016-01-01
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
Geographical classification of apple based on hyperspectral imaging
NASA Astrophysics Data System (ADS)
Guo, Zhiming; Huang, Wenqian; Chen, Liping; Zhao, Chunjiang; Peng, Yankun
2013-05-01
Attribute of apple according to geographical origin is often recognized and appreciated by the consumers. It is usually an important factor to determine the price of a commercial product. Hyperspectral imaging technology and supervised pattern recognition was attempted to discriminate apple according to geographical origins in this work. Hyperspectral images of 207 Fuji apple samples were collected by hyperspectral camera (400-1000nm). Principal component analysis (PCA) was performed on hyperspectral imaging data to determine main efficient wavelength images, and then characteristic variables were extracted by texture analysis based on gray level co-occurrence matrix (GLCM) from dominant waveband image. All characteristic variables were obtained by fusing the data of images in efficient spectra. Support vector machine (SVM) was used to construct the classification model, and showed excellent performance in classification results. The total classification rate had the high classify accuracy of 92.75% in the training set and 89.86% in the prediction sets, respectively. The overall results demonstrated that the hyperspectral imaging technique coupled with SVM classifier can be efficiently utilized to discriminate Fuji apple according to geographical origins.
Velazquez-Pupo, Roxana; Sierra-Romero, Alberto; Torres-Roman, Deni; Shkvarko, Yuriy V.; Romero-Delgado, Misael
2018-01-01
This paper presents a high performance vision-based system with a single static camera for traffic surveillance, for moving vehicle detection with occlusion handling, tracking, counting, and One Class Support Vector Machine (OC-SVM) classification. In this approach, moving objects are first segmented from the background using the adaptive Gaussian Mixture Model (GMM). After that, several geometric features are extracted, such as vehicle area, height, width, centroid, and bounding box. As occlusion is present, an algorithm was implemented to reduce it. The tracking is performed with adaptive Kalman filter. Finally, the selected geometric features: estimated area, height, and width are used by different classifiers in order to sort vehicles into three classes: small, midsize, and large. Extensive experimental results in eight real traffic videos with more than 4000 ground truth vehicles have shown that the improved system can run in real time under an occlusion index of 0.312 and classify vehicles with a global detection rate or recall, precision, and F-measure of up to 98.190%, and an F-measure of up to 99.051% for midsize vehicles. PMID:29382078
Link prediction in multiplex online social networks
NASA Astrophysics Data System (ADS)
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
Link prediction in multiplex online social networks.
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
Du, Yuncheng; Budman, Hector M; Duever, Thomas A
2016-06-01
Accurate automated quantitative analysis of living cells based on fluorescence microscopy images can be very useful for fast evaluation of experimental outcomes and cell culture protocols. In this work, an algorithm is developed for fast differentiation of normal and apoptotic viable Chinese hamster ovary (CHO) cells. For effective segmentation of cell images, a stochastic segmentation algorithm is developed by combining a generalized polynomial chaos expansion with a level set function-based segmentation algorithm. This approach provides a probabilistic description of the segmented cellular regions along the boundary, from which it is possible to calculate morphological changes related to apoptosis, i.e., the curvature and length of a cell's boundary. These features are then used as inputs to a support vector machine (SVM) classifier that is trained to distinguish between normal and apoptotic viable states of CHO cell images. The use of morphological features obtained from the stochastic level set segmentation of cell images in combination with the trained SVM classifier is more efficient in terms of differentiation accuracy as compared with the original deterministic level set method.
Support vector machine-based facial-expression recognition method combining shape and appearance
NASA Astrophysics Data System (ADS)
Han, Eun Jung; Kang, Byung Jun; Park, Kang Ryoung; Lee, Sangyoun
2010-11-01
Facial expression recognition can be widely used for various applications, such as emotion-based human-machine interaction, intelligent robot interfaces, face recognition robust to expression variation, etc. Previous studies have been classified as either shape- or appearance-based recognition. The shape-based method has the disadvantage that the individual variance of facial feature points exists irrespective of similar expressions, which can cause a reduction of the recognition accuracy. The appearance-based method has a limitation in that the textural information of the face is very sensitive to variations in illumination. To overcome these problems, a new facial-expression recognition method is proposed, which combines both shape and appearance information, based on the support vector machine (SVM). This research is novel in the following three ways as compared to previous works. First, the facial feature points are automatically detected by using an active appearance model. From these, the shape-based recognition is performed by using the ratios between the facial feature points based on the facial-action coding system. Second, the SVM, which is trained to recognize the same and different expression classes, is proposed to combine two matching scores obtained from the shape- and appearance-based recognitions. Finally, a single SVM is trained to discriminate four different expressions, such as neutral, a smile, anger, and a scream. By determining the expression of the input facial image whose SVM output is at a minimum, the accuracy of the expression recognition is much enhanced. The experimental results showed that the recognition accuracy of the proposed method was better than previous researches and other fusion methods.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step to improve the performance of the SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments outputted by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence times of each Top-n-gram. The training vectors are evaluated by SVM to train classifiers which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, Top-n-gram is a good building block of the protein sequences and can be widely used in many tasks of the computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the prediction of protein binding sites.
Automatic system for radar echoes filtering based on textural features and artificial intelligence
NASA Astrophysics Data System (ADS)
Hedir, Mehdia; Haddad, Boualem
2017-10-01
Among the very popular Artificial Intelligence (AI) techniques, Artificial Neural Network (ANN) and Support Vector Machine (SVM) have been retained to process Ground Echoes (GE) on meteorological radar images taken from Setif (Algeria) and Bordeaux (France) with different climates and topologies. To achieve this task, AI techniques were associated with textural approaches. We used Gray Level Co-occurrence Matrix (GLCM) and Completed Local Binary Pattern (CLBP); both methods were largely used in image analysis. The obtained results show the efficiency of texture to preserve precipitations forecast on both sites with the accuracy of 98% on Bordeaux and 95% on Setif despite the AI technique used. 98% of GE are suppressed with SVM, this rate is outperforming ANN skills. CLBP approach associated to SVM eliminates 98% of GE and preserves precipitations forecast on Bordeaux site better than on Setif's, while it exhibits lower accuracy with ANN. SVM classifier is well adapted to the proposed application since the average filtering rate is 95-98% with texture and 92-93% with CLBP. These approaches allow removing Anomalous Propagations (APs) too with a better accuracy of 97.15% with texture and SVM. In fact, textural features associated to AI techniques are an efficient tool for incoherent radars to surpass spurious echoes.
Guo, Shuxiang; Pang, Muye; Gao, Baofeng; Hirata, Hideyuki; Ishihara, Hidenori
2015-01-01
The surface electromyography (sEMG) technique is proposed for muscle activation detection and intuitive control of prostheses or robot arms. Motion recognition is widely used to map sEMG signals to the target motions. One of the main factors preventing the implementation of this kind of method for real-time applications is the unsatisfactory motion recognition rate and time consumption. The purpose of this paper is to compare eight combinations of four feature extraction methods (Root Mean Square (RMS), Detrended Fluctuation Analysis (DFA), Weight Peaks (WP), and Muscular Model (MM)) and two classifiers (Neural Networks (NN) and Support Vector Machine (SVM)), for the task of mapping sEMG signals to eight upper-limb motions, to find out the relation between these methods and propose a proper combination to solve this issue. Seven subjects participated in the experiment and six muscles of the upper-limb were selected to record sEMG signals. The experimental results showed that NN classifier obtained the highest recognition accuracy rate (88.7%) during the training process while SVM performed better in real-time experiments (85.9%). For time consumption, SVM took less time than NN during the training process but needed more time for real-time computation. Among the four feature extraction methods, WP had the highest recognition rate for the training process (97.7%) while MM performed the best during real-time tests (94.3%). The combination of MM and NN is recommended for strict real-time applications while a combination of MM and SVM will be more suitable when time consumption is not a key requirement. PMID:25894941
Image Augmentation for Object Image Classification Based On Combination of Pre-Trained CNN and SVM
NASA Astrophysics Data System (ADS)
Shima, Yoshihiro
2018-04-01
Neural networks are a powerful means of classifying object images. The proposed image category classification method for object images combines convolutional neural networks (CNNs) and support vector machines (SVMs). A pre-trained CNN, called Alex-Net, is used as a pattern-feature extractor. Alex-Net is pre-trained for the large-scale object-image dataset ImageNet. Instead of training, Alex-Net, pre-trained for ImageNet is used. An SVM is used as trainable classifier. The feature vectors are passed to the SVM from Alex-Net. The STL-10 dataset are used as object images. The number of classes is ten. Training and test samples are clearly split. STL-10 object images are trained by the SVM with data augmentation. We use the pattern transformation method with the cosine function. We also apply some augmentation method such as rotation, skewing and elastic distortion. By using the cosine function, the original patterns were left-justified, right-justified, top-justified, or bottom-justified. Patterns were also center-justified and enlarged. Test error rate is decreased by 0.435 percentage points from 16.055% by augmentation with cosine transformation. Error rates are increased by other augmentation method such as rotation, skewing and elastic distortion, compared without augmentation. Number of augmented data is 30 times that of the original STL-10 5K training samples. Experimental test error rate for the test 8k STL-10 object images was 15.620%, which shows that image augmentation is effective for image category classification.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad
2018-03-01
Support Vector Machine (SVM) has been revealed to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is at its inception. The present study classified and predicted high and low potential archers from a collection of psychological coping skills variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. Psychological coping skills inventory which evaluates the archers level of related coping skills were filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models, i.e. linear and fine radial basis function (RBF) kernel functions, were trained on the psychological variables. The k-means clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy and precision throughout the exercise with an accuracy of 92% and considerably fewer error rate for the prediction of the HPPA and the LPPA as compared to the fine RBF SVM. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined which would consequently save time and energy during talent identification and development programme.
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-04-22
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
Realistic Subsurface Anomaly Discrimination Using Electromagnetic Induction and an SVM Classifier
NASA Astrophysics Data System (ADS)
Pablo Fernández, Juan; Shubitidze, Fridon; Shamatava, Irma; Barrowes, Benjamin E.; O'Neill, Kevin
2010-12-01
The environmental research program of the United States military has set up blind tests for detection and discrimination of unexploded ordnance. One such test consists of measurements taken with the EM-63 sensor at Camp Sibert, AL. We review the performance on the test of a procedure that combines a field-potential (HAP) method to locate targets, the normalized surface magnetic source (NSMS) model to characterize them, and a support vector machine (SVM) to classify them. The HAP method infers location from the scattered magnetic field and its associated scalar potential, the latter reconstructed using equivalent sources. NSMS replaces the target with an enclosing spheroid of equivalent radial magnetization whose integral it uses as a discriminator. SVM generalizes from empirical evidence and can be adapted for multiclass discrimination using a voting system. Our method identifies all potentially dangerous targets correctly and has a false-alarm rate of about 5%.
Trakoolwilaiwan, Thanawin; Behboodi, Bahareh; Lee, Jaeseok; Kim, Kyungsoo; Choi, Ji-Woong
2018-01-01
The aim of this work is to develop an effective brain-computer interface (BCI) method based on functional near-infrared spectroscopy (fNIRS). In order to improve the performance of the BCI system in terms of accuracy, the ability to discriminate features from input signals and proper classification are desired. Previous studies have mainly extracted features from the signal manually, but proper features need to be selected carefully. To avoid performance degradation caused by manual feature selection, we applied convolutional neural networks (CNNs) as the automatic feature extractor and classifier for fNIRS-based BCI. In this study, the hemodynamic responses evoked by performing rest, right-, and left-hand motor execution tasks were measured on eight healthy subjects to compare performances. Our CNN-based method provided improvements in classification accuracy over conventional methods employing the most commonly used features of mean, peak, slope, variance, kurtosis, and skewness, classified by support vector machine (SVM) and artificial neural network (ANN). Specifically, up to 6.49% and 3.33% improvement in classification accuracy was achieved by CNN compared with SVM and ANN, respectively.
Computer-Vision-Assisted Palm Rehabilitation With Supervised Learning.
Vamsikrishna, K M; Dogra, Debi Prosad; Desarkar, Maunendra Sankar
2016-05-01
Physical rehabilitation supported by the computer-assisted-interface is gaining popularity among health-care fraternity. In this paper, we have proposed a computer-vision-assisted contactless methodology to facilitate palm and finger rehabilitation. Leap motion controller has been interfaced with a computing device to record parameters describing 3-D movements of the palm of a user undergoing rehabilitation. We have proposed an interface using Unity3D development platform. Our interface is capable of analyzing intermediate steps of rehabilitation without the help of an expert, and it can provide online feedback to the user. Isolated gestures are classified using linear discriminant analysis (DA) and support vector machines (SVM). Finally, a set of discrete hidden Markov models (HMM) have been used to classify gesture sequence performed during rehabilitation. Experimental validation using a large number of samples collected from healthy volunteers reveals that DA and SVM perform similarly while applied on isolated gesture recognition. We have compared the results of HMM-based sequence classification with CRF-based techniques. Our results confirm that both HMM and CRF perform quite similarly when tested on gesture sequences. The proposed system can be used for home-based palm or finger rehabilitation in the absence of experts.
NASA Astrophysics Data System (ADS)
Lee, Youngjoo; Kim, Namkug; Seo, Joon Beom; Lee, JuneGoo; Kang, Suk Ho
2007-03-01
In this paper, we proposed novel shape features to improve classification performance of differentiating obstructive lung diseases, based on HRCT (High Resolution Computerized Tomography) images. The images were selected from HRCT images, obtained from 82 subjects. For each image, two experienced radiologists selected rectangular ROIs with various sizes (16x16, 32x32, and 64x64 pixels), representing each disease or normal lung parenchyma. Besides thirteen textural features, we employed additional seven shape features; cluster shape features, and Top-hat transform features. To evaluate the contribution of shape features for differentiation of obstructive lung diseases, several experiments were conducted with two different types of classifiers and various ROI sizes. For automated classification, the Bayesian classifier and support vector machine (SVM) were implemented. To assess the performance and cross-validation of the system, 5-folding method was used. In comparison to employing only textural features, adding shape features yields significant enhancement of overall sensitivity(5.9, 5.4, 4.4% in the Bayesian and 9.0, 7.3, 5.3% in the SVM), in the order of ROI size 16x16, 32x32, 64x64 pixels, respectively (t-test, p<0.01). Moreover, this enhancement was largely due to the improvement on class-specific sensitivity of mild centrilobular emphysema and bronchiolitis obliterans which are most hard to differentiate for radiologists. According to these experimental results, adding shape features to conventional texture features is much useful to improve classification performance of obstructive lung diseases in both Bayesian and SVM classifiers.
Automatic Organ Segmentation for CT Scans Based on Super-Pixel and Convolutional Neural Networks.
Liu, Xiaoming; Guo, Shuxu; Yang, Bingtao; Ma, Shuzhi; Zhang, Huimao; Li, Jing; Sun, Changjian; Jin, Lanyi; Li, Xueyan; Yang, Qi; Fu, Yu
2018-04-20
Accurate segmentation of specific organ from computed tomography (CT) scans is a basic and crucial task for accurate diagnosis and treatment. To avoid time-consuming manual optimization and to help physicians distinguish diseases, an automatic organ segmentation framework is presented. The framework utilized convolution neural networks (CNN) to classify pixels. To reduce the redundant inputs, the simple linear iterative clustering (SLIC) of super-pixels and the support vector machine (SVM) classifier are introduced. To establish the perfect boundary of organs in one-pixel-level, the pixels need to be classified step-by-step. First, the SLIC is used to cut an image into grids and extract respective digital signatures. Next, the signature is classified by the SVM, and the rough edges are acquired. Finally, a precise boundary is obtained by the CNN, which is based on patches around each pixel-point. The framework is applied to abdominal CT scans of livers and high-resolution computed tomography (HRCT) scans of lungs. The experimental CT scans are derived from two public datasets (Sliver 07 and a Chinese local dataset). Experimental results show that the proposed method can precisely and efficiently detect the organs. This method consumes 38 s/slice for liver segmentation. The Dice coefficient of the liver segmentation results reaches to 97.43%. For lung segmentation, the Dice coefficient is 97.93%. This finding demonstrates that the proposed framework is a favorable method for lung segmentation of HRCT scans.
NASA Astrophysics Data System (ADS)
Chen, Zhang; Peng, Zhenming; Peng, Lingbing; Liao, Dongyi; He, Xin
2011-11-01
With the swift and violent development of the Multimedia Messaging Service (MMS), it becomes an urgent task to filter the Multimedia Message (MM) spam effectively in real-time. For the fact that most MMs contain images or videos, a method based on retrieving images is given in this paper for filtering MM spam. The detection method used in this paper is a combination of skin-color detection, texture detection, and face detection, and the classifier for this imbalanced problem is a very fast multi-classification combining Support vector machine (SVM) with unilateral binary decision tree. The experiments on 3 test sets show that the proposed method is effective, with the interception rate up to 60% and the average detection time for each image less than 1 second.
Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.
Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang
2015-01-01
Most epileptic EEG classification algorithms are supervised and require large training datasets, that hinder their use in real time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) MSK-means algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm on efficiency. In addition, three classifiers: the K-means, MSK-means MSK-means and support vector machine (SVM), are used to identify seizure and localize epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizure with the MSK-means algorithm and delay permutation entropy achieves 4. 7 % higher accuracy than that of K-means, and 0. 7 % higher accuracy than that of the SVM.
Graña, M; Termenon, M; Savio, A; Gonzalez-Pinto, A; Echeveste, J; Pérez, J M; Besga, A
2011-09-20
The aim of this paper is to obtain discriminant features from two scalar measures of Diffusion Tensor Imaging (DTI) data, Fractional Anisotropy (FA) and Mean Diffusivity (MD), and to train and test classifiers able to discriminate Alzheimer's Disease (AD) patients from controls on the basis of features extracted from the FA or MD volumes. In this study, support vector machine (SVM) classifier was trained and tested on FA and MD data. Feature selection is done computing the Pearson's correlation between FA or MD values at voxel site across subjects and the indicative variable specifying the subject class. Voxel sites with high absolute correlation are selected for feature extraction. Results are obtained over an on-going study in Hospital de Santiago Apostol collecting anatomical T1-weighted MRI volumes and DTI data from healthy control subjects and AD patients. FA features and a linear SVM classifier achieve perfect accuracy, sensitivity and specificity in several cross-validation studies, supporting the usefulness of DTI-derived features as an image-marker for AD and to the feasibility of building Computer Aided Diagnosis systems for AD based on them. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, Xiaoli; Zou, Jie; Le, Daniel X.; Thoma, George
2010-01-01
"Investigator Names" is a newly required field in MEDLINE citations. It consists of personal names listed as members of corporate organizations in an article. Extracting investigator names automatically is necessary because of the increasing volume of articles reporting collaborative biomedical research in which a large number of investigators participate. In this paper, we present an SVM-based stacked sequential learning method in a novel application - recognizing named entities such as the first and last names of investigators from online medical journal articles. Stacked sequential learning is a meta-learning algorithm which can boost any base learner. It exploits contextual information by adding the predicted labels of the surrounding tokens as features. We apply this method to tag words in text paragraphs containing investigator names, and demonstrate that stacked sequential learning improves the performance of a nonsequential base learner such as an SVM classifier.
Fall Detection Using Smartphone Audio Features.
Cheffena, Michael
2016-07-01
An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficents (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers: k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN) are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with ANN classifier with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirement for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Abstract Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
Accuracy of automated classification of major depressive disorder as a function of symptom severity.
Ramasubbu, Rajamannar; Brown, Matthew R G; Cortese, Filmeno; Gaxiola, Ismael; Goodyear, Bradley; Greenshaw, Andrew J; Dursun, Serdar M; Greiner, Russell
2016-01-01
Growing evidence documents the potential of machine learning for developing brain based diagnostic methods for major depressive disorder (MDD). As symptom severity may influence brain activity, we investigated whether the severity of MDD affected the accuracies of machine learned MDD-vs-Control diagnostic classifiers. Forty-five medication-free patients with DSM-IV defined MDD and 19 healthy controls participated in the study. Based on depression severity as determined by the Hamilton Rating Scale for Depression (HRSD), MDD patients were sorted into three groups: mild to moderate depression (HRSD 14-19), severe depression (HRSD 20-23), and very severe depression (HRSD ≥ 24). We collected functional magnetic resonance imaging (fMRI) data during both resting-state and an emotional-face matching task. Patients in each of the three severity groups were compared against controls in separate analyses, using either the resting-state or task-based fMRI data. We use each of these six datasets with linear support vector machine (SVM) binary classifiers for identifying individuals as patients or controls. The resting-state fMRI data showed statistically significant classification accuracy only for the very severe depression group (accuracy 66%, p = 0.012 corrected), while mild to moderate (accuracy 58%, p = 1.0 corrected) and severe depression (accuracy 52%, p = 1.0 corrected) were only at chance. With task-based fMRI data, the automated classifier performed at chance in all three severity groups. Binary linear SVM classifiers achieved significant classification of very severe depression with resting-state fMRI, but the contribution of brain measurements may have limited potential in differentiating patients with less severe depression from healthy controls.
A minimum spanning forest based classification method for dedicated breast CT images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pike, Robert; Sechopoulos, Ioannis; Fei, Baowei, E-mail: bfei@emory.edu
Purpose: To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Methods: Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting modelmore » used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the authors’ classification method, they use DICE overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Results: Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT image with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. Conclusions: A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.« less
Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L
2013-12-01
The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine
NASA Astrophysics Data System (ADS)
Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung
Clinical decision support system (CDSS) provides knowledge and specific information for clinicians to enhance diagnostic efficiency and improving healthcare quality. An appropriate CDSS can highly elevate patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, it is critical to determine suitable combination of SVM parameters regarding classification performance. Genetic algorithm (GA) can find optimal solution within an acceptable time, and is faster than greedy algorithm with exhaustive searching strategy. By taking the advantage of GA in quickly selecting the salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which is different from the traditional method with GA used for feature selection and SVM for classification, was used to design CDSSs for prediction of successful ventilation weaning, diagnosis of patients with severe obstructive sleep apnea, and discrimination of different cell types form Pap smear. The results show that IGS is better than methods using SVM alone or linear discriminator.
Classification of burn wounds using support vector machines
NASA Astrophysics Data System (ADS)
Acha, Begona; Serrano, Carmen; Palencia, Sergio; Murillo, Juan Jose
2004-05-01
The purpose of this work is to improve a previous method developed by the authors for the classification of burn wounds into their depths. The inputs of the system are color and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. Our previous work consisted in segmenting the burn wound from the rest of the image and classifying the burn into its depth. In this paper we focus on the classification problem only. We already proposed to use a Fuzzy-ARTMAP neural network (NN). However, we may take advantage of new powerful classification tools such as Support Vector Machines (SVM). We apply the five-folded cross validation scheme to divide the database into training and validating sets. Then, we apply a feature selection method for each classifier, which will give us the set of features that yields the smallest classification error for each classifier. Features used to classify are first-order statistical parameters extracted from the L*, u* and v* color components of the image. The feature selection algorithms used are the Sequential Forward Selection (SFS) and the Sequential Backward Selection (SBS) methods. As data of the problem faced here are not linearly separable, the SVM was trained using some different kernels. The validating process shows that the SVM method, when using a Gaussian kernel of variance 1, outperforms classification results obtained with the rest of the classifiers, yielding an error classification rate of 0.7% whereas the Fuzzy-ARTMAP NN attained 1.6 %.
Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan
2015-01-01
Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.
Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery
NASA Astrophysics Data System (ADS)
Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco
2017-04-01
Since recent years surveying with unmanned aerial vehicles (UAV) is getting a great amount of attention due to decreasing costs, higher precision and flexibility of usage. UAVs have been applied for geomorphological investigations, forestry, precision agriculture, cultural heritage assessment and for archaeological purposes. It can be used for land use and land cover classification (LULC). In literature, there are two main types of approaches for classification of remote sensing imagery: pixel-based and object-based. On one hand, pixel-based approach mostly uses training areas to define classes and respective spectral signatures. On the other hand, object-based classification considers pixels, scale, spatial information and texture information for creating homogeneous objects. Machine learning methods have been applied successfully for classification, and their use is increasing due to the availability of faster computing capabilities. The methods learn and train the model from previous computation. Two machine learning methods which have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare RF and SVM methods for classifying LULC using images collected with a fixed wing UAV. The processing chain regarding classification uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras providing information over the red, blue, green and near infrared wavelength reflectivity over a testing area in the campus of Agripolis, in Italy. Images were elaborated and ortho-rectified through Agisoft Photoscan. The ortho-rectified image is the full data set, and the test set is derived from partial sub-setting of the full data set. Different tests have been carried out, using a percentage from 2 % to 20 % of the total. Ten training sets and ten validation sets are obtained from each test set. The control dataset consist of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable. The RF and SVM are applied to the test set. The performances of the methods are evaluated using the three following accuracy metrics: Kappa index, Classification accuracy and Classification Error. All three are calculated in three different ways: with K-fold cross validation, using the validation test set and using the full test set. The analysis indicates that SVM gets better results in terms of good scores using K-fold cross or validation test set. Using the full test set, RF achieves a better result in comparison to SVM. It also seems that SVM performs better with smaller training sets, whereas RF performs better as training sets get larger.
NASA Astrophysics Data System (ADS)
Erener, A.
2013-04-01
Automatic extraction of urban features from high resolution satellite images is one of the main applications in remote sensing. It is useful for wide scale applications, namely: urban planning, urban mapping, disaster management, GIS (geographic information systems) updating, and military target detection. One common approach to detecting urban features from high resolution images is to use automatic classification methods. This paper has four main objectives with respect to detecting buildings. The first objective is to compare the performance of the most notable supervised classification algorithms, including the maximum likelihood classifier (MLC) and the support vector machine (SVM). In this experiment the primary consideration is the impact of kernel configuration on the performance of the SVM. The second objective of the study is to explore the suitability of integrating additional bands, namely first principal component (1st PC) and the intensity image, for original data for multi classification approaches. The performance evaluation of classification results is done using two different accuracy assessment methods: pixel based and object based approaches, which reflect the third aim of the study. The objective here is to demonstrate the differences in the evaluation of accuracies of classification methods. Considering consistency, the same set of ground truth data which is produced by labeling the building boundaries in the GIS environment is used for accuracy assessment. Lastly, the fourth aim is to experimentally evaluate variation in the accuracy of classifiers for six different real situations in order to identify the impact of spatial and spectral diversity on results. The method is applied to Quickbird images for various urban complexity levels, extending from simple to complex urban patterns. The simple surface type includes a regular urban area with low density and systematic buildings with brick rooftops. The complex surface type involves almost all kinds of challenges, such as high dense build up areas, regions with bare soil, and small and large buildings with different rooftops, such as concrete, brick, and metal. Using the pixel based accuracy assessment it was shown that the percent building detection (PBD) and quality percent (QP) of the MLC and SVM depend on the complexity and texture variation of the region. Generally, PBD values range between 70% and 90% for the MLC and SVM, respectively. No substantial improvements were observed when the SVM and MLC classifications were developed by the addition of more variables, instead of the use of only four bands. In the evaluation of object based accuracy assessment, it was demonstrated that while MLC and SVM provide higher rates of correct detection, they also provide higher rates of false alarms.
Nonlinear Demodulation and Channel Coding in EBPSK Scheme
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding. PMID:23213281
Nonlinear demodulation and channel coding in EBPSK scheme.
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding.
NASA Astrophysics Data System (ADS)
Lee, Donghoon; Kim, Ye-seul; Choi, Sunghoon; Lee, Haenghwa; Jo, Byungdu; Choi, Seungyeon; Shin, Jungwook; Kim, Hee-Joung
2017-03-01
The chest digital tomosynthesis(CDT) is recently developed medical device that has several advantage for diagnosing lung disease. For example, CDT provides depth information with relatively low radiation dose compared to computed tomography (CT). However, a major problem with CDT is the image artifacts associated with data incompleteness resulting from limited angle data acquisition in CDT geometry. For this reason, the sensitivity of lung disease was not clear compared to CT. In this study, to improve sensitivity of lung disease detection in CDT, we developed computer aided diagnosis (CAD) systems based on machine learning. For design CAD systems, we used 100 cases of lung nodules cropped images and 100 cases of normal lesion cropped images acquired by lung man phantoms and proto type CDT. We used machine learning techniques based on support vector machine and Gabor filter. The Gabor filter was used for extracting characteristics of lung nodules and we compared performance of feature extraction of Gabor filter with various scale and orientation parameters. We used 3, 4, 5 scales and 4, 6, 8 orientations. After extracting features, support vector machine (SVM) was used for classifying feature of lesions. The linear, polynomial and Gaussian kernels of SVM were compared to decide the best SVM conditions for CDT reconstruction images. The results of CAD system with machine learning showed the capability of automatically lung lesion detection. Furthermore detection performance was the best when Gabor filter with 5 scale and 8 orientation and SVM with Gaussian kernel were used. In conclusion, our suggested CAD system showed improving sensitivity of lung lesion detection in CDT and decide Gabor filter and SVM conditions to achieve higher detection performance of our developed CAD system for CDT.
Classification of human carcinoma cells using multispectral imagery
NASA Astrophysics Data System (ADS)
Ćinar, Umut; Y. Ćetin, Yasemin; Ćetin-Atalay, Rengul; Ćetin, Enis
2016-03-01
In this paper, we present a technique for automatically classifying human carcinoma cell images using textural features. An image dataset containing microscopy biopsy images from different patients for 14 distinct cancer cell line type is studied. The images are captured using a RGB camera attached to an inverted microscopy device. Texture based Gabor features are extracted from multispectral input images. SVM classifier is used to generate a descriptive model for the purpose of cell line classification. The experimental results depict satisfactory performance, and the proposed method is versatile for various microscopy magnification options.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis
NASA Astrophysics Data System (ADS)
Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.
2010-03-01
We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
Natural Language Processing Based Instrument for Classification of Free Text Medical Records
2016-01-01
According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260
A random forest model based classification scheme for neonatal amplitude-integrated EEG.
Chen, Weiting; Wang, Yu; Cao, Guitao; Chen, Guoqiang; Gu, Qiufang
2014-01-01
Modern medical advances have greatly increased the survival rate of infants, while they remain in the higher risk group for neurological problems later in life. For the infants with encephalopathy or seizures, identification of the extent of brain injury is clinically challenging. Continuous amplitude-integrated electroencephalography (aEEG) monitoring offers a possibility to directly monitor the brain functional state of the newborns over hours, and has seen an increasing application in neonatal intensive care units (NICUs). This paper presents a novel combined feature set of aEEG and applies random forest (RF) method to classify aEEG tracings. To that end, a series of experiments were conducted on 282 aEEG tracing cases (209 normal and 73 abnormal ones). Basic features, statistic features and segmentation features were extracted from both the tracing as a whole and the segmented recordings, and then form a combined feature set. All the features were sent to a classifier afterwards. The significance of feature, the data segmentation, the optimization of RF parameters, and the problem of imbalanced datasets were examined through experiments. Experiments were also done to evaluate the performance of RF on aEEG signal classifying, compared with several other widely used classifiers including SVM-Linear, SVM-RBF, ANN, Decision Tree (DT), Logistic Regression(LR), ML, and LDA. The combined feature set can better characterize aEEG signals, compared with basic features, statistic features and segmentation features respectively. With the combined feature set, the proposed RF-based aEEG classification system achieved a correct rate of 92.52% and a high F1-score of 95.26%. Among all of the seven classifiers examined in our work, the RF method got the highest correct rate, sensitivity, specificity, and F1-score, which means that RF outperforms all of the other classifiers considered here. The results show that the proposed RF-based aEEG classification system with the combined feature set is efficient and helpful to better detect the brain disorders in newborns.
Human action recognition with group lasso regularized-support vector machine
NASA Astrophysics Data System (ADS)
Luo, Huiwu; Lu, Huanzhang; Wu, Yabei; Zhao, Fei
2016-05-01
The bag-of-visual-words (BOVW) and Fisher kernel are two popular models in human action recognition, and support vector machine (SVM) is the most commonly used classifier for the two models. We show two kinds of group structures in the feature representation constructed by BOVW and Fisher kernel, respectively, since the structural information of feature representation can be seen as a prior for the classifier and can improve the performance of the classifier, which has been verified in several areas. However, the standard SVM employs L2-norm regularization in its learning procedure, which penalizes each variable individually and cannot express the structural information of feature representation. We replace the L2-norm regularization with group lasso regularization in standard SVM, and a group lasso regularized-support vector machine (GLRSVM) is proposed. Then, we embed the group structural information of feature representation into GLRSVM. Finally, we introduce an algorithm to solve the optimization problem of GLRSVM by alternating directions method of multipliers. The experiments evaluated on KTH, YouTube, and Hollywood2 datasets show that our method achieves promising results and improves the state-of-the-art methods on KTH and YouTube datasets.
Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM.
Davari Dolatabadi, Azam; Khadem, Siamak Esmael Zadeh; Asl, Babak Mohammadzadeh
2017-01-01
Currently Coronary Artery Disease (CAD) is one of the most prevalent diseases, and also can lead to death, disability and economic loss in patients who suffer from cardiovascular disease. Diagnostic procedures of this disease by medical teams are typically invasive, although they do not satisfy the required accuracy. In this study, we have proposed a methodology for the automatic diagnosis of normal and Coronary Artery Disease conditions using Heart Rate Variability (HRV) signal extracted from electrocardiogram (ECG). The features are extracted from HRV signal in time, frequency and nonlinear domains. The Principal Component Analysis (PCA) is applied to reduce the dimension of the extracted features in order to reduce computational complexity and to reveal the hidden information underlaid in the data. Finally, Support Vector Machine (SVM) classifier has been utilized to classify two classes of data using the extracted distinguishing features. In this paper, parameters of the SVM have been optimized in order to improve the accuracy. Provided reports in this paper indicate that the detection of CAD class from normal class using the proposed algorithm was performed with accuracy of 99.2%, sensitivity of 98.43%, and specificity of 100%. This study has shown that methods which are based on the feature extraction of the biomedical signals are an appropriate approach to predict the health situation of the patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
STAR-GALAXY CLASSIFICATION IN MULTI-BAND OPTICAL IMAGING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fadely, Ross; Willman, Beth; Hogg, David W.
2012-11-20
Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r {approx}> 24. Star-galaxy separation poses a major challenge to such surveys because galaxies-even very compact galaxies-outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM <0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training datamore » to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVM{sub best}) where the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics and (2) a more realistic scenario where training is done on higher signal-to-noise data (SVM{sub real}) at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering {approx}80% completeness, with purity of {approx}60%-90% for both stars and galaxies. We find that no algorithm delivers perfect performance and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the Receiver Operating Characteristic curve, we find a best-to-worst ranking of SVM{sub best}, HB, ML, and SVM{sub real}. We conclude, therefore, that a well-trained SVM will outperform template-fitting methods. However, a normally trained SVM performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.« less
Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella
2017-04-01
Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate the drug toxicity are expensive and time-consuming, indicating that they are not suitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand to develop computational models that can predict the drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that biotransformed in liver. The toxic effects were calculated for the current data, namely, mutagenic, tumorigenic, irritant and reproductive effect. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE and Safe Level SMOTE are used to solve the problem of imbalanced dataset. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug into toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, Whale Optimization Algorithm (WOA) has been proposed to optimize the parameters of SVM, so that the classification error can be reduced. The experimental results proved that the proposed model achieved high sensitivity to all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.
Prediction of cell penetrating peptides by support vector machines.
Sanders, William S; Johnston, C Ian; Bridges, Susan M; Burgess, Shane C; Willeford, Kenneth O
2011-07-01
Cell penetrating peptides (CPPs) are those peptides that can transverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. Of the synthesized peptides predicted to be CPPs, 100% of these peptides were shown to be penetrating.
A Features Selection for Crops Classification
NASA Astrophysics Data System (ADS)
Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen
2016-08-01
The components of the polarimetric target decomposition reflect the differences of target since they linked with the scattering properties of the target and can be imported into SVM as the classification features. The result of decomposition usually concentrate on part of the components. Selecting a combination of components can reduce the features that importing into the SVM. The features reduction can lead to less calculation and targeted classification of one target when we classify a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification with a data of AGRISAR.
Arbitrary norm support vector machines.
Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R
2009-02-01
Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.
LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran.
Ghaemi, Z; Alimohammadi, A; Farnaghi, M
2018-04-20
Due to critical impacts of air pollution, prediction and monitoring of air quality in urban areas are important tasks. However, because of the dynamic nature and high spatio-temporal variability, prediction of the air pollutant concentrations is a complex spatio-temporal problem. Distribution of pollutant concentration is influenced by various factors such as the historical pollution data and weather conditions. Conventional methods such as the support vector machine (SVM) or artificial neural networks (ANN) show some deficiencies when huge amount of streaming data have to be analyzed for urban air pollution prediction. In order to overcome the limitations of the conventional methods and improve the performance of urban air pollution prediction in Tehran, a spatio-temporal system is designed using a LaSVM-based online algorithm. Pollutant concentration and meteorological data along with geographical parameters are continually fed to the developed online forecasting system. Performance of the system is evaluated by comparing the prediction results of the Air Quality Index (AQI) with those of a traditional SVM algorithm. Results show an outstanding increase of speed by the online algorithm while preserving the accuracy of the SVM classifier. Comparison of the hourly predictions for next coming 24 h, with those of the measured pollution data in Tehran pollution monitoring stations shows an overall accuracy of 0.71, root mean square error of 0.54 and coefficient of determination of 0.81. These results are indicators of the practical usefulness of the online algorithm for real-time spatial and temporal prediction of the urban air quality.
NASA Astrophysics Data System (ADS)
Li, Hai; Kumavor, Patrick; Salman Alqasemi, Umar; Zhu, Quing
2015-01-01
A composite set of ovarian tissue features extracted from photoacoustic spectral data, beam envelope, and co-registered ultrasound and photoacoustic images are used to characterize malignant and normal ovaries using logistic and support vector machine (SVM) classifiers. Normalized power spectra were calculated from the Fourier transform of the photoacoustic beamformed data, from which the spectral slopes and 0-MHz intercepts were extracted. Five features were extracted from the beam envelope and another 10 features were extracted from the photoacoustic images. These 17 features were ranked by their p-values from t-tests on which a filter type of feature selection method was used to determine the optimal feature number for final classification. A total of 169 samples from 19 ex vivo ovaries were randomly distributed into training and testing groups. Both classifiers achieved a minimum value of the mean misclassification error when the seven features with lowest p-values were selected. Using these seven features, the logistic and SVM classifiers obtained sensitivities of 96.39±3.35% and 97.82±2.26%, and specificities of 98.92±1.39% and 100%, respectively, for the training group. For the testing group, logistic and SVM classifiers achieved sensitivities of 92.71±3.55% and 92.64±3.27%, and specificities of 87.52±8.78% and 98.49±2.05%, respectively.
NASA Astrophysics Data System (ADS)
Liu, Qingsheng; Liang, Li; Liu, Gaohuan; Huang, Chong
2017-09-01
Vegetation often exists as patch in arid and semi-arid region throughout the world. Vegetation patch can be effectively monitored by remote sensing images. However, not all satellite platforms are suitable to study quasi-circular vegetation patch. This study compares fine (GF-1) and coarse (CBERS-04) resolution platforms, specifically focusing on the quasicircular vegetation patches in the Yellow River Delta (YRD), China. Vegetation patch features (area, shape) were extracted from GF-1 and CBERS-04 imagery using unsupervised classifier (K-Means) and object-oriented approach (Example-based feature extraction with SVM classifier) in order to analyze vegetation patterns. These features were then compared using vector overlay and differencing, and the Root Mean Squared Error (RMSE) was used to determine if the mapped vegetation patches were significantly different. Regardless of K-Means or Example-based feature extraction with SVM classification, it was found that the area of quasi-circular vegetation patches from visual interpretation from QuickBird image (ground truth data) was greater than that from both of GF-1 and CBERS-04, and the number of patches detected from GF-1 data was more than that of CBERS-04 image. It was seen that without expert's experience and professional training on object-oriented approach, K-Means was better than example-based feature extraction with SVM for detecting the patch. It indicated that CBERS-04 could be used to detect the patch with area of more than 300 m2, but GF-1 data was a sufficient source for patch detection in the YRD. However, in the future, finer resolution platforms such as Worldview are needed to gain more detailed insight on patch structures and components and formation mechanism.
Liu, Yi-Hung; Wu, Chien-Te; Cheng, Wei-Teng; Hsiao, Yu-Tsung; Chen, Po-Ming; Teng, Jyh-Tong
2014-01-01
Electroencephalogram-based emotion recognition (EEG-ER) has received increasing attention in the fields of health care, affective computing, and brain-computer interface (BCI). However, satisfactory ER performance within a bi-dimensional and non-discrete emotional space using single-trial EEG data remains a challenging task. To address this issue, we propose a three-layer scheme for single-trial EEG-ER. In the first layer, a set of spectral powers of different EEG frequency bands are extracted from multi-channel single-trial EEG signals. In the second layer, the kernel Fisher's discriminant analysis method is applied to further extract features with better discrimination ability from the EEG spectral powers. The feature vector produced by layer 2 is called a kernel Fisher's emotion pattern (KFEP), and is sent into layer 3 for further classification where the proposed imbalanced quasiconformal kernel support vector machine (IQK-SVM) serves as the emotion classifier. The outputs of the three layer EEG-ER system include labels of emotional valence and arousal. Furthermore, to collect effective training and testing datasets for the current EEG-ER system, we also use an emotion-induction paradigm in which a set of pictures selected from the International Affective Picture System (IAPS) are employed as emotion induction stimuli. The performance of the proposed three-layer solution is compared with that of other EEG spectral power-based features and emotion classifiers. Results on 10 healthy participants indicate that the proposed KFEP feature performs better than other spectral power features, and IQK-SVM outperforms traditional SVM in terms of the EEG-ER accuracy. Our findings also show that the proposed EEG-ER scheme achieves the highest classification accuracies of valence (82.68%) and arousal (84.79%) among all testing methods. PMID:25061837
Liu, Yi-Hung; Wu, Chien-Te; Cheng, Wei-Teng; Hsiao, Yu-Tsung; Chen, Po-Ming; Teng, Jyh-Tong
2014-07-24
Electroencephalogram-based emotion recognition (EEG-ER) has received increasing attention in the fields of health care, affective computing, and brain-computer interface (BCI). However, satisfactory ER performance within a bi-dimensional and non-discrete emotional space using single-trial EEG data remains a challenging task. To address this issue, we propose a three-layer scheme for single-trial EEG-ER. In the first layer, a set of spectral powers of different EEG frequency bands are extracted from multi-channel single-trial EEG signals. In the second layer, the kernel Fisher's discriminant analysis method is applied to further extract features with better discrimination ability from the EEG spectral powers. The feature vector produced by layer 2 is called a kernel Fisher's emotion pattern (KFEP), and is sent into layer 3 for further classification where the proposed imbalanced quasiconformal kernel support vector machine (IQK-SVM) serves as the emotion classifier. The outputs of the three layer EEG-ER system include labels of emotional valence and arousal. Furthermore, to collect effective training and testing datasets for the current EEG-ER system, we also use an emotion-induction paradigm in which a set of pictures selected from the International Affective Picture System (IAPS) are employed as emotion induction stimuli. The performance of the proposed three-layer solution is compared with that of other EEG spectral power-based features and emotion classifiers. Results on 10 healthy participants indicate that the proposed KFEP feature performs better than other spectral power features, and IQK-SVM outperforms traditional SVM in terms of the EEG-ER accuracy. Our findings also show that the proposed EEG-ER scheme achieves the highest classification accuracies of valence (82.68%) and arousal (84.79%) among all testing methods.
Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah
2017-06-12
Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volumes of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_ Type = 2860, n_ SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_ Type = 6000, n_ SeverityLevel =5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was a OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3, 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, F-score for severity assessment code (SAC) 1 (extreme risk) was 87.3 and 64% for SAC4 (low risk) on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High risk incidents (SAC2) were confused with medium risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
Overlaid caption extraction in news video based on SVM
NASA Astrophysics Data System (ADS)
Liu, Manman; Su, Yuting; Ji, Zhong
2007-11-01
Overlaid caption in news video often carries condensed semantic information which is key cues for content-based video indexing and retrieval. However, it is still a challenging work to extract caption from video because of its complex background and low resolution. In this paper, we propose an effective overlaid caption extraction approach for news video. We first scan the video key frames using a small window, and then classify the blocks into the text and non-text ones via support vector machine (SVM), with statistical features extracted from the gray level co-occurrence matrices, the LH and HL sub-bands wavelet coefficients and the orientated edge intensity ratios. Finally morphological filtering and projection profile analysis are employed to localize and refine the candidate caption regions. Experiments show its high performance on four 30-minute news video programs.
Bai, Ou; Lin, Peter; Vorbach, Sherry; Li, Jiang; Furlani, Steve; Hallett, Mark
2007-12-01
To explore effective combinations of computational methods for the prediction of movement intention preceding the production of self-paced right and left hand movements from single trial scalp electroencephalogram (EEG). Twelve naïve subjects performed self-paced movements consisting of three key strokes with either hand. EEG was recorded from 128 channels. The exploration was performed offline on single trial EEG data. We proposed that a successful computational procedure for classification would consist of spatial filtering, temporal filtering, feature selection, and pattern classification. A systematic investigation was performed with combinations of spatial filtering using principal component analysis (PCA), independent component analysis (ICA), common spatial patterns analysis (CSP), and surface Laplacian derivation (SLD); temporal filtering using power spectral density estimation (PSD) and discrete wavelet transform (DWT); pattern classification using linear Mahalanobis distance classifier (LMD), quadratic Mahalanobis distance classifier (QMD), Bayesian classifier (BSC), multi-layer perceptron neural network (MLP), probabilistic neural network (PNN), and support vector machine (SVM). A robust multivariate feature selection strategy using a genetic algorithm was employed. The combinations of spatial filtering using ICA and SLD, temporal filtering using PSD and DWT, and classification methods using LMD, QMD, BSC and SVM provided higher performance than those of other combinations. Utilizing one of the better combinations of ICA, PSD and SVM, the discrimination accuracy was as high as 75%. Further feature analysis showed that beta band EEG activity of the channels over right sensorimotor cortex was most appropriate for discrimination of right and left hand movement intention. Effective combinations of computational methods provide possible classification of human movement intention from single trial EEG. Such a method could be the basis for a potential brain-computer interface based on human natural movement, which might reduce the requirement of long-term training. Effective combinations of computational methods can classify human movement intention from single trial EEG with reasonable accuracy.
Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe
2016-05-18
Prediction of compound-protein interactions (CPIs) is to find new compound-protein pairs where a protein is targeted by at least a compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that leans from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.
Xu, Jun-Feng; Kang, Qian; Ma, Xing-Yong; Pan, Yuan-Ming; Yang, Lang; Jin, Peng; Wang, Xin; Li, Chen-Guang; Chen, Xiao-Chen; Wu, Chao; Jiao, Shao-Zhuo; Sheng, Jian-Qiu
2018-01-01
Colonoscopy screening has been accepted broadly to evaluate the risk and incidence of colorectal cancer (CRC) during health examination in outpatients. However, the intrusiveness, complexity and discomfort of colonoscopy may limit its application and the compliance of patients. Thus, more reliable and convenient diagnostic methods are necessary for CRC screening. Genome instability, especially copy-number variation (CNV), is a hallmark of cancer and has been proved to have potential in clinical application. We determined the diagnostic potential of chromosomal CNV at the arm level by whole-genome sequencing of CRC plasma samples (n = 32) and healthy controls (n = 38). Arm level CNV was determined and the consistence of arm-level CNV between plasma and tissue was further analyzed. Two methods including regular z score and trained Support Vector Machine (SVM) classifier were applied for detection of colorectal cancer. In plasma samples of CRC patients, the most frequent deletions were detected on chromosomes 6, 8p, 14q and 1p, and the most frequent amplifications occurred on chromosome 19, 5, 2, 9p and 20p. These arm-level alterations detected in plasma were also observed in tumor tissues. We showed that the specificity of regular z score analysis for the detection of colorectal cancer was 86.8% (33/38), whereas its sensitivity was only 56.3% (18/32). Applying a trained SVM classifier (n = 40 in trained group) as the standard to detect colorectal cancer relevance ratio in the test samples (n = 30), a sensitivity of 91.7% (11/12) and a specificity 88.9% (16/18) were finally reached. Furthermore, all five early CRC patients in stages I and II were successfully detected. Trained SVM classifier based on arm-level CNVs can be used as a promising method to screen early-stage CRC. © 2018 The Author(s). Published by S. Karger AG, Basel.
Lefebvre, Alexandre; Rochefort, Gael Y.; Santos, Frédéric; Le Denmat, Dominique; Salmon, Benjamin; Pétillon, Jean-Marc
2016-01-01
Over the last decade, biomedical 3D-imaging tools have gained widespread use in the analysis of prehistoric bone artefacts. While initial attempts to characterise the major categories used in osseous industry (i.e. bone, antler, and dentine/ivory) have been successful, the taxonomic determination of prehistoric artefacts remains to be investigated. The distinction between reindeer and red deer antler can be challenging, particularly in cases of anthropic and/or taphonomic modifications. In addition to the range of destructive physicochemical identification methods available (mass spectrometry, isotopic ratio, and DNA analysis), X-ray micro-tomography (micro-CT) provides convincing non-destructive 3D images and analyses. This paper presents the experimental protocol (sample scans, image processing, and statistical analysis) we have developed in order to identify modern and archaeological antler collections (from Isturitz, France). This original method is based on bone microstructure analysis combined with advanced statistical support vector machine (SVM) classifiers. A combination of six microarchitecture biomarkers (bone volume fraction, trabecular number, trabecular separation, trabecular thickness, trabecular bone pattern factor, and structure model index) were screened using micro-CT in order to characterise internal alveolar structure. Overall, reindeer alveoli presented a tighter mesh than red deer alveoli, and statistical analysis allowed us to distinguish archaeological antler by species with an accuracy of 96%, regardless of anatomical location on the antler. In conclusion, micro-CT combined with SVM classifiers proves to be a promising additional non-destructive method for antler identification, suitable for archaeological artefacts whose degree of human modification and cultural heritage or scientific value has previously made it impossible (tools, ornaments, etc.). PMID:26901355
NASA Astrophysics Data System (ADS)
Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.
2015-01-01
This work addresses the problem of lung sound classification, in particular, the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step to associate lung sounds with an abnormal state of the respiratory system, usually associated with tuberculosis or another chronic obstructive pulmonary diseases (COPD). The paper presents an approach for automatic lung sound classification, which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. Feature extraction methods used here are commonly applied in speech recognition and related problems thanks to the fact that they capture the most informative spectral content from the original signals. The evaluated methods were: Fourier transform (FT), wavelet decomposition using Wavelet Packet Transform bank of filters (WPT) and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combination of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as validation set a couple of cases, one including normal sounds and the other including wheezing sounds. Experimental results were reported in terms of traditional classification performance measures: sensitivity, specificity and balanced accuracy. Our best results using the suggested approach, C-weighted SVM and MFCC, achieve a 82.1% of balanced accuracy obtaining the best result for this problem until now. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem even using the same feature extraction methods.
Duarte, João V; Ribeiro, Maria J; Violante, Inês R; Cunha, Gil; Silva, Eduardo; Castelo-Branco, Miguel
2014-01-01
Neurofibromatosis Type 1 (NF1) is a common genetic condition associated with cognitive dysfunction. However, the pathophysiology of the NF1 cognitive deficits is not well understood. Abnormal brain structure, including increased total brain volume, white matter (WM) and grey matter (GM) abnormalities have been reported in the NF1 brain. These previous studies employed univariate model-driven methods preventing detection of subtle and spatially distributed differences in brain anatomy. Multivariate pattern analysis allows the combination of information from multiple spatial locations yielding a discriminative power beyond that of single voxels. Here we investigated for the first time subtle anomalies in the NF1 brain, using a multivariate data-driven classification approach. We used support vector machines (SVM) to classify whole-brain GM and WM segments of structural T1 -weighted MRI scans from 39 participants with NF1 and 60 non-affected individuals, divided in children/adolescents and adults groups. We also employed voxel-based morphometry (VBM) as a univariate gold standard to study brain structural differences. SVM classifiers correctly classified 94% of cases (sensitivity 92%; specificity 96%) revealing the existence of brain structural anomalies that discriminate NF1 individuals from controls. Accordingly, VBM analysis revealed structural differences in agreement with the SVM weight maps representing the most relevant brain regions for group discrimination. These included the hippocampus, basal ganglia, thalamus, and visual cortex. This multivariate data-driven analysis thus identified subtle anomalies in brain structure in the absence of visible pathology. Our results provide further insight into the neuroanatomical correlates of known features of the cognitive phenotype of NF1. Copyright © 2012 Wiley Periodicals, Inc.
Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar
2017-01-01
This paper presents an automatic classification system for segregating pathological brain from normal brains in magnetic resonance imaging scanning. The proposed system employs contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Two-dimensional stationary wavelet transform is harnessed to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values, computed from the level- 2 SWT coefficients. Then, the relevant and uncorrelated features are selected using symmetric uncertainty ranking filter. Subsequently, the selected features are given input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset- 255 have been utilized. The 5 runs of k-fold stratified cross validation results indicate the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system earns ideal classification over Dataset-66 and Dataset-160; whereas, for Dataset- 255, an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Wahba, Maram A; Ashour, Amira S; Napoleon, Sameh A; Abd Elnaby, Mustafa M; Guo, Yanhui
2017-12-01
Basal cell carcinoma is one of the most common malignant skin lesions. Automated lesion identification and classification using image processing techniques is highly required to reduce the diagnosis errors. In this study, a novel technique is applied to classify skin lesion images into two classes, namely the malignant Basal cell carcinoma and the benign nevus. A hybrid combination of bi-dimensional empirical mode decomposition and gray-level difference method features is proposed after hair removal. The combined features are further classified using quadratic support vector machine (Q-SVM). The proposed system has achieved outstanding performance of 100% accuracy, sensitivity and specificity compared to other support vector machine procedures as well as with different extracted features. Basal Cell Carcinoma is effectively classified using Q-SVM with the proposed combined features.
Applying six classifiers to airborne hyperspectral imagery for detecting giant reed
USDA-ARS?s Scientific Manuscript database
This study evaluated and compared six different image classifiers, including minimum distance (MD), Mahalanobis distance (MAHD), maximum likelihood (ML), spectral angle mapper (SAM), mixture tuned matched filtering (MTMF) and support vector machine (SVM), for detecting and mapping giant reed (Arundo...
Yang, Jiaojiao; Guo, Qian; Li, Wenjie; Wang, Suhong; Zou, Ling
2016-04-01
This paper aims to assist the individual clinical diagnosis of children with attention-deficit/hyperactivity disorder using electroencephalogram signal detection method.Firstly,in our experiments,we obtained and studied the electroencephalogram signals from fourteen attention-deficit/hyperactivity disorder children and sixteen typically developing children during the classic interference control task of Simon-spatial Stroop,and we completed electroencephalogram data preprocessing including filtering,segmentation,removal of artifacts and so on.Secondly,we selected the subset electroencephalogram electrodes using principal component analysis(PCA)method,and we collected the common channels of the optimal electrodes which occurrence rates were more than 90%in each kind of stimulation.We then extracted the latency(200~450ms)mean amplitude features of the common electrodes.Finally,we used the k-nearest neighbor(KNN)classifier based on Euclidean distance and the support vector machine(SVM)classifier based on radial basis kernel function to classify.From the experiment,at the same kind of interference control task,the attention-deficit/hyperactivity disorder children showed lower correct response rates and longer reaction time.The N2 emerged in prefrontal cortex while P2 presented in the inferior parietal area when all kinds of stimuli demonstrated.Meanwhile,the children with attention-deficit/hyperactivity disorder exhibited markedly reduced N2 and P2amplitude compared to typically developing children.KNN resulted in better classification accuracy than SVM classifier,and the best classification rate was 89.29%in StI task.The results showed that the electroencephalogram signals were different in the brain regions of prefrontal cortex and inferior parietal cortex between attention-deficit/hyperactivity disorder and typically developing children during the interference control task,which provided a scientific basis for the clinical diagnosis of attention-deficit/hyperactivity disorder individuals.
ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin
2014-05-06
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expressions. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provide a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet SVM, Mipred, Virgo and EumiR), with non-identical features, and which have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% f-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9 which represents a high performance index.The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred.The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI sequence reed archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied on three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of extremely potential miRNA hairpins for cloning prediction methods. Among the ensemble prediction obtained results there are pre-miRNA candidates that have been validated using miRbase while they have not been recognized by some of the base classifiers.
NASA Astrophysics Data System (ADS)
Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk
2016-07-01
Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morshed, Nader; Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Echols, Nathaniel, E-mail: nechols@lbl.gov
2015-05-01
A method to automatically identify possible elemental ions in X-ray crystal structures has been extended to use support vector machine (SVM) classifiers trained on selected structures in the PDB, with significantly improved sensitivity over manually encoded heuristics. In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here,more » the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.« less
Application of the Teager-Kaiser energy operator in bearing fault diagnosis.
Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M
2013-03-01
Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them feature the bearing vibration signal with statistical measures and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and feature it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used. The diagnosis is performed with two classifiers: a neural network classifier and a LS-SVM classifier. Experiments show that the Teager domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ninos, K.; Georgiadis, P.; Cavouras, D.; Nomicos, C.
2010-05-01
This study presents the design and development of a mobile wireless platform to be used for monitoring and analysis of seismic events and related electromagnetic (EM) signals, employing Personal Digital Assistants (PDAs). A prototype custom-developed application was deployed on a 3G enabled PDA that could connect to the FTP server of the Institute of Geodynamics of the National Observatory of Athens and receive and display EM signals at 4 receiver frequencies (3 KHz (E-W, N-S), 10 KHz (E-W, N-S), 41 MHz and 46 MHz). Signals may originate from any one of the 16 field-stations located around the Greek territory. Employing continuous recordings of EM signals gathered from January 2003 till December 2007, a Support Vector Machines (SVM)-based classification system was designed to distinguish EM precursor signals within noisy background. EM-signals corresponding to recordings preceding major seismic events (Ms≥5R) were segmented, by an experienced scientist, and five features (mean, variance, skewness, kurtosis, and a wavelet based feature), derived from the EM-signals were calculated. These features were used to train the SVM-based classification scheme. The performance of the system was evaluated by the exhaustive search and leave-one-out methods giving 87.2% overall classification accuracy, in correctly identifying EM precursor signals within noisy background employing all calculated features. Due to the insufficient processing power of the PDAs, this task was performed on a typical desktop computer. This optimal trained context of the SVM classifier was then integrated in the PDA based application rendering the platform capable to discriminate between EM precursor signals and noise. System's efficiency was evaluated by an expert who reviewed 1/ multiple EM-signals, up to 18 days prior to corresponding past seismic events, and 2/ the possible EM-activity of a specific region employing the trained SVM classifier. Additionally, the proposed architecture can form a base platform for a future integrated system that will incorporate services such as notifications for field station power failures, disruption of data flow, occurring SEs, and even other types of measurement and analysis processes such as the integration of a special analysis algorithm based on the ratio of short term to long term signal average.
Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces
Liu, Qian; Kwoh, Chee Keong; Hoi, Steven C. H.
2013-01-01
Specific binding between proteins plays a crucial role in molecular functions and biological processes. Protein binding interfaces and their atomic contacts are typically defined by simple criteria, such as distance-based definitions that only use some threshold of spatial distance in previous studies. These definitions neglect the nearby atomic organization of contact atoms, and thus detect predominant contacts which are interrupted by other atoms. It is questionable whether such kinds of interrupted contacts are as important as other contacts in protein binding. To tackle this challenge, we propose a new definition called beta (β) atomic contacts. Our definition, founded on the β-skeletons in computational geometry, requires that there is no other atom in the contact spheres defined by two contact atoms; this sphere is similar to the van der Waals spheres of atoms. The statistical analysis on a large dataset shows that β contacts are only a small fraction of conventional distance-based contacts. To empirically quantify the importance of β contacts, we design βACV, an SVM classifier with β contacts as input, to classify homodimers from crystal packing. We found that our βACV is able to achieve the state-of-the-art classification performance superior to SVM classifiers with distance-based contacts as input. Our βACV also outperforms several existing methods when being evaluated on several datasets in previous works. The promising empirical performance suggests that β contacts can truly identify critical specific contacts in protein binding interfaces. β contacts thus provide a new model for more precise description of atomic organization in protein quaternary structures than distance-based contacts. PMID:23630569
Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.
Sriraam, N; Raghu, S
2017-09-02
Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from Bern Barcelona database. From the dataset, five different classification tasks were formed. Total 26 features were extracted from focal and non focal EEG. Significant features were selected using Wilcoxon rank sum test by setting p-value (p < 0.05) and z-score (-1.96 > z > 1.96) at 95% significance interval. Hypothesis was made that the effect of removing outliers improves the classification accuracy. Turkey's range test was adopted for pruning outliers from feature set. Finally, 21 features were classified using optimized support vector machine (SVM) classifier with 10-fold cross validation. Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15% achieved respectively and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensures the suitability of the proposed multi-features with optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
Balanced VS Imbalanced Training Data: Classifying Rapideye Data with Support Vector Machines
NASA Astrophysics Data System (ADS)
Ustuner, M.; Sanli, F. B.; Abdikan, S.
2016-06-01
The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.
2013-01-01
Background Breast cancer is the leading cause of both incidence and mortality in women population. For this reason, much research effort has been devoted to develop Computer-Aided Detection (CAD) systems for early detection of the breast cancers on mammograms. In this paper, we propose a new and novel dictionary configuration underpinning sparse representation based classification (SRC). The key idea of the proposed algorithm is to improve the sparsity in terms of mass margins for the purpose of improving classification performance in CAD systems. Methods The aim of the proposed SRC framework is to construct separate dictionaries according to the types of mass margins. The underlying idea behind our method is that the separated dictionaries can enhance the sparsity of mass class (true-positive), leading to an improved performance for differentiating mammographic masses from normal tissues (false-positive). When a mass sample is given for classification, the sparse solutions based on corresponding dictionaries are separately solved and combined at score level. Experiments have been performed on both database (DB) named as Digital Database for Screening Mammography (DDSM) and clinical Full Field Digital Mammogram (FFDM) DBs. In our experiments, sparsity concentration in the true class (SCTC) and area under the Receiver operating characteristic (ROC) curve (AUC) were measured for the comparison between the proposed method and a conventional single dictionary based approach. In addition, a support vector machine (SVM) was used for comparing our method with state-of-the-arts classifier extensively used for mass classification. Results Comparing with the conventional single dictionary configuration, the proposed approach is able to improve SCTC of up to 13.9% and 23.6% on DDSM and FFDM DBs, respectively. Moreover, the proposed method is able to improve AUC with 8.2% and 22.1% on DDSM and FFDM DBs, respectively. Comparing to SVM classifier, the proposed method improves AUC with 2.9% and 11.6% on DDSM and FFDM DBs, respectively. Conclusions The proposed dictionary configuration is found to well improve the sparsity of dictionaries, resulting in an enhanced classification performance. Moreover, the results show that the proposed method is better than conventional SVM classifier for classifying breast masses subject to various margins from normal tissues. PMID:24564973
Max-AUC Feature Selection in Computer-Aided Detection of Polyps in CT Colonography
Xu, Jian-Wu; Suzuki, Kenji
2014-01-01
We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level. PMID:24608058
Max-AUC feature selection in computer-aided detection of polyps in CT colonography.
Xu, Jian-Wu; Suzuki, Kenji
2014-03-01
We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level.
Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals
2014-09-30
floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...APPROACH Odontocete click detection and classification. A multi-class support vector machine (SVM) classifier was previously developed ( Jarvis ...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, particularly S. Jarvis , is improving the SVM
Using deep learning for detecting gender in adult chest radiographs
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2018-03-01
In this paper, we present a method for automatically identifying the gender of an imaged person using their frontal chest x-ray images. Our work is motivated by the need to determine missing gender information in some datasets. The proposed method employs the technique of convolutional neural network (CNN) based deep learning and transfer learning to overcome the challenge of developing handcrafted features in limited data. Specifically, the method consists of four main steps: pre-processing, CNN feature extractor, feature selection, and classifier. The method is tested on a combined dataset obtained from several sources with varying acquisition quality resulting in different pre-processing steps that are applied for each. For feature extraction, we tested and compared four CNN architectures, viz., AlexNet, VggNet, GoogLeNet, and ResNet. We applied a feature selection technique, since the feature length is larger than the number of images. Two popular classifiers: SVM and Random Forest, are used and compared. We evaluated the classification performance by cross-validation and used seven performance measures. The best performer is the VggNet-16 feature extractor with the SVM classifier, with accuracy of 86.6% and ROC Area being 0.932 for 5-fold cross validation. We also discuss several misclassified cases and describe future work for performance improvement.
Classification of Partial Discharge Measured under Different Levels of Noise Contamination.
Jee Keen Raymond, Wong; Illias, Hazlee Azil; Abu Bakar, Ab Halim
2017-01-01
Cable joint insulation breakdown may cause a huge loss to power companies. Therefore, it is vital to diagnose the insulation quality to detect early signs of insulation failure. It is well known that there is a correlation between Partial discharge (PD) and the insulation quality. Although many works have been done on PD pattern recognition, it is usually performed in a noise free environment. Also, works on PD pattern recognition in actual cable joint are less likely to be found in literature. Therefore, in this work, classifications of actual cable joint defect types from partial discharge data contaminated by noise were performed. Five cross-linked polyethylene (XLPE) cable joints with artificially created defects were prepared based on the defects commonly encountered on site. Three different types of input feature were extracted from the PD pattern under artificially created noisy environment. These include statistical features, fractal features and principal component analysis (PCA) features. These input features were used to train the classifiers to classify each PD defect types. Classifications were performed using three different artificial intelligence classifiers, which include Artificial Neural Networks (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Machine (SVM). It was found that the classification accuracy decreases with higher noise level but PCA features used in SVM and ANN showed the strongest tolerance against noise contamination.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-01-01
Background Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. Results The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. Conclusion GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge. PMID:18430222
Building Reliable Metaclassifiers for Text Learning
2006-05-01
outputs are often poor [Ben00, DP96] but can be improved [Ben00, ZE01, ZE02]. SVM For linear SVMs, we use the Smox toolkit which is based on Platt’s...and implementations are the same as discussed in Section 6.3. The exception is that for an implementation of linear SVMs, we used the Smox toolkit which...is based on Platt’s Sequential Minimal Optimization algorithm [Pla98]. Since Smox is the best base classifier in the experiments below, it is the
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
Yu, Hualong; Ni, Jun
2014-01-01
Training classifiers on skewed data can be technically challenging tasks, especially if the data is high-dimensional simultaneously, the tasks can become more difficult. In biomedicine field, skewed data type often appears. In this study, we try to deal with this problem by combining asymmetric bagging ensemble classifier (asBagging) that has been presented in previous work and an improved random subspace (RS) generation strategy that is called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance level between accuracy and diversity of base classifiers in asBagging. In view of the strong generalization capability of support vector machine (SVM), we adopt it to be base classifier. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of Accuracy, F-measure, G-mean and AUC evaluation criterions, thus it can be regarded as an effective and efficient tool to deal with high-dimensional and imbalanced biomedical data.
Kamarudin, Nur Diyana; Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori
2017-01-01
In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k -means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k -means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.
Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori
2017-01-01
In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds. PMID:29065640
Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.
Balfer, Jenny; Hu, Ye; Bajorath, Jürgen
2014-08-01
Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Liu, Changhong; Liu, Wei; Lu, Xuzhong; Ma, Fei; Chen, Wei; Yang, Jianbo; Zheng, Lei
2014-01-01
Multispectral imaging with 19 wavelengths in the range of 405-970 nm has been evaluated for nondestructive determination of firmness, total soluble solids (TSS) content and ripeness stage in strawberry fruit. Several analysis approaches, including partial least squares (PLS), support vector machine (SVM) and back propagation neural network (BPNN), were applied to develop theoretical models for predicting the firmness and TSS of intact strawberry fruit. Compared with PLS and SVM, BPNN considerably improved the performance of multispectral imaging for predicting firmness and total soluble solids content with the correlation coefficient (r) of 0.94 and 0.83, SEP of 0.375 and 0.573, and bias of 0.035 and 0.056, respectively. Subsequently, the ability of multispectral imaging technology to classify fruit based on ripeness stage was tested using SVM and principal component analysis-back propagation neural network (PCA-BPNN) models. The higher classification accuracy of 100% was achieved using SVM model. Moreover, the results of all these models demonstrated that the VIS parts of the spectra were the main contributor to the determination of firmness, TSS content estimation and classification of ripeness stage in strawberry fruit. These results suggest that multispectral imaging, together with suitable analysis model, is a promising technology for rapid estimation of quality attributes and classification of ripeness stage in strawberry fruit.
Xu, Xiaoyi; Li, Ao; Wang, Minghui
2015-08-01
Phosphorylation is a crucial post-translational modification, which regulates almost all cellular processes in life. It has long been recognised that protein phosphorylation has close relationship with diseases, and therefore many researches are undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success achieved by these approaches, no method focuses on disease-associated phosphorylation sites prediction. Herein, for the first time the authors propose a novel approach that is specially designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, a combined feature selection method-based support vector machine (CFS-SVM) that incorporates minimum-redundancy-maximum-relevance filtering process and forward feature selection process is developed. Performance evaluation shows that CFS-SVM is significantly better than the widely used classifiers including Bayesian decision theory, k nearest neighbour and random forest. With the extremely high specificity of 99%, CFS-SVM can still achieve a high sensitivity. Besides, tests on extra data confirm the effectiveness and general applicability of CFS-SVM approach on a variety of diseases. Finally, the analysis of selected features and corresponding kinases also help the understanding of the potential mechanism of disease-phosphorylation relationships and guide further experimental validations.
On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification
NASA Astrophysics Data System (ADS)
Aygün, Eser; Oommen, B. John; Cataltepe, Zehra
Syntactic methods in pattern recognition have been used extensively in bioinformatics, and in particular, in the analysis of gene and protein expressions, and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to achieve peptide classification using the information residing in their syntactic representations. The latter has traditionally been achieved using the edit distances required in the respective peptide comparisons. We advocate that one can model the differences between compared strings as a mutation model consisting of random Substitutions, Insertions and Deletions (SID) obeying the OIT model. Thus, in this paper, we show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, using which a Support Vector Machine (SVM)-based peptide classifier, referred to as OIT_SVM, can be devised.
Nonlinear Classification of AVO Attributes Using SVM
NASA Astrophysics Data System (ADS)
Zhao, B.; Zhou, H.
2005-05-01
A key research topic in reservoir characterization is the detection of the presence of fluids using seismic and well-log data. In particular, partial gas discrimination is very challenging because low and high gas saturation can result in similar anomalies in terms of Amplitude Variation with Offset (AVO), bright spot, and velocity sag. Hence, a successful fluid detection will require a good understanding of the seismic signatures of the fluids, high-quality data, and good detection methodology. Traditional attempts of partial gas discrimination employ the Neural Network algorithm. A new approach is to use the Support Vector Machine (SVM) (Vapnik, 1995; Liu and Sacchi, 2003). While the potential of the SVM has not been fully explored for reservoir fluid detection, the current nonlinear methods classify seismic attributes without the use of rock physics constraints. The objective of this study is to improve the capability of distinguishing a fizz-water reservoir from a commercial gas reservoir by developing a new detection method using AVO attributes and rock physics constraints. This study will first test the SVM classification with synthetic data, and then apply the algorithm to field data from the King-Kong and Lisa-Anne fields in Gulf of Mexico. While both field areas have high amplitude seismic anomalies, King-Kong field produces commercial gas but Lisa-Anne field does not. We expect that the new SVM-based nonlinear classification of AVO attributes may be able to separate commercial gas from fizz-water in these two fields.
Machine learning algorithms to classify spinal muscular atrophy subtypes.
Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B
2012-07-24
The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.
Fast mental states decoding in mixed reality.
De Massari, Daniele; Pacheco, Daniel; Malekshahi, Rahim; Betella, Alberto; Verschure, Paul F M J; Birbaumer, Niels; Caria, Andrea
2014-01-01
The combination of Brain-Computer Interface (BCI) technology, allowing online monitoring and decoding of brain activity, with virtual and mixed reality (MR) systems may help to shape and guide implicit and explicit learning using ecological scenarios. Real-time information of ongoing brain states acquired through BCI might be exploited for controlling data presentation in virtual environments. Brain states discrimination during mixed reality experience is thus critical for adapting specific data features to contingent brain activity. In this study we recorded electroencephalographic (EEG) data while participants experienced MR scenarios implemented through the eXperience Induction Machine (XIM). The XIM is a novel framework modeling the integration of a sensing system that evaluates and measures physiological and psychological states with a number of actuators and effectors that coherently reacts to the user's actions. We then assessed continuous EEG-based discrimination of spatial navigation, reading and calculation performed in MR, using linear discriminant analysis (LDA) and support vector machine (SVM) classifiers. Dynamic single trial classification showed high accuracy of LDA and SVM classifiers in detecting multiple brain states as well as in differentiating between high and low mental workload, using a 5 s time-window shifting every 200 ms. Our results indicate overall better performance of LDA with respect to SVM and suggest applicability of our approach in a BCI-controlled MR scenario. Ultimately, successful prediction of brain states might be used to drive adaptation of data representation in order to boost information processing in MR.
Fast mental states decoding in mixed reality
De Massari, Daniele; Pacheco, Daniel; Malekshahi, Rahim; Betella, Alberto; Verschure, Paul F. M. J.; Birbaumer, Niels; Caria, Andrea
2014-01-01
The combination of Brain-Computer Interface (BCI) technology, allowing online monitoring and decoding of brain activity, with virtual and mixed reality (MR) systems may help to shape and guide implicit and explicit learning using ecological scenarios. Real-time information of ongoing brain states acquired through BCI might be exploited for controlling data presentation in virtual environments. Brain states discrimination during mixed reality experience is thus critical for adapting specific data features to contingent brain activity. In this study we recorded electroencephalographic (EEG) data while participants experienced MR scenarios implemented through the eXperience Induction Machine (XIM). The XIM is a novel framework modeling the integration of a sensing system that evaluates and measures physiological and psychological states with a number of actuators and effectors that coherently reacts to the user's actions. We then assessed continuous EEG-based discrimination of spatial navigation, reading and calculation performed in MR, using linear discriminant analysis (LDA) and support vector machine (SVM) classifiers. Dynamic single trial classification showed high accuracy of LDA and SVM classifiers in detecting multiple brain states as well as in differentiating between high and low mental workload, using a 5 s time-window shifting every 200 ms. Our results indicate overall better performance of LDA with respect to SVM and suggest applicability of our approach in a BCI-controlled MR scenario. Ultimately, successful prediction of brain states might be used to drive adaptation of data representation in order to boost information processing in MR. PMID:25505878
Support vector machines classifiers of physical activities in preschoolers
USDA-ARS?s Scientific Manuscript database
The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...
USDA-ARS?s Scientific Manuscript database
This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...
A 15-gene signature for prediction of colon cancer recurrence and prognosis based on SVM.
Xu, Guangru; Zhang, Minghui; Zhu, Hongxing; Xu, Jinhua
2017-03-10
To screen the gene signature for distinguishing patients with high risks from those with low-risks for colon cancer recurrence and predicting their prognosis. Five microarray datasets of colon cancer samples were collected from Gene Expression Omnibus database and one was obtained from The Cancer Genome Atlas (TCGA). After preprocessing, data in GSE17537 were analyzed using the Linear Models for Microarray data (LIMMA) method to identify the differentially expressed genes (DEGs). The DEGs further underwent PPI network-based neighborhood scoring and support vector machine (SVM) analyses to screen the feature genes associated with recurrence and prognosis, which were then validated by four datasets GSE38832, GSE17538, GSE28814 and TCGA using SVM and Cox regression analyses. A total of 1207 genes were identified as DEGs between recurrence and no-recurrence samples, including 726 downregulated and 481 upregulated genes. Using SVM analysis and five gene expression profile data confirmation, a 15-gene signature (HES5, ZNF417, GLRA2, OR8D2, HOXA7, FABP6, MUSK, HTR6, GRIP2, KLRK1, VEGFA, AKAP12, RHEB, NCRNA00152 and PMEPA1) were identified as a predictor of recurrence risk and prognosis for colon cancer patients. Our identified 15-gene signature may be useful to classify colon cancer patients with different prognosis and some genes in this signature may represent new therapeutic targets. Copyright © 2016. Published by Elsevier B.V.
Super short term forecasting of photovoltaic power generation output in micro grid
NASA Astrophysics Data System (ADS)
Gong, Cheng; Ma, Longfei; Chi, Zhongjun; Zhang, Baoqun; Jiao, Ran; Yang, Bing; Chen, Jianshu; Zeng, Shuang
2017-01-01
The prediction model combining data mining and support vector machine (SVM) was built. Which provide information of photovoltaic (PV) power generation output for economic operation and optimal control of micro gird, and which reduce influence of power system from PV fluctuation. Because of the characteristic which output of PV rely on radiation intensity, ambient temperature, cloudiness, etc., so data mining was brought in. This technology can deal with large amounts of historical data and eliminate superfluous data, by using fuzzy classifier of daily type and grey related degree. The model of SVM was built, which can dock with information from data mining. Based on measured data from a small PV station, the prediction model was tested. The numerical example shows that the prediction model is fast and accurate.
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.
Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori
2018-01-01
We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification focussing on (i) usefulness of the conventional CADx system (hand-crafted imaging feature + machine learning algorithm), (ii) comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of the local binary pattern was used for calculating a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. Tree Parzen Estimator (TPE) was used as Bayesian optimization for parameters of SVM and XGBoost. Random search was done for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUC of SVM and XGBoost was 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer numbers of trials when using TPE, compared with random search. Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. Based on observer study, AUC values of two board-certified radiologists were 0.898 and 0.822. The results show that diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies are focusing on the so-called barcode genes, representing a well defined region of the whole genome. Recently, alignment-free techniques are gaining more importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequences clustering and classification is proposed. The method is based on k-mers representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied on DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and Support Vector Machine (SVM) classification algorithm in a extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches very similar results to RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions the proposed method outperforms RDP and SVM with ultra short sequences and it exhibits a smooth decrease of performance, at every taxonomic level, when the sequence length is decreased. PMID:25916734
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis.
Duong, Bach Phi; Kim, Jong-Myon
2018-04-07
The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.
Automated classification of neurological disorders of gait using spatio-temporal gait parameters.
Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman
2015-04-01
Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of the treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. 8 measurements with 136 variables using a GAITRite(®) sensor carpet were obtained from each subject. Subjects were randomly divided into two groups (training cases and validation cases). Sensitivity and specificity of k-nearest neighbor (KNN), naive-bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%); while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be a reliable diagnostic and management tool for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.
Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N
2017-06-21
Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k -Nearest Neighbors ( k -NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k -NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k -NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance of SVM and Random Forest classifier performance was observed.
McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne
2018-04-01
Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning
2017-03-01
Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
Automated classification of four types of developmental odontogenic cysts.
Frydenlund, A; Eramian, M; Daley, T
2014-04-01
Odontogenic cysts originate from remnants of the tooth forming epithelium in the jaws and gingiva. There are various kinds of such cysts with different biological behaviours that carry different patient risks and require different treatment plans. Types of odontogenic cysts can be distinguished by the properties of their epithelial layers in H&E stained samples. Herein we detail a set of image features for automatically distinguishing between four types of odontogenic cyst in digital micrographs and evaluate their effectiveness using two statistical classifiers - a support vector machine (SVM) and bagging with logistic regression as the base learner (BLR). Cyst type was correctly predicted from among four classes of odontogenic cysts between 83.8% and 92.3% of the time with an SVM and between 90 ± 0.92% and 95.4 ± 1.94% with a BLR. One particular cyst type was associated with the majority of misclassifications. Omission of this cyst type from the data set improved the classification rate for the remaining three cyst types to 96.2% for both SVM and BLR. Copyright © 2013 Elsevier Ltd. All rights reserved.
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
NASA Astrophysics Data System (ADS)
Kose, Kivanc; Bozkurt, Alican; Ariafar, Setareh; Alessi-Fox, Christi A.; Gill, Melissa; Dy, Jennifer G.; Brooks, Dana H.; Rajadhyaksha, Milind
2017-02-01
In this study we present a deep learning based classification algorithm for discriminating morphological patterns that appear in RCM mosaics of melanocytic lesions collected at the dermal epidermal junction (DEJ). These patterns are classified into 6 distinct types in the literature: background, meshwork, ring, clod, mixed, and aspecific. Clinicians typically identify these morphological patterns by examination of their textural appearance at 10X magnification. To mimic this process we divided mosaics into smaller regions, which we call tiles, and classify each tile in a deep learning framework. We used previously acquired DEJ mosaics of lesions deemed clinically suspicious, from 20 different patients, which were then labelled according to those 6 types by 2 expert users. We tried three different approaches for classification, all starting with a publicly available convolutional neural network (CNN) trained on natural image, consisting of a series of convolutional layers followed by a series of fully connected layers: (1) We fine-tuned this network using training data from the dataset. (2) Instead, we added an additional fully connected layer before the output layer network and then re-trained only last two layers, (3) We used only the CNN convolutional layers as a feature extractor, encoded the features using a bag of words model, and trained a support vector machine (SVM) classifier. Sensitivity and specificity were generally comparable across the three methods, and in the same ranges as our previous work using SURF features with SVM . Approach (3) was less computationally intensive to train but more sensitive to unbalanced representation of the 6 classes in the training data. However we expect CNN performance to improve as we add more training data because both the features and the classifier are learned jointly from the data. *First two authors share first authorship.
NASA Astrophysics Data System (ADS)
Marwaha, Richa; Kumar, Anil; Kumar, Arumugam Senthil
2015-01-01
Our primary objective was to explore a classification algorithm for thermal hyperspectral data. Minimum noise fraction is applied to thermal hyperspectral data and eight pixel-based classifiers, i.e., constrained energy minimization, matched filter, spectral angle mapper (SAM), adaptive coherence estimator, orthogonal subspace projection, mixture-tuned matched filter, target-constrained interference-minimized filter, and mixture-tuned target-constrained interference minimized filter are tested. The long-wave infrared (LWIR) has not yet been exploited for classification purposes. The LWIR data contain emissivity and temperature information about an object. A highest overall accuracy of 90.99% was obtained using the SAM algorithm for the combination of thermal data with a colored digital photograph. Similarly, an object-oriented approach is applied to thermal data. The image is segmented into meaningful objects based on properties such as geometry, length, etc., which are grouped into pixels using a watershed algorithm and an applied supervised classification algorithm, i.e., support vector machine (SVM). The best algorithm in the pixel-based category is the SAM technique. SVM is useful for thermal data, providing a high accuracy of 80.00% at a scale value of 83 and a merge value of 90, whereas for the combination of thermal data with a colored digital photograph, SVM gives the highest accuracy of 85.71% at a scale value of 82 and a merge value of 90.
PCA-HOG symmetrical feature based diseased cell detection
NASA Astrophysics Data System (ADS)
Wan, Min-jie
2016-04-01
A histogram of oriented gradient (HOG) feature is applied to the field of diseased cell detection, which can detect diseased cells in high resolution tissue images rapidly, accurately and efficiently. Firstly, motivated by symmetrical cellular forms, a new HOG symmetrical feature based on the traditional HOG feature is proposed to meet the condition of cell detection. Secondly, considering the high feature dimension of traditional HOG feature leads to plenty of memory resources and long runtime in practical applications, a classical dimension reduction method called principal component analysis (PCA) is used to reduce the dimension of high-dimensional HOG descriptor. Because of that, computational speed is increased greatly, and the accuracy of detection can be controlled in a proper range at the same time. Thirdly, support vector machine (SVM) classifier is trained with PCA-HOG symmetrical features proposed above. At last, practical tissue images is detected and analyzed by SVM classifier. In order to verify the effectiveness of this new algorithm, it is practically applied to conduct diseased cell detection which takes 200 pieces of H&E (hematoxylin & eosin) high resolution staining histopathological images collected from 20 breast cancer patients as a sample. The experiment shows that the average processing rate can be 25 frames per second and the detection accuracy can be 92.1%.
Mirsky, Simcha K; Barnea, Itay; Levi, Mattan; Greenspan, Hayit; Shaked, Natan T
2017-09-01
Currently, the delicate process of selecting sperm cells to be used for in vitro fertilization (IVF) is still based on the subjective, qualitative analysis of experienced clinicians using non-quantitative optical microscopy techniques. In this work, a method was developed for the automated analysis of sperm cells based on the quantitative phase maps acquired through use of interferometric phase microscopy (IPM). Over 1,400 human sperm cells from 8 donors were imaged using IPM, and an algorithm was designed to digitally isolate sperm cell heads from the quantitative phase maps while taking into consideration both the cell 3D morphology and contents, as well as acquire features describing sperm head morphology. A subset of these features was used to train a support vector machine (SVM) classifier to automatically classify sperm of good and bad morphology. The SVM achieves an area under the receiver operating characteristic curve of 88.59% and an area under the precision-recall curve of 88.67%, as well as precisions of 90% or higher. We believe that our automatic analysis can become the basis for objective and automatic sperm cell selection in IVF. © 2017 International Society for Advancement of Cytometry. © 2017 International Society for Advancement of Cytometry.
Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X
2010-05-01
Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
Heidelberg Retina Tomograph 3 machine learning classifiers for glaucoma detection
Townsend, K A; Wollstein, G; Danks, D; Sung, K R; Ishikawa, H; Kagemann, L; Gabriele, M L; Schuman, J S
2010-01-01
Aims To assess performance of classifiers trained on Heidelberg Retina Tomograph 3 (HRT3) parameters for discriminating between healthy and glaucomatous eyes. Methods Classifiers were trained using HRT3 parameters from 60 healthy subjects and 140 glaucomatous subjects. The classifiers were trained on all 95 variables and smaller sets created with backward elimination. Seven types of classifiers, including Support Vector Machines with radial basis (SVM-radial), and Recursive Partitioning and Regression Trees (RPART), were trained on the parameters. The area under the ROC curve (AUC) was calculated for classifiers, individual parameters and HRT3 glaucoma probability scores (GPS). Classifier AUCs and leave-one-out accuracy were compared with the highest individual parameter and GPS AUCs and accuracies. Results The highest AUC and accuracy for an individual parameter were 0.848 and 0.79, for vertical cup/disc ratio (vC/D). For GPS, global GPS performed best with AUC 0.829 and accuracy 0.78. SVM-radial with all parameters showed significant improvement over global GPS and vC/ D with AUC 0.916 and accuracy 0.85. RPART with all parameters provided significant improvement over global GPS with AUC 0.899 and significant improvement over global GPS and vC/D with accuracy 0.875. Conclusions Machine learning classifiers of HRT3 data provide significant enhancement over current methods for detection of glaucoma. PMID:18523087
A method of real-time fault diagnosis for power transformers based on vibration analysis
NASA Astrophysics Data System (ADS)
Hong, Kaixing; Huang, Hai; Zhou, Jianping; Shen, Yimin; Li, Yujie
2015-11-01
In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection.
Recognition Using Hybrid Classifiers.
Osadchy, Margarita; Keren, Daniel; Raviv, Dolev
2016-04-01
A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.
Classification of Regional Ionospheric Disturbances Based on Support Vector Machines
NASA Astrophysics Data System (ADS)
Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil
2016-07-01
Ionosphere is an anisotropic, inhomogeneous, time varying and spatio-temporally dispersive medium whose parameters can be estimated almost always by using indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to be identified due to extensive scales in period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, identification and classification of regional disturbances poses a challenge. The main aim of this study is to classify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of ionosphere, will be used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. In this work, for the automated classification of the regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. SVM is a supervised learning model used for classification with associated learning algorithm that analyze the data and recognize patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher dimensional feature spaces. Performance of the developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing the developed classification technique to the Global Ionospheric Map (GIM) TEC data which is provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method to detect the anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK 114E541 project as a part of the Scientific and Technological Research Projects Funding Program (1001).
Machine learning-based methods for prediction of linear B-cell epitopes.
Wang, Hsin-Wei; Pai, Tun-Wen
2014-01-01
B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.
Farhan, Saima; Fahiem, Muhammad Abuzar; Tauseef, Huma
2014-01-01
Structural brain imaging is playing a vital role in identification of changes that occur in brain associated with Alzheimer's disease. This paper proposes an automated image processing based approach for the identification of AD from MRI of the brain. The proposed approach is novel in a sense that it has higher specificity/accuracy values despite the use of smaller feature set as compared to existing approaches. Moreover, the proposed approach is capable of identifying AD patients in early stages. The dataset selected consists of 85 age and gender matched individuals from OASIS database. The features selected are volume of GM, WM, and CSF and size of hippocampus. Three different classification models (SVM, MLP, and J48) are used for identification of patients and controls. In addition, an ensemble of classifiers, based on majority voting, is adopted to overcome the error caused by an independent base classifier. Ten-fold cross validation strategy is applied for the evaluation of our scheme. Moreover, to evaluate the performance of proposed approach, individual features and combination of features are fed to individual classifiers and ensemble based classifier. Using size of left hippocampus as feature, the accuracy achieved with ensemble of classifiers is 93.75%, with 100% specificity and 87.5% sensitivity.
Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.
Caixinha, Miguel; Santos, Mário; Santos, Jaime
2016-04-01
To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the higher performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Classifying High-noise EEG in Complex Environments for Brain-computer Interaction Technologies
2012-02-01
differentiation in the brain signal that our classification approach seeks to identify despite the noise in the recorded EEG signal and the complexity of...performed two offline classifications , one using BCILab (1), the other using LibSVM (2). Distinct classifiers were trained for each individual in...order to improve individual classifier performance (3). The highest classification performance results were obtained using individual frequency bands
Yousef, Malik; Khalifa, Waleed; AbedAllah, Loai
2016-12-22
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that ECkNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Yousef, Malik; Khalifa, Waleed; AbdAllah, Loai
2016-12-01
The performance of many learning and data mining algorithms depends critically on suitable metrics to assess efficiency over the input space. Learning a suitable metric from examples may, therefore, be the key to successful application of these algorithms. We have demonstrated that the k-nearest neighbor (kNN) classification can be significantly improved by learning a distance metric from labeled examples. The clustering ensemble is used to define the distance between points in respect to how they co-cluster. This distance is then used within the framework of the kNN algorithm to define a classifier named ensemble clustering kNN classifier (EC-kNN). In many instances in our experiments we achieved highest accuracy while SVM failed to perform as well. In this study, we compare the performance of a two-class classifier using EC-kNN with different one-class and two-class classifiers. The comparison was applied to seven different plant microRNA species considering eight feature selection methods. In this study, the averaged results show that EC-kNN outperforms all other methods employed here and previously published results for the same data. In conclusion, this study shows that the chosen classifier shows high performance when the distance metric is carefully chosen.
Villa-Manríquez, J F; Castro-Ramos, J; Gutiérrez-Delgado, F; Lopéz-Pacheco, M A; Villanueva-Luna, A E
2017-08-01
In this study we identify and classify high and low levels of glycated hemoglobin (HbA1c) in healthy volunteers (HV) and diabetic patients (DP). Overall, 86 subjects were evaluated. The Raman spectrum was measured in three anatomical regions of the body: index fingertip, right ear lobe, and forehead. The measurements were performed to compare the difference between the HV and DP (22 well controlled diabetic patients (WCDP) (HbA1c <6.5%), and 49 not controlled diabetic patients (NCDP) (HbA1c ≥6.5%)). Multivariable methods such as principal components analysis (PCA) combined with support vector machine (SVM) were used to develop effective diagnostic algorithms for classification among these groups. The forehead of HV versus WCDP showed the highest sensitivity (100%) and specificity (100%). Sensitivity (100%) and specificity (60%), were highest in the forehead of WCDP, versus NCDP. In HV versus NCDP, the fingertip had the highest sensitivity (100%) and specificity (80%). The efficacy of the diagnostic algorithm by receiver operating characteristic (ROC) curve was confirmed. Overall, our study demonstrated that the combination of Raman spectroscopy and PCA-SVM are feasible non-invasive diagnostic tool in diabetes to classify qualitatively high and low levels of HbA1c in vivo. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Deep neural mapping support vector machines.
Li, Yujian; Zhang, Ting
2017-09-01
The choice of kernel has an important effect on the performance of a support vector machine (SVM). The effect could be reduced by NEUROSVM, an architecture using multilayer perceptron for feature extraction and SVM for classification. In binary classification, a general linear kernel NEUROSVM can be theoretically simplified as an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained together with a virtual ordinary output layer by backpropagation, then with the output of its last hidden layer taken as input of the SVM classifier for further training separately. By taking the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, where the kernel mapping is indeed an explicit function expressed as a sub-network, different from an implicit function induced by a kernel function traditionally. Moreover, we exploit a two-stage procedure of contrastive divergence learning and gradient descent for DNMSVM to jointly training an adaptive kernel mapping instead of a kernel function, without requirement of kernel tricks. As a whole of the sub-network and the SVM classifier, the joint training of DNMSVM is done by using gradient descent to optimize the objective function with the sub-network layer-wise pre-trained via contrastive divergence learning of restricted Boltzmann machines. Compared to the separate training of NEUROSVM, the joint training is a new algorithm for DNMSVM to have advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the kernel of radial basis function), demonstrating its effectiveness. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of Partial Discharge Measured under Different Levels of Noise Contamination
2017-01-01
Cable joint insulation breakdown may cause a huge loss to power companies. Therefore, it is vital to diagnose the insulation quality to detect early signs of insulation failure. It is well known that there is a correlation between Partial discharge (PD) and the insulation quality. Although many works have been done on PD pattern recognition, it is usually performed in a noise free environment. Also, works on PD pattern recognition in actual cable joint are less likely to be found in literature. Therefore, in this work, classifications of actual cable joint defect types from partial discharge data contaminated by noise were performed. Five cross-linked polyethylene (XLPE) cable joints with artificially created defects were prepared based on the defects commonly encountered on site. Three different types of input feature were extracted from the PD pattern under artificially created noisy environment. These include statistical features, fractal features and principal component analysis (PCA) features. These input features were used to train the classifiers to classify each PD defect types. Classifications were performed using three different artificial intelligence classifiers, which include Artificial Neural Networks (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Machine (SVM). It was found that the classification accuracy decreases with higher noise level but PCA features used in SVM and ANN showed the strongest tolerance against noise contamination. PMID:28085953
Fast and robust segmentation of white blood cell images by self-supervised learning.
Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo
2018-04-01
A fast and accurate white blood cell (WBC) segmentation remains a challenging task, as different WBCs vary significantly in color and shape due to cell type differences, staining technique variations and the adhesion between the WBC and red blood cells. In this paper, a self-supervised learning approach, consisting of unsupervised initial segmentation and supervised segmentation refinement, is presented. The first module extracts the overall foreground region from the cell image by K-means clustering, and then generates a coarse WBC region by touching-cell splitting based on concavity analysis. The second module further uses the coarse segmentation result of the first module as automatic labels to actively train a support vector machine (SVM) classifier. Then, the trained SVM classifier is further used to classify each pixel of the image and achieve a more accurate segmentation result. To improve its segmentation accuracy, median color features representing the topological structure and a new weak edge enhancement operator (WEEO) handling fuzzy boundary are introduced. To further reduce its time cost, an efficient cluster sampling strategy is also proposed. We tested the proposed approach with two blood cell image datasets obtained under various imaging and staining conditions. The experiment results show that our approach has a superior performance of accuracy and time cost on both datasets. Copyright © 2018 Elsevier Ltd. All rights reserved.
Support Vector Machine Model for Automatic Detection and Classification of Seismic Events
NASA Astrophysics Data System (ADS)
Barros, Vesna; Barros, Lucas
2016-04-01
The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false detections and missed detections caused by station noise and incorrect classification of arrivals are still an issue and the events are often unclassified or poorly classified. Thus, machine learning techniques can be used in automatic processing for classifying the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS) - a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) - we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al., 2015, the advantages of using SVM are handleability of large number of features and effectiveness in high dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. It is expected to create a flexible and easily adjustable SVM method that can be applied in different regions and datasets. Taken a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions such as infrasound and hydroacoustic waveforms. As an authorized user, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquake, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support-vector network to various classical learning algorithms used before in seismic detection and classification is an essential final step to analyze the advantages and disadvantages of the model.
Efficacy Evaluation of Different Wavelet Feature Extraction Methods on Brain MRI Tumor Detection
NASA Astrophysics Data System (ADS)
Nabizadeh, Nooshin; John, Nigel; Kubat, Miroslav
2014-03-01
Automated Magnetic Resonance Imaging brain tumor detection and segmentation is a challenging task. Among different available methods, feature-based methods are very dominant. While many feature extraction techniques have been employed, it is still not quite clear which of feature extraction methods should be preferred. To help improve the situation, we present the results of a study in which we evaluate the efficiency of using different wavelet transform features extraction methods in brain MRI abnormality detection. Applying T1-weighted brain image, Discrete Wavelet Transform (DWT), Discrete Wavelet Packet Transform (DWPT), Dual Tree Complex Wavelet Transform (DTCWT), and Complex Morlet Wavelet Transform (CMWT) methods are applied to construct the feature pool. Three various classifiers as Support Vector Machine, K Nearest Neighborhood, and Sparse Representation-Based Classifier are applied and compared for classifying the selected features. The results show that DTCWT and CMWT features classified with SVM, result in the highest classification accuracy, proving of capability of wavelet transform features to be informative in this application.
Chen, Qiu-Feng; Chen, Hua-Jun; Liu, Jun; Sun, Tao; Shen, Qun-Tai
2016-01-01
Machine learning-based approaches play an important role in examining functional magnetic resonance imaging (fMRI) data in a multivariate manner and extracting features predictive of group membership. This study was performed to assess the potential for measuring brain intrinsic activity to identify minimal hepatic encephalopathy (MHE) in cirrhotic patients, using the support vector machine (SVM) method. Resting-state fMRI data were acquired in 16 cirrhotic patients with MHE and 19 cirrhotic patients without MHE. The regional homogeneity (ReHo) method was used to investigate the local synchrony of intrinsic brain activity. Psychometric Hepatic Encephalopathy Score (PHES) was used to define MHE condition. SVM-classifier was then applied using leave-one-out cross-validation, to determine the discriminative ReHo-map for MHE. The discrimination map highlights a set of regions, including the prefrontal cortex, anterior cingulate cortex, anterior insular cortex, inferior parietal lobule, precentral and postcentral gyri, superior and medial temporal cortices, and middle and inferior occipital gyri. The optimized discriminative model showed total accuracy of 82.9% and sensitivity of 81.3%. Our results suggested that a combination of the SVM approach and brain intrinsic activity measurement could be helpful for detection of MHE in cirrhotic patients.
Vehicle classification in WAMI imagery using deep network
NASA Astrophysics Data System (ADS)
Yi, Meng; Yang, Fan; Blasch, Erik; Sheaff, Carolyn; Liu, Kui; Chen, Genshe; Ling, Haibin
2016-05-01
Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning and route planning. These applications require fast and accurate image understanding which is time consuming for humans, due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI imagery analysis, namely vehicle classification. That is, deciding whether a certain image patch contains a vehicle or not. We collect a set of positive and negative sample image patches, for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images. The first set is generated from positive images with some location shift. The second set of negative patches is generated from randomly sampled patches. We also discard those patches if a vehicle accidentally locates at the center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolution network for classifying these patches. The classifier is based on a pre-trained AlexNet Model in the Caffe library, with an adapted loss function for vehicle classification. The performance of our classifier is compared to several traditional image classifier methods using Support Vector Machine (SVM) and Histogram of Oriented Gradient (HOG) features. While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classifier reaches 97.9%.
NASA Astrophysics Data System (ADS)
Anees, Asim; Aryal, Jagannath; O'Reilly, Małgorzata M.; Gale, Timothy J.; Wardlaw, Tim
2016-12-01
A robust non-parametric framework, based on multiple Radial Basic Function (RBF) kernels, is proposed in this study, for detecting land/forest cover changes using Landsat 7 ETM+ images. One of the widely used frameworks is to find change vectors (difference image) and use a supervised classifier to differentiate between change and no-change. The Bayesian Classifiers e.g. Maximum Likelihood Classifier (MLC), Naive Bayes (NB), are widely used probabilistic classifiers which assume parametric models, e.g. Gaussian function, for the class conditional distributions. However, their performance can be limited if the data set deviates from the assumed model. The proposed framework exploits the useful properties of Least Squares Probabilistic Classifier (LSPC) formulation i.e. non-parametric and probabilistic nature, to model class posterior probabilities of the difference image using a linear combination of a large number of Gaussian kernels. To this end, a simple technique, based on 10-fold cross-validation is also proposed for tuning model parameters automatically instead of selecting a (possibly) suboptimal combination from pre-specified lists of values. The proposed framework has been tested and compared with Support Vector Machine (SVM) and NB for detection of defoliation, caused by leaf beetles (Paropsisterna spp.) in Eucalyptus nitens and Eucalyptus globulus plantations of two test areas, in Tasmania, Australia, using raw bands and band combination indices of Landsat 7 ETM+. It was observed that due to multi-kernel non-parametric formulation and probabilistic nature, the LSPC outperforms parametric NB with Gaussian assumption in change detection framework, with Overall Accuracy (OA) ranging from 93.6% (κ = 0.87) to 97.4% (κ = 0.94) against 85.3% (κ = 0.69) to 93.4% (κ = 0.85), and is more robust to changing data distributions. Its performance was comparable to SVM, with added advantages of being probabilistic and capable of handling multi-class problems naturally with its original formulation.
Arshad, Sannia; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes. PMID:25295302
Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin
2014-01-01
We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates model of various classes whilst identifying and filtering noisy training data. This noise free data is further used to learn model for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for different classifiers to construct an ensemble. For this purpose, we applied genetic algorithm to search for an optimal weight vector on which classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on variety of real life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of proposed ensemble method as compared to its competitors, especially in the presence of class label noise and imbalance classes.
An Adaptive S-Method to Analyze Micro-Doppler Signals for Human Activity Classification
Yang, Chao; Xia, Yuqing; Ma, Xiaolin; Zhang, Tao; Zhou, Zhou
2017-01-01
In this paper, we propose the multiwindow Adaptive S-method (AS-method) distribution approach used in the time-frequency analysis for radar signals. Based on the results of orthogonal Hermite functions that have good time-frequency resolution, we vary the length of window to suppress the oscillating component caused by cross-terms. This method can bring a better compromise in the auto-terms concentration and cross-terms suppressing, which contributes to the multi-component signal separation. Finally, the effective micro signal is extracted by threshold segmentation and envelope extraction. To verify the proposed method, six states of motion are separated by a classifier of a support vector machine (SVM) trained to the extracted features. The trained SVM can detect a human subject with an accuracy of 95.4% for two cases without interference. PMID:29186075
An Adaptive S-Method to Analyze Micro-Doppler Signals for Human Activity Classification.
Li, Fangmin; Yang, Chao; Xia, Yuqing; Ma, Xiaolin; Zhang, Tao; Zhou, Zhou
2017-11-29
In this paper, we propose the multiwindow Adaptive S-method (AS-method) distribution approach used in the time-frequency analysis for radar signals. Based on the results of orthogonal Hermite functions that have good time-frequency resolution, we vary the length of window to suppress the oscillating component caused by cross-terms. This method can bring a better compromise in the auto-terms concentration and cross-terms suppressing, which contributes to the multi-component signal separation. Finally, the effective micro signal is extracted by threshold segmentation and envelope extraction. To verify the proposed method, six states of motion are separated by a classifier of a support vector machine (SVM) trained to the extracted features. The trained SVM can detect a human subject with an accuracy of 95.4% for two cases without interference.
Liu, Changhong; Liu, Wei; Lu, Xuzhong; Ma, Fei; Chen, Wei; Yang, Jianbo; Zheng, Lei
2014-01-01
Multispectral imaging with 19 wavelengths in the range of 405–970 nm has been evaluated for nondestructive determination of firmness, total soluble solids (TSS) content and ripeness stage in strawberry fruit. Several analysis approaches, including partial least squares (PLS), support vector machine (SVM) and back propagation neural network (BPNN), were applied to develop theoretical models for predicting the firmness and TSS of intact strawberry fruit. Compared with PLS and SVM, BPNN considerably improved the performance of multispectral imaging for predicting firmness and total soluble solids content with the correlation coefficient (r) of 0.94 and 0.83, SEP of 0.375 and 0.573, and bias of 0.035 and 0.056, respectively. Subsequently, the ability of multispectral imaging technology to classify fruit based on ripeness stage was tested using SVM and principal component analysis-back propagation neural network (PCA-BPNN) models. The higher classification accuracy of 100% was achieved using SVM model. Moreover, the results of all these models demonstrated that the VIS parts of the spectra were the main contributor to the determination of firmness, TSS content estimation and classification of ripeness stage in strawberry fruit. These results suggest that multispectral imaging, together with suitable analysis model, is a promising technology for rapid estimation of quality attributes and classification of ripeness stage in strawberry fruit. PMID:24505317
Li, Yanxin; Knoll, Joan H; Wilkins, Ruth C; Flegal, Farrah N; Rogan, Peter K
2016-05-01
Dose from radiation exposure can be estimated from dicentric chromosome (DC) frequencies in metaphase cells of peripheral blood lymphocytes. We automated DC detection by extracting features in Giemsa-stained metaphase chromosome images and classifying objects by machine learning (ML). DC detection involves (i) intensity thresholded segmentation of metaphase objects, (ii) chromosome separation by watershed transformation and elimination of inseparable chromosome clusters, fragments and staining debris using a morphological decision tree filter, (iii) determination of chromosome width and centreline, (iv) derivation of centromere candidates, and (v) distinction of DCs from monocentric chromosomes (MC) by ML. Centromere candidates are inferred from 14 image features input to a Support Vector Machine (SVM). Sixteen features derived from these candidates are then supplied to a Boosting classifier and a second SVM which determines whether a chromosome is either a DC or MC. The SVM was trained with 292 DCs and 3135 MCs, and then tested with cells exposed to either low (1 Gy) or high (2-4 Gy) radiation dose. Results were then compared with those of 3 experts. True positive rates (TPR) and positive predictive values (PPV) were determined for the tuning parameter, σ. At larger σ, PPV decreases and TPR increases. At high dose, for σ = 1.3, TPR = 0.52 and PPV = 0.83, while at σ = 1.6, the TPR = 0.65 and PPV = 0.72. At low dose and σ = 1.3, TPR = 0.67 and PPV = 0.26. The algorithm differentiates DCs from MCs, overlapped chromosomes and other objects with acceptable accuracy over a wide range of radiation exposures. © 2016 Wiley Periodicals, Inc.
A novel computer-aided detection system for pulmonary nodule identification in CT images
NASA Astrophysics Data System (ADS)
Han, Hao; Li, Lihong; Wang, Huafeng; Zhang, Hao; Moore, William; Liang, Zhengrong
2014-03-01
Computer-aided detection (CADe) of pulmonary nodules from computer tomography (CT) scans is critical for assisting radiologists to identify lung lesions at an early stage. In this paper, we propose a novel approach for CADe of lung nodules using a two-stage vector quantization (VQ) scheme. The first-stage VQ aims to extract lung from the chest volume, while the second-stage VQ is designed to extract initial nodule candidates (INCs) within the lung volume. Then rule-based expert filtering is employed to prune obvious FPs from INCs, and the commonly-used support vector machine (SVM) classifier is adopted to further reduce the FPs. The proposed system was validated on 100 CT scans randomly selected from the 262 scans that have at least one juxta-pleural nodule annotation in the publicly available database - Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). The two-stage VQ only missed 2 out of the 207 nodules at agreement level 1, and the INCs detection for each scan took about 30 seconds in average. Expert filtering reduced FPs more than 18 times, while maintaining a sensitivity of 93.24%. As it is trivial to distinguish INCs attached to pleural wall versus not on wall, we investigated the feasibility of training different SVM classifiers to further reduce FPs from these two kinds of INCs. Experiment results indicated that SVM classification over the entire set of INCs was in favor of, where the optimal operating of our CADe system achieved a sensitivity of 89.4% at a specificity of 86.8%.
A general prediction model for the detection of ADHD and Autism using structural and functional MRI.
Sen, Bhaskar; Borle, Neil C; Greiner, Russell; Brown, Matthew R G
2018-01-01
This work presents a novel method for learning a model that can diagnose Attention Deficit Hyperactivity Disorder (ADHD), as well as Autism, using structural texture and functional connectivity features obtained from 3-dimensional structural magnetic resonance imaging (MRI) and 4-dimensional resting-state functional magnetic resonance imaging (fMRI) scans of subjects. We explore a series of three learners: (1) The LeFMS learner first extracts features from the structural MRI images using the texture-based filters produced by a sparse autoencoder. These filters are then convolved with the original MRI image using an unsupervised convolutional network. The resulting features are used as input to a linear support vector machine (SVM) classifier. (2) The LeFMF learner produces a diagnostic model by first computing spatial non-stationary independent components of the fMRI scans, which it uses to decompose each subject's fMRI scan into the time courses of these common spatial components. These features can then be used with a learner by themselves or in combination with other features to produce the model. Regardless of which approach is used, the final set of features are input to a linear support vector machine (SVM) classifier. (3) Finally, the overall LeFMSF learner uses the combined features obtained from the two feature extraction processes in (1) and (2) above as input to an SVM classifier, achieving an accuracy of 0.673 on the ADHD-200 holdout data and 0.643 on the ABIDE holdout data. Both of these results, obtained with the same LeFMSF framework, are the best known, over all hold-out accuracies on these datasets when only using imaging data-exceeding previously-published results by 0.012 for ADHD and 0.042 for Autism. Our results show that combining multi-modal features can yield good classification accuracy for diagnosis of ADHD and Autism, which is an important step towards computer-aided diagnosis of these psychiatric diseases and perhaps others as well.
NASA Astrophysics Data System (ADS)
Geelen, Christopher D.; Wijnhoven, Rob G. J.; Dubbelman, Gijs; de With, Peter H. N.
2015-03-01
This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intra-class variation and interclass correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets, which realistically consider the above conditions, typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, proving a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 0.2%, while classification computation time is negligible compared to the detection time of pedestrians.
NASA Astrophysics Data System (ADS)
Lawi, Armin; Adhitya, Yudhi
2018-03-01
The objective of this research is to determine the quality of cocoa beans through morphology of their digital images. Samples of cocoa beans were scattered on a bright white paper under a controlled lighting condition. A compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. Classification process begins with an analysis of cocoa beans image based on morphological feature extraction. Parameters for extraction of morphological or physical feature parameters, i.e., Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, Ferret Diameter. The cocoa beans are classified into 4 groups, i.e.: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The model of classification used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement model of SVM using ensemble method in which the separate hyperplanes are obtained by least square approach and the multiclass procedure uses One-Against- All method. The result of our proposed model showed that the classification with morphological feature input parameters were accurately as 99.705% for the four classes, respectively.
Using X-Ray In-Line Phase-Contrast Imaging for the Investigation of Nude Mouse Hepatic Tumors
Zhang, Lu; Luo, Shuqian
2012-01-01
The purpose of this paper is to report the noninvasive imaging of hepatic tumors without contrast agents. Both normal tissues and tumor tissues can be detected, and tumor tissues in different stages can be classified quantitatively. We implanted BEL-7402 human hepatocellular carcinoma cells into the livers of nude mice and then imaged the livers using X-ray in-line phase-contrast imaging (ILPCI). The projection images' texture feature based on gray level co-occurrence matrix (GLCM) and dual-tree complex wavelet transforms (DTCWT) were extracted to discriminate normal tissues and tumor tissues. Different stages of hepatic tumors were classified using support vector machines (SVM). Images of livers from nude mice sacrificed 6 days after inoculation with cancer cells show diffuse distribution of the tumor tissue, but images of livers from nude mice sacrificed 9, 12, or 15 days after inoculation with cancer cells show necrotic lumps in the tumor tissue. The results of the principal component analysis (PCA) of the texture features based on GLCM of normal regions were positive, but those of tumor regions were negative. The results of PCA of the texture features based on DTCWT of normal regions were greater than those of tumor regions. The values of the texture features in low-frequency coefficient images increased monotonically with the growth of the tumors. Different stages of liver tumors can be classified using SVM, and the accuracy is 83.33%. Noninvasive and micron-scale imaging can be achieved by X-ray ILPCI. We can observe hepatic tumors and small vessels from the phase-contrast images. This new imaging approach for hepatic cancer is effective and has potential use in the early detection and classification of hepatic tumors. PMID:22761929
Multi-feature classifiers for burst detection in single EEG channels from preterm infants
NASA Astrophysics Data System (ADS)
Navarro, X.; Porée, F.; Kuchenbuch, M.; Chavez, M.; Beuchée, Alain; Carrault, G.
2017-08-01
Objective. The study of electroencephalographic (EEG) bursts in preterm infants provides valuable information about maturation or prognostication after perinatal asphyxia. Over the last two decades, a number of works proposed algorithms to automatically detect EEG bursts in preterm infants, but they were designed for populations under 35 weeks of post menstrual age (PMA). However, as the brain activity evolves rapidly during postnatal life, these solutions might be under-performing with increasing PMA. In this work we focused on preterm infants reaching term ages (PMA ⩾36 weeks) using multi-feature classification on a single EEG channel. Approach. Five EEG burst detectors relying on different machine learning approaches were compared: logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (kNN), support vector machines (SVM) and thresholding (Th). Classifiers were trained by visually labeled EEG recordings from 14 very preterm infants (born after 28 weeks of gestation) with 36-41 weeks PMA. Main results. The most performing classifiers reached about 95% accuracy (kNN, SVM and LR) whereas Th obtained 84%. Compared to human-automatic agreements, LR provided the highest scores (Cohen’s kappa = 0.71) using only three EEG features. Applying this classifier in an unlabeled database of 21 infants ⩾36 weeks PMA, we found that long EEG bursts and short inter-burst periods are characteristic of infants with the highest PMA and weights. Significance. In view of these results, LR-based burst detection could be a suitable tool to study maturation in monitoring or portable devices using a single EEG channel.
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis
Kim, Jong-Myon
2018-01-01
The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance. PMID:29642466
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish the bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present the numerical studies on the learning ability of online SVM classification based on Markov sampling for benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of training samples is larger.
Wang, Yanming; Zhou, Yawen; Wang, Huijuan; Cui, Jin; Nguchu, Benedictor Alexander; Zhang, Xufei; Qiu, Bensheng; Wang, Xiaoxiao; Zhu, Mingwang
2018-05-21
The aim of this study was to automatically detect focal cortical dysplasia (FCD) lesions in patients with extratemporal lobe epilepsy by relying on diffusion tensor imaging (DTI) and T2-weighted magnetic resonance imaging (MRI) data. We implemented an automated classifier using voxel-based multimodal features to identify gray and white matter abnormalities of FCD in patient cohorts. In addition to the commonly used T2-weighted image intensity feature, DTI-based features were also utilized. A Gaussian processes for machine learning (GPML) classifier was tested on 12 patients with FCD (8 with histologically confirmed FCD) scanned at 1.5 T and cross-validated using a leave-one-out strategy. Moreover, we compared the multimodal GPML paradigm's performance with that of single modal GPML and classical support vector machine (SVM). Our results demonstrated that the GPML performance on DTI-based features (mean AUC = 0.63) matches with the GPML performance on T2-weighted image intensity feature (mean AUC = 0.64). More promisingly, GPML yielded significantly improved performance (mean AUC = 0.76) when applying DTI-based features to multimodal paradigm. Based on the results, it can also be clearly stated that the proposed GPML strategy performed better and is robust to unbalanced dataset contrary to SVM that performed poorly (AUC = 0.69). Therefore, the GPML paradigm using multimodal MRI data containing DTI modality has promising result towards detection of the FCD lesions and provides an effective direction for future researches. Copyright © 2018 Elsevier Inc. All rights reserved.
Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis
NASA Astrophysics Data System (ADS)
Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob
2017-02-01
Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm such as non-negative matrix factorization. The nonnegative spectral components are retrieved and attributed to the native fluorophores such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD) in tissue. The retrieved weights of the components, e.g. NADH and FAD are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm such as support vector machine (SVM) is used for classification to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is shown based on statistical measures such as sensitivity, specificity, and accuracy, along with the area under receiver operating characteristic (ROC) curve. A cross validation method such as leave-one-out is used to evaluate the predictive performance of the SVM classifier to avoid bias due to overfitting.
NASA Astrophysics Data System (ADS)
Shao, Yanhua; Mei, Yanying; Chu, Hongyu; Chang, Zhiyuan; He, Yuxuan; Zhan, Huayi
2018-04-01
Pedestrian detection (PD) is an important application domain in computer vision and pattern recognition. Unmanned Aerial Vehicles (UAVs) have become a major field of research in recent years. In this paper, an algorithm for a robust pedestrian detection method based on the combination of the infrared HOG (IR-HOG) feature and SVM is proposed for highly complex outdoor scenarios on the basis of airborne IR image sequences from UAV. The basic flow of our application operation is as follows. Firstly, the thermal infrared imager (TAU2-336), which was installed on our Outdoor Autonomous Searching (OAS) UAV, is used for taking pictures of the designated outdoor area. Secondly, image sequences collecting and processing were accomplished by using high-performance embedded system with Samsung ODROID-XU4 and Ubuntu as the core and operating system respectively, and IR-HOG features were extracted. Finally, the SVM is used to train the pedestrian classifier. Experiment show that, our method shows promising results under complex conditions including strong noise corruption, partial occlusion etc.
A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.
James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai
2015-01-01
Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.
EEG-Based Detection of Braking Intention Under Different Car Driving Conditions
Hernández, Luis G.; Mozos, Oscar Martinez; Ferrández, José M.; Antelis, Javier M.
2018-01-01
The anticipatory recognition of braking is essential to prevent traffic accidents. For instance, driving assistance systems can be useful to properly respond to emergency braking situations. Moreover, the response time to emergency braking situations can be affected and even increased by different driver's cognitive states caused by stress, fatigue, and extra workload. This work investigates the detection of emergency braking from driver's electroencephalographic (EEG) signals that precede the brake pedal actuation. Bioelectrical signals were recorded while participants were driving in a car simulator while avoiding potential collisions by performing emergency braking. In addition, participants were subjected to stress, workload, and fatigue. EEG signals were classified using support vector machines (SVM) and convolutional neural networks (CNN) in order to discriminate between braking intention and normal driving. Results showed significant recognition of emergency braking intention which was on average 71.1% for SVM and 71.8% CNN. In addition, the classification accuracy for the best participant was 80.1 and 88.1% for SVM and CNN, respectively. These results show the feasibility of incorporating recognizable driver's bioelectrical responses into advanced driver-assistance systems to carry out early detection of emergency braking situations which could be useful to reduce car accidents. PMID:29910722
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Automated diagnosis of dry eye using infrared thermography images
NASA Astrophysics Data System (ADS)
Acharya, U. Rajendra; Tan, Jen Hong; Koh, Joel E. W.; Sudarshan, Vidya K.; Yeo, Sharon; Too, Cheah Loon; Chua, Chua Kuang; Ng, E. Y. K.; Tong, Louis
2015-07-01
Dry Eye (DE) is a condition of either decreased tear production or increased tear film evaporation. Prolonged DE damages the cornea causing the corneal scarring, thinning and perforation. There is no single uniform diagnosis test available to date; combinations of diagnostic tests are to be performed to diagnose DE. The current diagnostic methods available are subjective, uncomfortable and invasive. Hence in this paper, we have developed an efficient, fast and non-invasive technique for the automated identification of normal and DE classes using infrared thermography images. The features are extracted from nonlinear method called Higher Order Spectra (HOS). Features are ranked using t-test ranking strategy. These ranked features are fed to various classifiers namely, K-Nearest Neighbor (KNN), Nave Bayesian Classifier (NBC), Decision Tree (DT), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) to select the best classifier using minimum number of features. Our proposed system is able to identify the DE and normal classes automatically with classification accuracy of 99.8%, sensitivity of 99.8%, and specificity if 99.8% for left eye using PNN and KNN classifiers. And we have reported classification accuracy of 99.8%, sensitivity of 99.9%, and specificity if 99.4% for right eye using SVM classifier with polynomial order 2 kernel.
NASA Astrophysics Data System (ADS)
Zhan, Liwei; Li, Chengwei
2017-02-01
A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The presented hybrid model combines a support vector machine (SVM) with particle swarm optimization (PSO) technique. SVM has been adopted to solve regression problems successfully. Its regression accuracy is greatly related to optimizing parameters such as the regularization constant C , the parameter gamma γ corresponding to RBF kernel and the epsilon parameter \\varepsilon in the SVM training procedure. However, the friction coefficient which is predicted based on SVM has yet to be explored between aircraft tire and coating. The experiment reveals that drop height and tire rotational speed are the factors affecting friction coefficient. Bearing in mind, the friction coefficient can been predicted using the hybrid PSO-SVM-based model by the measured friction coefficient between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are used to optimize the relevant parameters (C , γ and \\varepsilon ), respectively. The regression accuracy could be reflected by the coefficient of determination ({{R}2} ). The result shows that the hybrid PSO-RBF-SVM-based model has better accuracy compared with the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experiment data confirms its good performance.
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
Millennial Filipino Student Engagement Analyzer Using Facial Feature Classification
NASA Astrophysics Data System (ADS)
Manseras, R.; Eugenio, F.; Palaoag, T.
2018-03-01
Millennials has been a word of mouth of everybody and a target market of various companies nowadays. In the Philippines, they comprise one third of the total population and most of them are still in school. Having a good education system is important for this generation to prepare them for better careers. And a good education system means having quality instruction as one of the input component indicators. In a classroom environment, teachers use facial features to measure the affect state of the class. Emerging technologies like Affective Computing is one of today’s trends to improve quality instruction delivery. This, together with computer vision, can be used in analyzing affect states of the students and improve quality instruction delivery. This paper proposed a system of classifying student engagement using facial features. Identifying affect state, specifically Millennial Filipino student engagement, is one of the main priorities of every educator and this directed the authors to develop a tool to assess engagement percentage. Multiple face detection framework using Face API was employed to detect as many student faces as possible to gauge current engagement percentage of the whole class. The binary classifier model using Support Vector Machine (SVM) was primarily set in the conceptual framework of this study. To achieve the most accuracy performance of this model, a comparison of SVM to two of the most widely used binary classifiers were tested. Results show that SVM bested RandomForest and Naive Bayesian algorithms in most of the experiments from the different test datasets.
Computer-aided diagnosis of malignant mammograms using Zernike moments and SVM.
Sharma, Shubhi; Khanna, Pritee
2015-02-01
This work is directed toward the development of a computer-aided diagnosis (CAD) system to detect abnormalities or suspicious areas in digital mammograms and classify them as malignant or nonmalignant. Original mammogram is preprocessed to separate the breast region from its background. To work on the suspicious area of the breast, region of interest (ROI) patches of a fixed size of 128×128 are extracted from the original large-sized digital mammograms. For training, patches are extracted manually from a preprocessed mammogram. For testing, patches are extracted from a highly dense area identified by clustering technique. For all extracted patches corresponding to a mammogram, Zernike moments of different orders are computed and stored as a feature vector. A support vector machine (SVM) is used to classify extracted ROI patches. The experimental study shows that the use of Zernike moments with order 20 and SVM classifier gives better results among other studies. The proposed system is tested on Image Retrieval In Medical Application (IRMA) reference dataset and Digital Database for Screening Mammography (DDSM) mammogram database. On IRMA reference dataset, it attains 99% sensitivity and 99% specificity, and on DDSM mammogram database, it obtained 97% sensitivity and 96% specificity. To verify the applicability of Zernike moments as a fitting texture descriptor, the performance of the proposed CAD system is compared with the other well-known texture descriptors namely gray-level co-occurrence matrix (GLCM) and discrete cosine transform (DCT).
Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson's disease.
Drotár, Peter; Mekyska, Jiří; Rektorová, Irena; Masarová, Lucia; Smékal, Zdeněk; Faundez-Zanuy, Marcos
2016-02-01
We present the PaHaW Parkinson's disease handwriting database, consisting of handwriting samples from Parkinson's disease (PD) patients and healthy controls. Our goal is to show that kinematic features and pressure features in handwriting can be used for the differential diagnosis of PD. The database contains records from 37 PD patients and 38 healthy controls performing eight different handwriting tasks. The tasks include drawing an Archimedean spiral, repetitively writing orthographically simple syllables and words, and writing of a sentence. In addition to the conventional kinematic features related to the dynamics of handwriting, we investigated new pressure features based on the pressure exerted on the writing surface. To discriminate between PD patients and healthy subjects, three different classifiers were compared: K-nearest neighbors (K-NN), ensemble AdaBoost classifier, and support vector machines (SVM). For predicting PD based on kinematic and pressure features of handwriting, the best performing model was SVM with classification accuracy of Pacc=81.3% (sensitivity Psen=87.4% and specificity of Pspe=80.9%). When evaluated separately, pressure features proved to be relevant for PD diagnosis, yielding Pacc=82.5% compared to Pacc=75.4% using kinematic features. Experimental results showed that an analysis of kinematic and pressure features during handwriting can help assess subtle characteristics of handwriting and discriminate between PD patients and healthy controls. Copyright © 2016 Elsevier B.V. All rights reserved.
Forest tree species clssification based on airborne hyper-spectral imagery
NASA Astrophysics Data System (ADS)
Dian, Yuanyong; Li, Zengyuan; Pang, Yong
2013-10-01
Forest precision classification products were the basic data for surveying of forest resource, updating forest subplot information, logging and design of forest. However, due to the diversity of stand structure, complexity of the forest growth environment, it's difficult to discriminate forest tree species using multi-spectral image. The airborne hyperspectral images can achieve the high spatial and spectral resolution imagery of forest canopy, so it will good for tree species level classification. The aim of this paper was to test the effective of combining spatial and spectral features in airborne hyper-spectral image classification. The CASI hyper spectral image data were acquired from Liangshui natural reserves area. Firstly, we use the MNF (minimum noise fraction) transform method for to reduce the hyperspectral image dimensionality and highlighting variation. And secondly, we use the grey level co-occurrence matrix (GLCM) to extract the texture features of forest tree canopy from the hyper-spectral image, and thirdly we fused the texture and the spectral features of forest canopy to classify the trees species using support vector machine (SVM) with different kernel functions. The results showed that when using the SVM classifier, MNF and texture-based features combined with linear kernel function can achieve the best overall accuracy which was 85.92%. It was also confirm that combine the spatial and spectral information can improve the accuracy of tree species classification.
An Efficient Algorithm for Server Thermal Fault Diagnosis Based on Infrared Image
NASA Astrophysics Data System (ADS)
Liu, Hang; Xie, Ting; Ran, Jian; Gao, Shan
2017-10-01
It is essential for a data center to maintain server security and stability. Long-time overload operation or high room temperature may cause service disruption even a server crash, which would result in great economic loss for business. Currently, the methods to avoid server outages are monitoring and forecasting. Thermal camera can provide fine texture information for monitoring and intelligent thermal management in large data center. This paper presents an efficient method for server thermal fault monitoring and diagnosis based on infrared image. Initially thermal distribution of server is standardized and the interest regions of the image are segmented manually. Then the texture feature, Hu moments feature as well as modified entropy feature are extracted from the segmented regions. These characteristics are applied to analyze and classify thermal faults, and then make efficient energy-saving thermal management decisions such as job migration. For the larger feature space, the principal component analysis is employed to reduce the feature dimensions, and guarantee high processing speed without losing the fault feature information. Finally, different feature vectors are taken as input for SVM training, and do the thermal fault diagnosis after getting the optimized SVM classifier. This method supports suggestions for optimizing data center management, it can improve air conditioning efficiency and reduce the energy consumption of the data center. The experimental results show that the maximum detection accuracy is 81.5%.
Pedestrian detection based on redundant wavelet transform
NASA Astrophysics Data System (ADS)
Huang, Lin; Ji, Liping; Hu, Ping; Yang, Tiejun
2016-10-01
Intelligent video surveillance is to analysis video or image sequences captured by a fixed or mobile surveillance camera, including moving object detection, segmentation and recognition. By using it, we can be notified immediately in an abnormal situation. Pedestrian detection plays an important role in an intelligent video surveillance system, and it is also a key technology in the field of intelligent vehicle. So pedestrian detection has very vital significance in traffic management optimization, security early warn and abnormal behavior detection. Generally, pedestrian detection can be summarized as: first to estimate moving areas; then to extract features of region of interest; finally to classify using a classifier. Redundant wavelet transform (RWT) overcomes the deficiency of shift variant of discrete wavelet transform, and it has better performance in motion estimation when compared to discrete wavelet transform. Addressing the problem of the detection of multi-pedestrian with different speed, we present an algorithm of pedestrian detection based on motion estimation using RWT, combining histogram of oriented gradients (HOG) and support vector machine (SVM). Firstly, three intensities of movement (IoM) are estimated using RWT and the corresponding areas are segmented. According to the different IoM, a region proposal (RP) is generated. Then, the features of a RP is extracted using HOG. Finally, the features are fed into a SVM trained by pedestrian databases and the final detection results are gained. Experiments show that the proposed algorithm can detect pedestrians accurately and efficiently.
Automatic classification of seismic events within a regional seismograph network
NASA Astrophysics Data System (ADS)
Tiira, Timo; Kortström, Jari; Uski, Marja
2015-04-01
A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows, P, P coda, S, and S coda. For each signal window STA is computed in 20 narrow frequency bands between 1 and 41 Hz. The 80 discrimination parameters are used as a training data for the SVM. The SVM models are calculated for 19 on-line seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after automatic location process. The station-specific SVM training events include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations, of which 98 % have been manually identified as explosions or noise and 2 % as earthquakes. The SVM method correctly identifies 94 % of the non-earthquakes and all the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce work-load in manual seismic analysis by leaving only ~5 % of the automatic event determinations, i.e. the probable earthquakes for more detailed seismological analysis. The approach presented is easy to adjust to requirements of a denser or wider high-frequency network, once enough training examples for building a station-specific data set are available.
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Recently, gene expression profiling using microarray techniques has been shown as a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain high level of noise and the overwhelming number of genes relative to the number of available samples. It brings out a great challenge for machine learning and statistic techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameters learning problem. And a shrinkage approach: 1-norm based linear programming is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets: leukemia dataset and colon tumor dataset are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both two datasets. Moreover, very simple rules with linguist labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
Explosive hazard detection using MIMO forward-looking ground penetrating radar
NASA Astrophysics Data System (ADS)
Shaw, Darren; Ho, K. C.; Stone, Kevin; Keller, James M.; Popescu, Mihail; Anderson, Derek T.; Luke, Robert H.; Burns, Brian
2015-05-01
This paper proposes a machine learning algorithm for subsurface object detection on multiple-input-multiple-output (MIMO) forward-looking ground-penetrating radar (FLGPR). By detecting hazards using FLGPR, standoff distances of up to tens of meters can be acquired, but this is at the degradation of performance due to high false alarm rates. The proposed system utilizes an anomaly detection prescreener to identify potential object locations. Alarm locations have multiple one-dimensional (ML) spectral features, two-dimensional (2D) spectral features, and log-Gabor statistic features extracted. The ability of these features to reduce the number of false alarms and increase the probability of detection is evaluated for both co-polarizations present in the Akela MIMO array. Classification is performed by a Support Vector Machine (SVM) with lane-based cross-validation for training and testing. Class imbalance and optimized SVM kernel parameters are considered during classifier training.
Discrimination of transgenic soybean seeds by terahertz spectroscopy
NASA Astrophysics Data System (ADS)
Liu, Wei; Liu, Changhong; Chen, Feng; Yang, Jianbo; Zheng, Lei
2016-10-01
Discrimination of genetically modified organisms is increasingly demanded by legislation and consumers worldwide. The feasibility of a non-destructive discrimination of glyphosate-resistant and conventional soybean seeds and their hybrid descendants was examined by terahertz time-domain spectroscopy system combined with chemometrics. Principal component analysis (PCA), least squares-support vector machines (LS-SVM) and PCA-back propagation neural network (PCA-BPNN) models with the first and second derivative and standard normal variate (SNV) transformation pre-treatments were applied to classify soybean seeds based on genotype. Results demonstrated clear differences among glyphosate-resistant, hybrid descendants and conventional non-transformed soybean seeds could easily be visualized with an excellent classification (accuracy was 88.33% in validation set) using the LS-SVM and the spectra with SNV pre-treatment. The results indicated that THz spectroscopy techniques together with chemometrics would be a promising technique to distinguish transgenic soybean seeds from non-transformed seeds with high efficiency and without any major sample preparation.
NASA Astrophysics Data System (ADS)
Vaudour, Emmanuelle; Baghdadi, Nicolas; Gilliot, Jean-Marc
2014-05-01
Tillage operations (TOs) affect nutrient uptake, carbon sequestration, water and CO2 exchanges in soil, and therefore impact soil ecology together with biophysical processes such as soil erosion, leaching, run-off and infiltration. They are critical for parameterizing complex dynamic models of carbon and nitrogen. This study done in the framework of the Prostock-Gessol3 project presents an approach for mapping TOs of bare agricultural fields over a peri-urban area characterized by conventional tillage system in the western suburbs of Paris (France), combining synchronous SPOT4 and ENVISAT/ASAR images (HH and HV polarizations). Spatial modeling relied on 57 reference within-field areas named 'reference zones' (RZs) homogeneous for their soil properties, constructed in the vicinity of 57 roughness measurement locations and spread across 20 agricultural fields for which TOs were known. Soil roughness expressed as the standard deviation of surface height (Hrms) was estimated on the ground with a fully automatic photogrammetric method based on the processing of a set of overlapping pictures taken from different viewpoints from a simple digital camera all around a rectangular frame. The relationship was studied between the mean backscattering coefficient of the ASAR image and Hrms choosing a limited set of 28 RZs, on which successive random selections of training/validating RZs were then performed; the remaining 29 RZs were kept for validating the final map results. Six supervised per-pixel classifiers were used in order to map 2 TOs classes (seedbed&harrowed and late winter plough) in addition to 4 landuse classes (forest, urban,crops and grass, water bodies): support vector machine with polynomial kernel (pSVM), SVM with radial basis kernel (rSVM), artificial neural network (ANN), Maximum Likelihood (ML), regression tree (RT), and random forests (RF). All 6 classifiers were implemented in a bootstrapping approach in order to assess the uncertainty of map results. The best results were obtained with pSVM for the SPOT4/ASAR pair with producer's and user's mean validation accuracies of 91.7%/89.8% and 73.2%/73.3% for seedbed&harrowed and late winter plough conditions, respectively. Whatever classifier, the SPOT4/ASAR pair appeared to perform better than each of the single images, particularly for late winter plough: producer's and user's mean validation accuracies were 61.6%/53.0% for the single SPOT4 image; 0%/6% for the single ASAR image. About 73% of the validation agricultural fields (79% of the RZs) were correctly predicted in terms of TOs in the best pSVM-derived final map. Final map results could be improved through masking non-agricultural areas with land use identification system layer prior to classifying images. These results show promises for further monitoring of TOs within a crop production campaign, in the perspective of the ESA Sentinel-1/2 time series. Such knowledge of cultural operations is likely to facilitate the mapping of agricultural systems which otherwise proceed from time-consuming surveys to farmers.
Moghaddasi, Hanie; Nourian, Saeed
2016-06-01
Heart disease is the major cause of death as well as a leading cause of disability in the developed countries. Mitral Regurgitation (MR) is a common heart disease which does not cause symptoms until its end stage. Therefore, early diagnosis of the disease is of crucial importance in the treatment process. Echocardiography is a common method of diagnosis in the severity of MR. Hence, a method which is based on echocardiography videos, image processing techniques and artificial intelligence could be helpful for clinicians, especially in borderline cases. In this paper, we introduce novel features to detect micro-patterns of echocardiography images in order to determine the severity of MR. Extensive Local Binary Pattern (ELBP) and Extensive Volume Local Binary Pattern (EVLBP) are presented as image descriptors which include details from different viewpoints of the heart in feature vectors. Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Template Matching techniques are used as classifiers to determine the severity of MR based on textural descriptors. The SVM classifier with Extensive Uniform Local Binary Pattern (ELBPU) and Extensive Volume Local Binary Pattern (EVLBP) have the best accuracy with 99.52%, 99.38%, 99.31% and 99.59%, respectively, for the detection of Normal, Mild MR, Moderate MR and Severe MR subjects among echocardiography videos. The proposed method achieves 99.38% sensitivity and 99.63% specificity for the detection of the severity of MR and normal subjects. Copyright © 2016 Elsevier Ltd. All rights reserved.
2013-01-01
Background Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts is generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. Results In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features: amino acid composition, dipeptide composition, the pseudo amino acid composition, Nterminal-Center-Cterminal composition and the protein physicochemical properties are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with a MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. Conclusion The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes. PMID:24266945
Zhai, Xiaolong; Jelfs, Beth; Chan, Rosa H. M.; Tin, Chung
2017-01-01
Hand movement classification based on surface electromyography (sEMG) pattern recognition is a promising approach for upper limb neuroprosthetic control. However, maintaining day-to-day performance is challenged by the non-stationary nature of sEMG in real-life operation. In this study, we propose a self-recalibrating classifier that can be automatically updated to maintain a stable performance over time without the need for user retraining. Our classifier is based on convolutional neural network (CNN) using short latency dimension-reduced sEMG spectrograms as inputs. The pretrained classifier is recalibrated routinely using a corrected version of the prediction results from recent testing sessions. Our proposed system was evaluated with the NinaPro database comprising of hand movement data of 40 intact and 11 amputee subjects. Our system was able to achieve ~10.18% (intact, 50 movement types) and ~2.99% (amputee, 10 movement types) increase in classification accuracy averaged over five testing sessions with respect to the unrecalibrated classifier. When compared with a support vector machine (SVM) classifier, our CNN-based system consistently showed higher absolute performance and larger improvement as well as more efficient training. These results suggest that the proposed system can be a useful tool to facilitate long-term adoption of prosthetics for amputees in real-life applications. PMID:28744189
Zhai, Xiaolong; Jelfs, Beth; Chan, Rosa H M; Tin, Chung
2017-01-01
Hand movement classification based on surface electromyography (sEMG) pattern recognition is a promising approach for upper limb neuroprosthetic control. However, maintaining day-to-day performance is challenged by the non-stationary nature of sEMG in real-life operation. In this study, we propose a self-recalibrating classifier that can be automatically updated to maintain a stable performance over time without the need for user retraining. Our classifier is based on convolutional neural network (CNN) using short latency dimension-reduced sEMG spectrograms as inputs. The pretrained classifier is recalibrated routinely using a corrected version of the prediction results from recent testing sessions. Our proposed system was evaluated with the NinaPro database comprising of hand movement data of 40 intact and 11 amputee subjects. Our system was able to achieve ~10.18% (intact, 50 movement types) and ~2.99% (amputee, 10 movement types) increase in classification accuracy averaged over five testing sessions with respect to the unrecalibrated classifier. When compared with a support vector machine (SVM) classifier, our CNN-based system consistently showed higher absolute performance and larger improvement as well as more efficient training. These results suggest that the proposed system can be a useful tool to facilitate long-term adoption of prosthetics for amputees in real-life applications.
Feature selection and classification of multiparametric medical images using bagging and SVM
NASA Astrophysics Data System (ADS)
Fan, Yong; Resnick, Susan M.; Davatzikos, Christos
2008-03-01
This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
Real-time image annotation by manifold-based biased Fisher discriminant analysis
NASA Astrophysics Data System (ADS)
Ji, Rongrong; Yao, Hongxun; Wang, Jicheng; Sun, Xiaoshuai; Liu, Xianming
2008-01-01
Automatic Linguistic Annotation is a promising solution to bridge the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed in state-of-art annotation algorithms: 1. The Small Sample Size (3S) problem in keyword classifier/model learning; 2. Most of annotation algorithms can not extend to real-time online usage due to their low computational efficiencies. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm to address these two issues by transductive semantic learning and keyword filtering. To address the 3S problem, Co-Training based Manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Bias Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Different from all existing annotation methods, MBFDA views image annotation from a novel Eigen semantic feature (which corresponds to keywords) selection aspect. As demonstrated in experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-art annotation methods (1.K-NN Expansion; 2.One-to-All SVM; 3.PWC-SVM) in both computational time and annotation accuracy with a large margin.
Guo, Lei; Abbosh, Amin
2018-05-01
For any chance for stroke patients to survive, the stroke type should be classified to enable giving medication within a few hours of the onset of symptoms. In this paper, a microwave-based stroke localization and classification framework is proposed. It is based on microwave tomography, k-means clustering, and a support vector machine (SVM) method. The dielectric profile of the brain is first calculated using the Born iterative method, whereas the amplitude of the dielectric profile is then taken as the input to k-means clustering. The cluster is selected as the feature vector for constructing and testing the SVM. A database of MRI-derived realistic head phantoms at different signal-to-noise ratios is used in the classification procedure. The performance of the proposed framework is evaluated using the receiver operating characteristic (ROC) curve. The results based on a two-dimensional framework show that 88% classification accuracy, with a sensitivity of 91% and a specificity of 87%, can be achieved. Bioelectromagnetics. 39:312-324, 2018. © 2018 Wiley Periodicals, Inc. © 2018 Wiley Periodicals, Inc.
Cao, D-S; Zhao, J-C; Yang, Y-N; Zhao, C-X; Yan, J; Liu, S; Hu, Q-N; Xu, Q-S; Liang, Y-Z
2012-01-01
There is a great need to assess the harmful effects or toxicities of chemicals to which man is exposed. In the present paper, the simplified molecular input line entry specification (SMILES) representation-based string kernel, together with the state-of-the-art support vector machine (SVM) algorithm, were used to classify the toxicity of chemicals from the US Environmental Protection Agency Distributed Structure-Searchable Toxicity (DSSTox) database network. In this method, the molecular structure can be directly encoded by a series of SMILES substrings that represent the presence of some chemical elements and different kinds of chemical bonds (double, triple and stereochemistry) in the molecules. Thus, SMILES string kernel can accurately and directly measure the similarities of molecules by a series of local information hidden in the molecules. Two model validation approaches, five-fold cross-validation and independent validation set, were used for assessing the predictive capability of our developed models. The results obtained indicate that SVM based on the SMILES string kernel can be regarded as a very promising and alternative modelling approach for potential toxicity prediction of chemicals.
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J; Ibarra-Manzano, Mario A
2016-03-05
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.
Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R.; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J.; Ibarra-Manzano, Mario A.
2016-01-01
Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano
2015-01-01
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation—based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking—to reduce the dimensions of images—and binarization—to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines.
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J; Raboso, Mariano
2015-06-17
Drawing on the results of an acoustic biometric system based on a MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them, based on Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering, segmentation-based on a Gaussian Mixture Model (GMM) to separate the person from the background, masking-to reduce the dimensions of images-and binarization-to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the selection of the most relevant algorithms, according to the benefits required by the system. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements.
An online semi-supervised brain-computer interface.
Gu, Zhenghui; Yu, Zhuliang; Shen, Zhifang; Li, Yuanqing
2013-09-01
Practical brain-computer interface (BCI) systems should require only low training effort for the user, and the algorithms used to classify the intent of the user should be computationally efficient. However, due to inter- and intra-subject variations in EEG signal, intermittent training/calibration is often unavoidable. In this paper, we present an online semi-supervised P300 BCI speller system. After a short initial training (around or less than 1 min in our experiments), the system is switched to a mode where the user can input characters through selective attention. In this mode, a self-training least squares support vector machine (LS-SVM) classifier is gradually enhanced in back end with the unlabeled EEG data collected online after every character input. In this way, the classifier is gradually enhanced. Even though the user may experience some errors in input at the beginning due to the small initial training dataset, the accuracy approaches that of fully supervised method in a few minutes. The algorithm based on LS-SVM and its sequential update has low computational complexity; thus, it is suitable for online applications. The effectiveness of the algorithm has been validated through data analysis on BCI Competition III dataset II (P300 speller BCI data). The performance of the online system was evaluated through experimental results on eight healthy subjects, where all of them achieved the spelling accuracy of 85 % or above within an average online semi-supervised learning time of around 3 min.
Lim, Dong Kyu; Long, Nguyen Phuoc; Mo, Changyeun; Dong, Ziyuan; Cui, Lingmei; Kim, Giyoung; Kwon, Sung Won
2017-10-01
The mixing of extraneous ingredients with original products is a common adulteration practice in food and herbal medicines. In particular, authenticity of white rice and its corresponding blended products has become a key issue in food industry. Accordingly, our current study aimed to develop and evaluate a novel discrimination method by combining targeted lipidomics with powerful supervised learning methods, and eventually introduce a platform to verify the authenticity of white rice. A total of 30 cultivars were collected, and 330 representative samples of white rice from Korea and China as well as seven mixing ratios were examined. Random forests (RF), support vector machines (SVM) with a radial basis function kernel, C5.0, model averaged neural network, and k-nearest neighbor classifiers were used for the classification. We achieved desired results, and the classifiers effectively differentiated white rice from Korea to blended samples with high prediction accuracy for the contamination ratio as low as five percent. In addition, RF and SVM classifiers were generally superior to and more robust than the other techniques. Our approach demonstrated that the relative differences in lysoGPLs can be successfully utilized to detect the adulterated mixing of white rice originating from different countries. In conclusion, the present study introduces a novel and high-throughput platform that can be applied to authenticate adulterated admixtures from original white rice samples. Copyright © 2017 Elsevier Ltd. All rights reserved.
A novel and efficient technique for identification and classification of GPCRs.
Gupta, Ravi; Mittal, Ankush; Singh, Kuldip
2008-07-01
G-protein coupled receptors (GPCRs) play a vital role in different biological processes, such as regulation of growth, death, and metabolism of cells. GPCRs are the focus of significant amount of current pharmaceutical research since they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate technique to identify and classify the GPCRs. However, this approach has two major disadvantages. First, the dimension of dipeptide-based feature vector is equal to 400. The large dimension makes the classification task computationally and memory wise inefficient. Second, it does not consider the biological properties of protein sequence for identification and classification of GPCRs. In this paper, we present a novel-feature-based SVM classification technique. The novel features are derived by applying wavelet-based time series analysis approach on protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of amino acids in a protein sequence. In addition, the dimension of the feature vector for proposed technique is equal to 35. Experiments were performed on GPCRs protein sequences available at GPCRs Database. Our approach achieves an accuracy of 99.9%, 98.06%, 97.78%, and 94.08% for GPCR superfamily, families, subfamilies, and subsubfamilies (amine group), respectively, when evaluated using fivefold cross-validation. Further, an accuracy of 99.8%, 97.26%, and 97.84% was obtained when evaluated on unseen or recall datasets of GPCR superfamily, families, and subfamilies, respectively. Comparison with dipeptide-based SVM technique shows the effectiveness of our approach.
NASA Astrophysics Data System (ADS)
Hekmatmanesh, Amin; Jamaloo, Fatemeh; Wu, Huapeng; Handroos, Heikki; Kilpeläinen, Asko
2018-04-01
Brain Computer Interface (BCI) can be a challenge for developing of robotic, prosthesis and human-controlled systems. This work focuses on the implementation of a common spatial pattern (CSP) base algorithm to detect event related desynchronization patterns. Utilizing famous previous work in this area, features are extracted by filter bank with common spatial pattern (FBCSP) method, and then weighted by a sensitive learning vector quantization (SLVQ) algorithm. In the current work, application of the radial basis function (RBF) as a mapping kernel of linear discriminant analysis (KLDA) method on the weighted features, allows the transfer of data into a higher dimension for more discriminated data scattering by RBF kernel. Afterwards, support vector machine (SVM) with generalized radial basis function (GRBF) kernel is employed to improve the efficiency and robustness of the classification. Averagely, 89.60% accuracy and 74.19% robustness are achieved. BCI Competition III, Iva data set is used to evaluate the algorithm for detecting right hand and foot imagery movement patterns. Results show that combination of KLDA with SVM-GRBF classifier makes 8.9% and 14.19% improvements in accuracy and robustness, respectively. For all the subjects, it is concluded that mapping the CSP features into a higher dimension by RBF and utilization GRBF as a kernel of SVM, improve the accuracy and reliability of the proposed method.
Epileptic seizure detection in EEG signal using machine learning techniques.
Jaiswal, Abeg Kumar; Banka, Haider
2018-03-01
Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods for analyzing an EEG signal for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning technique have been proposed to replace these traditional methods. The two basic steps involved in machine learning are feature extraction and classification. Feature extraction reduces the input pattern space by keeping informative features and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches involving subpattern based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA. Both techniques explore the subpattern correlation of EEG signals, which helps in decision-making process. SVM is used for classification of seizure and non-seizure EEG signals. The SVM was trained with radial basis kernel. All the experiments have been carried out on the benchmark epilepsy EEG dataset. The entire dataset consists of 500 EEG signals recorded under different scenarios. Seven different experimental cases for classification have been conducted. The classification accuracy was evaluated using tenfold cross validation. The classification results of the proposed approaches have been compared with the results of some of existing techniques proposed in the literature to establish the claim.
NASA Astrophysics Data System (ADS)
Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.
2017-07-01
During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor consuming and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model of the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA , and the optimal threshold T = 1 was obtained from the receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares supported vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.
NASA Astrophysics Data System (ADS)
Pietrzyk, Mariusz W.; Donovan, Tim; Brennan, Patrick C.; Dix, Alan; Manning, David J.
2011-03-01
Aim: To optimize automated classification of radiological errors during lung nodule detection from chest radiographs (CxR) using a support vector machine (SVM) run on the spatial frequency features extracted from the local background of selected regions. Background: The majority of the unreported pulmonary nodules are visually detected but not recognized; shown by the prolonged dwell time values at false-negative regions. Similarly, overestimated nodule locations are capturing substantial amounts of foveal attention. Spatial frequency properties of selected local backgrounds are correlated with human observer responses either in terms of accuracy in indicating abnormality position or in the precision of visual sampling the medical images. Methods: Seven radiologists participated in the eye tracking experiments conducted under conditions of pulmonary nodule detection from a set of 20 postero-anterior CxR. The most dwelled locations have been identified and subjected to spatial frequency (SF) analysis. The image-based features of selected ROI were extracted with un-decimated Wavelet Packet Transform. An analysis of variance was run to select SF features and a SVM schema was implemented to classify False-Negative and False-Positive from all ROI. Results: A relative high overall accuracy was obtained for each individually developed Wavelet-SVM algorithm, with over 90% average correct ratio for errors recognition from all prolonged dwell locations. Conclusion: The preliminary results show that combined eye-tracking and image-based features can be used for automated detection of radiological error with SVM. The work is still in progress and not all analytical procedures have been completed, which might have an effect on the specificity of the algorithm.
NASA Astrophysics Data System (ADS)
Lartizien, Carole; Marache-Francisco, Simon; Prost, Rémy
2012-02-01
Positron emission tomography (PET) using fluorine-18 deoxyglucose (18F-FDG) has become an increasingly recommended tool in clinical whole-body oncology imaging for the detection, diagnosis, and follow-up of many cancers. One way to improve the diagnostic utility of PET oncology imaging is to assist physicians facing difficult cases of residual or low-contrast lesions. This study aimed at evaluating different schemes of computer-aided detection (CADe) systems for the guided detection and localization of small and low-contrast lesions in PET. These systems are based on two supervised classifiers, linear discriminant analysis (LDA) and the nonlinear support vector machine (SVM). The image feature sets that serve as input data consisted of the coefficients of an undecimated wavelet transform. An optimization study was conducted to select the best combination of parameters for both the SVM and the LDA. Different false-positive reduction (FPR) methods were evaluated to reduce the number of false-positive detections per image (FPI). This includes the removal of small detected clusters and the combination of the LDA and SVM detection maps. The different CAD schemes were trained and evaluated based on a simulated whole-body PET image database containing 250 abnormal cases with 1230 lesions and 250 normal cases with no lesion. The detection performance was measured on a separate series of 25 testing images with 131 lesions. The combination of the LDA and SVM score maps was shown to produce very encouraging detection performance for both the lung lesions, with 91% sensitivity and 18 FPIs, and the liver lesions, with 94% sensitivity and 10 FPIs. Comparison with human performance indicated that the different CAD schemes significantly outperformed human detection sensitivities, especially regarding the low-contrast lesions.
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA-seq sequencing technologies presents a shift in the scientific paradigm where scientists, now, are able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, till date, there has not been a suitable computational methodology for the analysis of such intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profiles difference between the different cellular subtypes. In this paper, we describe the novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in web-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach reported for is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles like that of the study of the cancer progression and treatment within a highly heterogeneous tumour.
Guo, Shengwen; Lai, Chunren; Wu, Congling; Cen, Guiyin
2017-01-01
Neuroimaging measurements derived from magnetic resonance imaging provide important information required for detecting changes related to the progression of mild cognitive impairment (MCI). Cortical features and changes play a crucial role in revealing unique anatomical patterns of brain regions, and further differentiate MCI patients from normal states. Four cortical features, namely, gray matter volume, cortical thickness, surface area, and mean curvature, were explored for discriminative analysis among three groups including the stable MCI (sMCI), the converted MCI (cMCI), and the normal control (NC) groups. In this study, 158 subjects (72 NC, 46 sMCI, and 40 cMCI) were selected from the Alzheimer's Disease Neuroimaging Initiative. A sparse-constrained regression model based on the l2-1-norm was introduced to reduce the feature dimensionality and retrieve essential features for the discrimination of the three groups by using a support vector machine (SVM). An optimized strategy of feature addition based on the weight of each feature was adopted for the SVM classifier in order to achieve the best classification performance. The baseline cortical features combined with the longitudinal measurements for 2 years of follow-up data yielded prominent classification results. In particular, the cortical thickness produced a classification with 98.84% accuracy, 97.5% sensitivity, and 100% specificity for the sMCI-cMCI comparison; 92.37% accuracy, 84.78% sensitivity, and 97.22% specificity for the cMCI-NC comparison; and 93.75% accuracy, 92.5% sensitivity, and 94.44% specificity for the sMCI-NC comparison. The best performances obtained by the SVM classifier using the essential features were 5-40% more than those using all of the retained features. The feasibility of the cortical features for the recognition of anatomical patterns was certified; thus, the proposed method has the potential to improve the clinical diagnosis of sub-types of MCI and predict the risk of its conversion to Alzheimer's disease.
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected from laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to KF-SVM, PP-SVM and SVM schemes are 2.49%, 5.83 % and 6.49 % respectively.
Park, Jae Hong; Kim, Chang-Eop; Shin, Jaewoo; Im, Changkyun; Koh, Chin Su; Seo, In Seok; Kim, Sang Jeong; Shin, Hyung-Cheul
2013-10-01
Chronic monitoring of the state of the bladder can be used to notify patients with urinary dysfunction when the bladder should be voided. Given that many spinal neurons respond both to somatic and visceral inputs, it is necessary to extract bladder information selectively from the spinal cord. Here, we hypothesize that sensory information with distinct modalities should be represented by the distinct ensemble activity patterns within the neuronal population and, therefore, analyzing the activity patterns of the neuronal population could distinguish bladder fullness from somatic stimuli. We simultaneously recorded 26-27 single unit activities in response to bladder distension or tactile stimuli in the dorsal spinal cord of each Sprague-Dawley rat. In order to discriminate between bladder fullness and tactile stimulus inputs, we analyzed the ensemble activity patterns of the entire neuronal population. A support vector machine (SVM) was employed as a classifier, and discrimination performance was measured by k-fold cross-validation tests. Most of the units responding to bladder fullness also responded to the tactile stimuli (88.9-100%). The SVM classifier precisely distinguished the bladder fullness from the somatic input (100%), indicating that the ensemble activity patterns of the unit population in the spinal cord are distinct enough to identify the current input modality. Moreover, our ensemble activity pattern-based classifier showed high robustness against random losses of signals. This study is the first to demonstrate that the two main issues of electroneurographic monitoring of bladder fullness, low signals and selectiveness, can be solved by an ensemble activity pattern-based approach, improving the feasibility of chronic monitoring of bladder fullness by neural recording.
Verma, Gyanendra K; Tiwary, Uma Shanker
2014-11-15
The purpose of this paper is twofold: (i) to investigate the emotion representation models and find out the possibility of a model with minimum number of continuous dimensions and (ii) to recognize and predict emotion from the measured physiological signals using multiresolution approach. The multimodal physiological signals are: Electroencephalogram (EEG) (32 channels) and peripheral (8 channels: Galvanic skin response (GSR), blood volume pressure, respiration pattern, skin temperature, electromyogram (EMG) and electrooculogram (EOG)) as given in the DEAP database. We have discussed the theories of emotion modeling based on i) basic emotions, ii) cognitive appraisal and physiological response approach and iii) the dimensional approach and proposed a three continuous dimensional representation model for emotions. The clustering experiment on the given valence, arousal and dominance values of various emotions has been done to validate the proposed model. A novel approach for multimodal fusion of information from a large number of channels to classify and predict emotions has also been proposed. Discrete Wavelet Transform, a classical transform for multiresolution analysis of signal has been used in this study. The experiments are performed to classify different emotions from four classifiers. The average accuracies are 81.45%, 74.37%, 57.74% and 75.94% for SVM, MLP, KNN and MMC classifiers respectively. The best accuracy is for 'Depressing' with 85.46% using SVM. The 32 EEG channels are considered as independent modes and features from each channel are considered with equal importance. May be some of the channel data are correlated but they may contain supplementary information. In comparison with the results given by others, the high accuracy of 85% with 13 emotions and 32 subjects from our proposed method clearly proves the potential of our multimodal fusion approach. Copyright © 2013 Elsevier Inc. All rights reserved.
Celik, Turgay; Lee, Hwee Kuan; Petznick, Andrea; Tong, Louis
2013-01-01
Background Infrared (IR) meibography is an imaging technique to capture the Meibomian glands in the eyelids. These ocular surface structures are responsible for producing the lipid layer of the tear film which helps to reduce tear evaporation. In a normal healthy eye, the glands have similar morphological features in terms of spatial width, in-plane elongation, length. On the other hand, eyes with Meibomian gland dysfunction show visible structural irregularities that help in the diagnosis and prognosis of the disease. However, currently there is no universally accepted algorithm for detection of these image features which will be clinically useful. We aim to develop a method of automated gland segmentation which allows images to be classified. Methods A set of 131 meibography images were acquired from patients from the Singapore National Eye Center. We used a method of automated gland segmentation using Gabor wavelets. Features of the imaged glands including orientation, width, length and curvature were extracted and the IR images enhanced. The images were classified as ‘healthy’, ‘intermediate’ or ‘unhealthy’, through the use of a support vector machine classifier (SVM). Half the images were used for training the SVM and the other half for validation. Independently of this procedure, the meibographs were classified by an expert clinician into the same 3 grades. Results The algorithm correctly detected 94% and 98% of mid-line pixels of gland and inter-gland regions, respectively, on healthy images. On intermediate images, correct detection rates of 92% and 97% of mid-line pixels of gland and inter-gland regions were achieved respectively. The true positive rate of detecting healthy images was 86%, and for intermediate images, 74%. The corresponding false positive rates were 15% and 31% respectively. Using the SVM, the proposed method has 88% accuracy in classifying images into the 3 classes. The classification of images into healthy and unhealthy classes achieved a 100% accuracy, but 7/38 intermediate images were incorrectly classified. Conclusions This technique of image analysis in meibography can help clinicians to interpret the degree of gland destruction in patients with dry eye and meibomian gland dysfunction.
Incremental classification learning for anomaly detection in medical images
NASA Astrophysics Data System (ADS)
Giritharan, Balathasan; Yuan, Xiaohui; Liu, Jianguo
2009-02-01
Computer-aided diagnosis usually screens thousands of instances to find only a few positive cases that indicate probable presence of disease.The amount of patient data increases consistently all the time. In diagnosis of new instances, disagreement occurs between a CAD system and physicians, which suggests inaccurate classifiers. Intuitively, misclassified instances and the previously acquired data should be used to retrain the classifier. This, however, is very time consuming and, in some cases where dataset is too large, becomes infeasible. In addition, among the patient data, only a small percentile shows positive sign, which is known as imbalanced data.We present an incremental Support Vector Machines(SVM) as a solution for the class imbalance problem in classification of anomaly in medical images. The support vectors provide a concise representation of the distribution of the training data. Here we use bootstrapping to identify potential candidate support vectors for future iterations. Experiments were conducted using images from endoscopy videos, and the sensitivity and specificity were close to that of SVM trained using all samples available at a given incremental step with significantly improved efficiency in training the classifier.
Improving imbalanced scientific text classification using sampling strategies and dictionaries.
Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M
2011-09-15
Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. One of the systems affected are those related to the recovery and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in tackling the problem of class imbalance. In this work, we study their effects on three types of classifiers (Knn, SVM and Naive-Bayes) when they are applied to search on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. Best results were obtained with NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.
Automated diagnosis of epilepsy using CWT, HOS and texture parameters.
Acharya, U Rajendra; Yanti, Ratna; Zheng, Jia Wei; Krishnan, M Muthu Rama; Tan, Jen Hong; Martis, Roshan Joy; Lim, Choo Min
2013-06-01
Epilepsy is a chronic brain disorder which manifests as recurrent seizures. Electroencephalogram (EEG) signals are generally analyzed to study the characteristics of epileptic seizures. In this work, we propose a method for the automated classification of EEG signals into normal, interictal and ictal classes using Continuous Wavelet Transform (CWT), Higher Order Spectra (HOS) and textures. First the CWT plot was obtained for the EEG signals and then the HOS and texture features were extracted from these plots. Then the statistically significant features were fed to four classifiers namely Decision Tree (DT), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM) to select the best classifier. We observed that the SVM classifier with Radial Basis Function (RBF) kernel function yielded the best results with an average accuracy of 96%, average sensitivity of 96.9% and average specificity of 97% for 23.6 s duration of EEG data. Our proposed technique can be used as an automatic seizure monitoring software. It can also assist the doctors to cross check the efficacy of their prescribed drugs.
Thurston, Rebecca C; Hernandez, Javier; Del Rio, Jose M; De La Torre, Fernando
2011-07-01
Most midlife women have hot flashes. The conventional criterion (≥2 μmho rise/30 s) for classifying hot flashes physiologically has shown poor performance. We improved this performance in the laboratory with Support Vector Machines (SVMs), a pattern classification method. We aimed to compare conventional to SVM methods to classify hot flashes in the ambulatory setting. Thirty-one women with hot flashes underwent 24 h of ambulatory sternal skin conductance monitoring. Hot flashes were quantified with conventional (≥2 μmho/30 s) and SVM methods. Conventional methods had low sensitivity (sensitivity=.57, specificity=.98, positive predictive value (PPV)=.91, negative predictive value (NPV)=.90, F1=.60), with performance lower with higher body mass index (BMI). SVMs improved this performance (sensitivity=.87, specificity=.97, PPV=.90, NPV=.96, F1=.88) and reduced BMI variation. SVMs can improve ambulatory physiologic hot flash measures. Copyright © 2010 Society for Psychophysiological Research.
Meng, Jun; Shi, Lin; Luan, Yushi
2014-01-01
Background Confident identification of microRNA-target interactions is significant for studying the function of microRNA (miRNA). Although some computational miRNA target prediction methods have been proposed for plants, results of various methods tend to be inconsistent and usually lead to more false positive. To address these issues, we developed an integrated model for identifying plant miRNA–target interactions. Results Three online miRNA target prediction toolkits and machine learning algorithms were integrated to identify and analyze Arabidopsis thaliana miRNA-target interactions. Principle component analysis (PCA) feature extraction and self-training technology were introduced to improve the performance. Results showed that the proposed model outperformed the previously existing methods. The results were validated by using degradome sequencing supported Arabidopsis thaliana miRNA-target interactions. The proposed model constructed on Arabidopsis thaliana was run over Oryza sativa and Vitis vinifera to demonstrate that our model is effective for other plant species. Conclusions The integrated model of online predictors and local PCA-SVM classifier gained credible and high quality miRNA-target interactions. The supervised learning algorithm of PCA-SVM classifier was employed in plant miRNA target identification for the first time. Its performance can be substantially improved if more experimentally proved training samples are provided. PMID:25051153
Comparison of different methods for gender estimation from face image of various poses
NASA Astrophysics Data System (ADS)
Ishii, Yohei; Hongo, Hitoshi; Niwa, Yoshinori; Yamamoto, Kazuhiko
2003-04-01
Recently, gender estimation from face images has been studied for frontal facial images. However, it is difficult to obtain such facial images constantly in the case of application systems for security, surveillance and marketing research. In order to build such systems, a method is required to estimate gender from the image of various facial poses. In this paper, three different classifiers are compared in appearance-based gender estimation, which use four directional features (FDF). The classifiers are linear discriminant analysis (LDA), Support Vector Machines (SVMs) and Sparse Network of Winnows (SNoW). Face images used for experiments were obtained from 35 viewpoints. The direction of viewpoints varied +/-45 degrees horizontally, +/-30 degrees vertically at 15 degree intervals respectively. Although LDA showed the best performance for frontal facial images, SVM with Gaussian kernel was found the best performance (86.0%) for the facial images of 35 viewpoints. It is considered that SVM with Gaussian kernel is robust to changes in viewpoint when estimating gender from these results. Furthermore, the estimation rate was quite close to the average estimation rate at 35 viewpoints respectively. It is supposed that the methods are reasonable to estimate gender within the range of experimented viewpoints by learning face images from multiple directions by one class.
Daily River Flow Forecasting with Hybrid Support Vector Machine – Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Zaini, N.; Malek, M. A.; Yusoff, M.; Mardi, N. H.; Norhisham, S.
2018-04-01
The application of artificial intelligence techniques for river flow forecasting can further improve the management of water resources and flood prevention. This study concerns the development of support vector machine (SVM) based model and its hybridization with particle swarm optimization (PSO) to forecast short term daily river flow at Upper Bertam Catchment located in Cameron Highland, Malaysia. Ten years duration of historical rainfall, antecedent river flow data and various meteorology parameters data from 2003 to 2012 are used in this study. Four SVM based models are proposed which are SVM1, SVM2, SVM-PSO1 and SVM-PSO2 to forecast 1 to 7 day ahead of river flow. SVM1 and SVM-PSO1 are the models with historical rainfall and antecedent river flow as its input, while SVM2 and SVM-PSO2 are the models with historical rainfall, antecedent river flow data and additional meteorological parameters as input. The performances of the proposed model are measured in term of RMSE and R2 . It is found that, SVM2 outperformed SVM1 and SVM-PSO2 outperformed SVM-PSO1 which meant the additional meteorology parameters used as input to the proposed models significantly affect the model performances. Hybrid models SVM-PSO1 and SVM-PSO2 yield higher performances as compared to SVM1 and SVM2. It is found that hybrid models are more effective in forecasting river flow at 1 to 7 day ahead at the study area.
Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-01-01
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514
Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.
Wang, Guofeng; Yang, Yinwei; Li, Zhimeng
2014-11-14
Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radius basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.