Sample records for machine ls-svm classifiers

  1. Semisupervised learning using Bayesian interpretation: application to LS-SVM.

    PubMed

    Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain

    2011-04-01

    Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.

  2. Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection

    PubMed Central

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-01-01

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629

  3. Online least squares one-class support vector machines-based abnormal visual event detection.

    PubMed

    Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem

    2013-12-12

    The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method.

  4. Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

    NASA Astrophysics Data System (ADS)

    Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin

    2012-04-01

    Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.

  5. Evaluation of extreme learning machine for classification of individual and combined finger movements using electromyography on amputees and non-amputees.

    PubMed

    Anam, Khairul; Al-Jumaily, Adel

    2017-01-01

    The success of myoelectric pattern recognition (M-PR) mostly relies on the features extracted and classifier employed. This paper proposes and evaluates a fast classifier, extreme learning machine (ELM), to classify individual and combined finger movements on amputees and non-amputees. ELM is a single hidden layer feed-forward network (SLFN) that avoids iterative learning by determining input weights randomly and output weights analytically. Therefore, it can accelerate the training time of SLFNs. In addition to the classifier evaluation, this paper evaluates various feature combinations to improve the performance of M-PR and investigate some feature projections to improve the class separability of the features. Different from other studies on the implementation of ELM in the myoelectric controller, this paper presents a complete and thorough investigation of various types of ELMs including the node-based and kernel-based ELM. Furthermore, this paper provides comparisons of ELMs and other well-known classifiers such as linear discriminant analysis (LDA), k-nearest neighbour (kNN), support vector machine (SVM) and least-square SVM (LS-SVM). The experimental results show the most accurate ELM classifier is radial basis function ELM (RBF-ELM). The comparison of RBF-ELM and other well-known classifiers shows that RBF-ELM is as accurate as SVM and LS-SVM but faster than the SVM family; it is superior to LDA and kNN. The experimental results also indicate that the accuracy gap of the M-PR on the amputees and non-amputees is not too much with the accuracy of 98.55% on amputees and 99.5% on the non-amputees using six electromyography (EMG) channels. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Computer-aided classification of optical images for diagnosis of osteoarthritis in the finger joints.

    PubMed

    Zhang, Jiang; Wang, James Z; Yuan, Zhen; Sobel, Eric S; Jiang, Huabei

    2011-01-01

    This study presents a computer-aided classification method to distinguish osteoarthritis finger joints from healthy ones based on the functional images captured by x-ray guided diffuse optical tomography. Three imaging features, joint space width, optical absorption, and scattering coefficients, are employed to train a Least Squares Support Vector Machine (LS-SVM) classifier for osteoarthritis classification. The 10-fold validation results show that all osteoarthritis joints are clearly identified and all healthy joints are ruled out by the LS-SVM classifier. The best sensitivity, specificity, and overall accuracy of the classification by experienced technicians based on manual calculation of optical properties and visual examination of optical images are only 85%, 93%, and 90%, respectively. Therefore, our LS-SVM based computer-aided classification is a considerably improved method for osteoarthritis diagnosis.

  7. Predicting breast cancer using an expression values weighted clinical classifier.

    PubMed

    Thomas, Minta; De Brabanter, Kris; Suykens, Johan A K; De Moor, Bart

    2014-12-31

    Clinical data, such as patient history, laboratory analysis, ultrasound parameters-which are the basis of day-to-day clinical decision support-are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. While bringing up the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier to integrate two data sources: microarray and clinical parameters. We compared and evaluated the proposed methods on five breast cancer case studies. Compared to LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD) and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under ROC Curve (AUC), on all breast cancer case studies. Thus a clinical classifier weighted with microarray data set results in significantly improved diagnosis, prognosis and prediction responses to therapy. The proposed model has been shown as a promising mathematical framework in both data fusion and non-linear classification problems.

  8. Classification of epileptic EEG signals based on simple random sampling and sequential feature selection.

    PubMed

    Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui

    2016-06-01

    Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.

  9. [Application of near infrared spectroscopy combined with particle swarm optimization based least square support vactor machine to rapid quantitative analysis of Corni Fructus].

    PubMed

    Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan

    2015-12-01

    A novel method was developed for the rapid determination of multi-indicators in corni fructus by means of near infrared (NIR) spectroscopy. Particle swarm optimization (PSO) based least squares support vector machine was investigated to increase the levels of quality control. The calibration models of moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of PSO-LS-SVM models was compared with partial least squares regression (PLSR) and back propagation artificial neural network (BP-ANN). The calibration and validation results of PSO-LS-SVM were superior to both PLS and BP-ANN. For PSO-LS-SVM models, the correlation coefficients (r) of calibrations were all above 0.942. The optimal prediction results were also achieved by PSO-LS-SVM models with the RMSEP (root mean square error of prediction) and RSEP (relative standard errors of prediction) less than 1.176 and 15.5% respectively. The results suggest that PSO-LS-SVM algorithm has a good model performance and high prediction accuracy. NIR has a potential value for rapid determination of multi-indicators in Corni Fructus.

  10. A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment

    NASA Astrophysics Data System (ADS)

    Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong

    Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM had excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principle component analysis (KPCA) and linear programming form of least square support vector machine. With this method sparseness and robustness is obtained while solving large dimensional and large scale database. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be a satisfactory one.

  11. Prediction of p38 map kinase inhibitory activity of 3, 4-dihydropyrido [3, 2-d] pyrimidone derivatives using an expert system based on principal component analysis and least square support vector machine

    PubMed Central

    Shahlaei, M.; Saghaie, L.

    2014-01-01

    A quantitative structure–activity relationship (QSAR) study is suggested for the prediction of biological activity (pIC50) of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors. Modeling of the biological activities of compounds of interest as a function of molecular structures was established by means of principal component analysis (PCA) and least square support vector machine (LS-SVM) methods. The results showed that the pIC50 values calculated by LS-SVM are in good agreement with the experimental data, and the performance of the LS-SVM regression model is superior to the PCA-based model. The developed LS-SVM model was applied for the prediction of the biological activities of pyrimidone derivatives, which were not in the modeling procedure. The resulted model showed high prediction ability with root mean square error of prediction of 0.460 for LS-SVM. The study provided a novel and effective approach for predicting biological activities of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors and disclosed that LS-SVM can be used as a powerful chemometrics tool for QSAR studies. PMID:26339262

  12. [Based on the LS-SVM modeling method determination of soil available N and available K by using near-infrared spectroscopy].

    PubMed

    Liu, Xue-Mei; Liu, Jian-She

    2012-11-01

    Visible infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for measurement accuracy of soil properties,namely, available nitrogen(N) and available potassium(K). Three types of pretreatments including standard normal variate (SNV), multiplicative scattering correction (MSC) and Savitzky-Golay smoothing+first derivative were adopted to eliminate the system noises and external disturbances. Then partial least squares (PLS) and least squares-support vector machine (LS-SVM) models analysis were implemented for calibration models. Simultaneously, the performance of least squares-support vector machine (LS-SVM) models was compared with three kinds of inputs, including PCA(PCs), latent variables (LVs), and effective wavelengths (EWs). The results indicated that all LS-SVM models outperformed PLS models. The performance of the model was evaluated by the correlation coefficient (r2) and RMSEP. The optimal EWs-LS-SVM models were achieved, and the correlation coefficient (r2) and RMSEP were 0.82 and 17.2 for N and 0.72 and 15.0 for K, respectively. The results indicated that visible and short wave-near infrared spectroscopy (Vis/SW-NIRS)(325-1 075 nm) combined with LS-SVM could be utilized as a precision method for the determination of soil properties.

  13. Robust Least-Squares Support Vector Machine With Minimization of Mean and Variance of Modeling Error.

    PubMed

    Lu, Xinjiang; Liu, Wenbo; Zhou, Chuang; Huang, Minghui

    2017-06-13

    The least-squares support vector machine (LS-SVM) is a popular data-driven modeling method and has been successfully applied to a wide range of applications. However, it has some disadvantages, including being ineffective at handling non-Gaussian noise as well as being sensitive to outliers. In this paper, a robust LS-SVM method is proposed and is shown to have more reliable performance when modeling a nonlinear system under conditions where Gaussian or non-Gaussian noise is present. The construction of a new objective function allows for a reduction of the mean of the modeling error as well as the minimization of its variance, and it does not constrain the mean of the modeling error to zero. This differs from the traditional LS-SVM, which uses a worst-case scenario approach in order to minimize the modeling error and constrains the mean of the modeling error to zero. In doing so, the proposed method takes the modeling error distribution information into consideration and is thus less conservative and more robust in regards to random noise. A solving method is then developed in order to determine the optimal parameters for the proposed robust LS-SVM. An additional analysis indicates that the proposed LS-SVM gives a smaller weight to a large-error training sample and a larger weight to a small-error training sample, and is thus more robust than the traditional LS-SVM. The effectiveness of the proposed robust LS-SVM is demonstrated using both artificial and real life cases.

  14. An Automated and Intelligent Medical Decision Support System for Brain MRI Scans Classification.

    PubMed

    Siddiqui, Muhammad Faisal; Reza, Ahmed Wasif; Kanesan, Jeevan

    2015-01-01

    A wide interest has been observed in the medical health care applications that interpret neuroimaging scans by machine learning systems. This research proposes an intelligent, automatic, accurate, and robust classification technique to classify the human brain magnetic resonance image (MRI) as normal or abnormal, to cater down the human error during identifying the diseases in brain MRIs. In this study, fast discrete wavelet transform (DWT), principal component analysis (PCA), and least squares support vector machine (LS-SVM) are used as basic components. Firstly, fast DWT is employed to extract the salient features of brain MRI, followed by PCA, which reduces the dimensions of the features. These reduced feature vectors also shrink the memory storage consumption by 99.5%. At last, an advanced classification technique based on LS-SVM is applied to brain MR image classification using reduced features. For improving the efficiency, LS-SVM is used with non-linear radial basis function (RBF) kernel. The proposed algorithm intelligently determines the optimized values of the hyper-parameters of the RBF kernel and also applied k-fold stratified cross validation to enhance the generalization of the system. The method was tested by 340 patients' benchmark datasets of T1-weighted and T2-weighted scans. From the analysis of experimental results and performance comparisons, it is observed that the proposed medical decision support system outperformed all other modern classifiers and achieves 100% accuracy rate (specificity/sensitivity 100%/100%). Furthermore, in terms of computation time, the proposed technique is significantly faster than the recent well-known methods, and it improves the efficiency by 71%, 3%, and 4% on feature extraction stage, feature reduction stage, and classification stage, respectively. These results indicate that the proposed well-trained machine learning system has the potential to make accurate predictions about brain abnormalities from the individual subjects, therefore, it can be used as a significant tool in clinical practice.

  15. Inline Measurement of Particle Concentrations in Multicomponent Suspensions using Ultrasonic Sensor and Least Squares Support Vector Machines.

    PubMed

    Zhan, Xiaobin; Jiang, Shulan; Yang, Yili; Liang, Jian; Shi, Tielin; Li, Xiwen

    2015-09-18

    This paper proposes an ultrasonic measurement system based on least squares support vector machines (LS-SVM) for inline measurement of particle concentrations in multicomponent suspensions. Firstly, the ultrasonic signals are analyzed and processed, and the optimal feature subset that contributes to the best model performance is selected based on the importance of features. Secondly, the LS-SVM model is tuned, trained and tested with different feature subsets to obtain the optimal model. In addition, a comparison is made between the partial least square (PLS) model and the LS-SVM model. Finally, the optimal LS-SVM model with the optimal feature subset is applied to inline measurement of particle concentrations in the mixing process. The results show that the proposed method is reliable and accurate for inline measuring the particle concentrations in multicomponent suspensions and the measurement accuracy is sufficiently high for industrial application. Furthermore, the proposed method is applicable to the modeling of the nonlinear system dynamically and provides a feasible way to monitor industrial processes.

  16. An improved conjugate gradient scheme to the solution of least squares SVM.

    PubMed

    Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya

    2005-03-01

    The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.

  17. Support vector machine regression (LS-SVM)--an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

    PubMed

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-06-28

    A multilayer feed-forward artificial neural network (MLP-ANN) with a single, hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used, the least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. So, QC + SVM methodology is an alternative to QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for the building of the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. The extrapolation and interpolation results show that LS-SVM is superior by almost an order of magnitude over the ANN method in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller numbers of samples (a much smaller sample set) to make accurate prediction results. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application for the LS-SVM calibration approach. This journal is © the Owner Societies 2011

  18. Modelling and Prediction of Spark-ignition Engine Power Performance Using Incremental Least Squares Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Wong, Pak-kin; Vong, Chi-man; Wong, Hang-cheong; Li, Ke

    2010-05-01

    Modern automotive spark-ignition (SI) power performance usually refers to output power and torque, and they are significantly affected by the setup of control parameters in the engine management system (EMS). EMS calibration is done empirically through tests on the dynamometer (dyno) because no exact mathematical engine model is yet available. With an emerging nonlinear function estimation technique of Least squares support vector machines (LS-SVM), the approximate power performance model of a SI engine can be determined by training the sample data acquired from the dyno. A novel incremental algorithm based on typical LS-SVM is also proposed in this paper, so the power performance models built from the incremental LS-SVM can be updated whenever new training data arrives. With updating the models, the model accuracies can be continuously increased. The predicted results using the estimated models from the incremental LS-SVM are good agreement with the actual test results and with the almost same average accuracy of retraining the models from scratch, but the incremental algorithm can significantly shorten the model construction time when new training data arrives.

  19. Discrimination of transgenic soybean seeds by terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Liu, Wei; Liu, Changhong; Chen, Feng; Yang, Jianbo; Zheng, Lei

    2016-10-01

    Discrimination of genetically modified organisms is increasingly demanded by legislation and consumers worldwide. The feasibility of a non-destructive discrimination of glyphosate-resistant and conventional soybean seeds and their hybrid descendants was examined by terahertz time-domain spectroscopy system combined with chemometrics. Principal component analysis (PCA), least squares-support vector machines (LS-SVM) and PCA-back propagation neural network (PCA-BPNN) models with the first and second derivative and standard normal variate (SNV) transformation pre-treatments were applied to classify soybean seeds based on genotype. Results demonstrated clear differences among glyphosate-resistant, hybrid descendants and conventional non-transformed soybean seeds could easily be visualized with an excellent classification (accuracy was 88.33% in validation set) using the LS-SVM and the spectra with SNV pre-treatment. The results indicated that THz spectroscopy techniques together with chemometrics would be a promising technique to distinguish transgenic soybean seeds from non-transformed seeds with high efficiency and without any major sample preparation.

  20. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. [Measurement of soil organic matter and available K based on SPA-LS-SVM].

    PubMed

    Zhang, Hai-Liang; Liu, Xue-Mei; He, Yong

    2014-05-01

    Visible and short wave infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for measurement of soil organic matter (OM) and available potassium (K). Four types of pretreatments including smoothing, SNV, MSC and SG smoothing+first derivative were adopted to eliminate the system noises and external disturbances. Then partial least squares regression (PLSR) and least squares-support vector machine (LS-SVM) models were implemented for calibration models. The LS-SVM model was built by using characteristic wavelength based on successive projections algorithm (SPA). Simultaneously, the performance of LSSVM models was compared with PLSR models. The results indicated that LS-SVM models using characteristic wavelength as inputs based on SPA outperformed PLSR models. The optimal SPA-LS-SVM models were achieved, and the correlation coefficient (r), and RMSEP were 0. 860 2 and 2. 98 for OM and 0. 730 5 and 15. 78 for K, respectively. The results indicated that visible and short wave near infrared spectroscopy (Vis/SW-NIRS) (325 approximately 1 075 nm) combined with LS-SVM based on SPA could be utilized as a precision method for the determination of soil properties.

  2. [Discrimination of donkey meat by NIR and chemometrics].

    PubMed

    Niu, Xiao-Ying; Shao, Li-Min; Dong, Fang; Zhao, Zhi-Lei; Zhu, Yan

    2014-10-01

    Donkey meat samples (n = 167) from different parts of donkey body (neck, costalia, rump, and tendon), beef (n = 47), pork (n = 51) and mutton (n = 32) samples were used to establish near-infrared reflectance spectroscopy (NIR) classification models in the spectra range of 4,000~12,500 cm(-1). The accuracies of classification models constructed by Mahalanobis distances analysis, soft independent modeling of class analogy (SIMCA) and least squares-support vector machine (LS-SVM), respectively combined with pretreatment of Savitzky-Golay smooth (5, 15 and 25 points) and derivative (first and second), multiplicative scatter correction and standard normal variate, were compared. The optimal models for intact samples were obtained by Mahalanobis distances analysis with the first 11 principal components (PCs) from original spectra as inputs and by LS-SVM with the first 6 PCs as inputs, and correctly classified 100% of calibration set and 98. 96% of prediction set. For minced samples of 7 mm diameter the optimal result was attained by LS-SVM with the first 5 PCs from original spectra as inputs, which gained an accuracy of 100% for calibration and 97.53% for prediction. For minced diameter of 5 mm SIMCA model with the first 8 PCs from original spectra as inputs correctly classified 100% of calibration and prediction. And for minced diameter of 3 mm Mahalanobis distances analysis and SIMCA models both achieved 100% accuracy for calibration and prediction respectively with the first 7 and 9 PCs from original spectra as inputs. And in these models, donkey meat samples were all correctly classified with 100% either in calibration or prediction. The results show that it is feasible that NIR with chemometrics methods is used to discriminate donkey meat from the else meat.

  3. Density-Dependent Quantized Least Squares Support Vector Machine for Large Data Sets.

    PubMed

    Nan, Shengyu; Sun, Lei; Chen, Badong; Lin, Zhiping; Toh, Kar-Ann

    2017-01-01

    Based on the knowledge that input data distribution is important for learning, a data density-dependent quantization scheme (DQS) is proposed for sparse input data representation. The usefulness of the representation scheme is demonstrated by using it as a data preprocessing unit attached to the well-known least squares support vector machine (LS-SVM) for application on big data sets. Essentially, the proposed DQS adopts a single shrinkage threshold to obtain a simple quantization scheme, which adapts its outputs to input data density. With this quantization scheme, a large data set is quantized to a small subset where considerable sample size reduction is generally obtained. In particular, the sample size reduction can save significant computational cost when using the quantized subset for feature approximation via the Nyström method. Based on the quantized subset, the approximated features are incorporated into LS-SVM to develop a data density-dependent quantized LS-SVM (DQLS-SVM), where an analytic solution is obtained in the primal solution space. The developed DQLS-SVM is evaluated on synthetic and benchmark data with particular emphasis on large data sets. Extensive experimental results show that the learning machine incorporating DQS attains not only high computational efficiency but also good generalization performance.

  4. Application of the Teager-Kaiser energy operator in bearing fault diagnosis.

    PubMed

    Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M

    2013-03-01

    Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them feature the bearing vibration signal with statistical measures and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and feature it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used. The diagnosis is performed with two classifiers: a neural network classifier and a LS-SVM classifier. Experiments show that the Teager domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Laser-Induced Breakdown Spectroscopy Coupled with Multivariate Chemometrics for Variety Discrimination of Soil

    PubMed Central

    Yu, Ke-Qiang; Zhao, Yan-Ru; Liu, Fei; He, Yong

    2016-01-01

    The aim of this work was to analyze the variety of soil by laser-induced breakdown spectroscopy (LIBS) coupled with chemometrics methods. 6 certified reference materials (CRMs) of soil samples were selected and their LIBS spectra were captured. Characteristic emission lines of main elements were identified based on the LIBS curves and corresponding contents. From the identified emission lines, LIBS spectra in 7 lines with high signal-to-noise ratio (SNR) were chosen for further analysis. Principal component analysis (PCA) was carried out using the LIBS spectra at 7 selected lines and an obvious cluster of 6 soils was observed. Soft independent modeling of class analogy (SIMCA) and least-squares support vector machine (LS-SVM) were introduced to establish discriminant models for classifying the 6 types of soils, and they offered the correct discrimination rates of 90% and 100%, respectively. Receiver operating characteristic (ROC) curve was used to evaluate the performance of models and the results demonstrated that the LS-SVM model was promising. Lastly, 8 types of soils from different places were gathered to conduct the same experiments for verifying the selected 7 emission lines and LS-SVM model. The research revealed that LIBS technology coupled with chemometrics could conduct the variety discrimination of soil. PMID:27279284

  6. An assessment of support vector machines for land cover classification

    USGS Publications Warehouse

    Huang, C.; Davis, L.S.; Townshend, J.R.G.

    2002-01-01

    The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.

  7. Prediction of biochar yield from cattle manure pyrolysis via least squares support vector machine intelligent approach.

    PubMed

    Cao, Hongliang; Xin, Ya; Yuan, Qiaoxia

    2016-02-01

    To predict conveniently the biochar yield from cattle manure pyrolysis, intelligent modeling approach was introduced in this research. A traditional artificial neural networks (ANN) model and a novel least squares support vector machine (LS-SVM) model were developed. For the identification and prediction evaluation of the models, a data set with 33 experimental data was used, which were obtained using a laboratory-scale fixed bed reaction system. The results demonstrated that the intelligent modeling approach is greatly convenient and effective for the prediction of the biochar yield. In particular, the novel LS-SVM model has a more satisfying predicting performance and its robustness is better than the traditional ANN model. The introduction and application of the LS-SVM modeling method gives a successful example, which is a good reference for the modeling study of cattle manure pyrolysis process, even other similar processes. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Optimization of Support Vector Machine (SVM) for Object Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin

    2012-01-01

    The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.

  9. Study on for soluble solids contents measurement of grape juice beverage based on Vis/NIRS and chemomtrics

    NASA Astrophysics Data System (ADS)

    Wu, Di; He, Yong

    2007-11-01

    The aim of this study is to investigate the potential of the visible and near infrared spectroscopy (Vis/NIRS) technique for non-destructive measurement of soluble solids contents (SSC) in grape juice beverage. 380 samples were studied in this paper. Smoothing way of Savitzky-Golay and standard normal variate were applied for the pre-processing of spectral data. Least-squares support vector machines (LS-SVM) with RBF kernel function was applied to developing the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the results predicted by LS-SVM model was 0. 962 and root mean square error (RMSEP) was 0. 434137. It is concluded that Vis/NIRS technique can quantify the SSC of grape juice beverage fast and non-destructively.. At the same time, LS-SVM model was compared with PLS and back propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to the conventional linear and non-linear methods in predicting SSC of grape juice beverage. In this study, the generation ability of LS-SVM, PLS and BP-NN models were also investigated. It is concluded that LS-SVM regression method is a promising technique for chemometrics in quantitative prediction.

  10. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction

    PubMed Central

    Cruz-Cano, Raul; Chew, David S.H.; Kwok-Pui, Choi; Ming-Ying, Leung

    2010-01-01

    Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications. PMID:20729987

  11. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction.

    PubMed

    Cruz-Cano, Raul; Chew, David S H; Kwok-Pui, Choi; Ming-Ying, Leung

    2010-06-01

    Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications.

  12. Nondestructive determination of transgenic Bacillus thuringiensis rice seeds (Oryza sativa L.) using multispectral imaging and chemometric methods.

    PubMed

    Liu, Changhong; Liu, Wei; Lu, Xuzhong; Chen, Wei; Yang, Jianbo; Zheng, Lei

    2014-06-15

    Crop-to-crop transgene flow may affect the seed purity of non-transgenic rice varieties, resulting in unwanted biosafety consequences. The feasibility of a rapid and nondestructive determination of transgenic rice seeds from its non-transgenic counterparts was examined by using multispectral imaging system combined with chemometric data analysis. Principal component analysis (PCA), partial least squares discriminant analysis (PLSDA), least squares-support vector machines (LS-SVM), and PCA-back propagation neural network (PCA-BPNN) methods were applied to classify rice seeds according to their genetic origins. The results demonstrated that clear differences between non-transgenic and transgenic rice seeds could be easily visualized with the nondestructive determination method developed through this study and an excellent classification (up to 100% with LS-SVM model) can be achieved. It is concluded that multispectral imaging together with chemometric data analysis is a promising technique to identify transgenic rice seeds with high efficiency, providing bright prospects for future applications. Copyright © 2013 Elsevier Ltd. All rights reserved.

  13. Detection of Glutamic Acid in Oilseed Rape Leaves Using Near Infrared Spectroscopy and the Least Squares-Support Vector Machine

    PubMed Central

    Bao, Yidan; Kong, Wenwen; Liu, Fei; Qiu, Zhengjun; He, Yong

    2012-01-01

    Amino acids are quite important indices to indicate the growth status of oilseed rape under herbicide stress. Near infrared (NIR) spectroscopy combined with chemometrics was applied for fast determination of glutamic acid in oilseed rape leaves. The optimal spectral preprocessing method was obtained after comparing Savitzky-Golay smoothing, standard normal variate, multiplicative scatter correction, first and second derivatives, detrending and direct orthogonal signal correction. Linear and nonlinear calibration methods were developed, including partial least squares (PLS) and least squares-support vector machine (LS-SVM). The most effective wavelengths (EWs) were determined by the successive projections algorithm (SPA), and these wavelengths were used as the inputs of PLS and LS-SVM model. The best prediction results were achieved by SPA-LS-SVM (Raw) model with correlation coefficient r = 0.9943 and root mean squares error of prediction (RMSEP) = 0.0569 for prediction set. These results indicated that NIR spectroscopy combined with SPA-LS-SVM was feasible for the fast and effective detection of glutamic acid in oilseed rape leaves. The selected EWs could be used to develop spectral sensors, and the important and basic amino acid data were helpful to study the function mechanism of herbicide. PMID:23203052

  14. [Different wavelengths selection methods for identification of early blight on tomato leaves by using hyperspectral imaging technique].

    PubMed

    Cheng, Shu-Xi; Xie, Chuan-Qi; Wang, Qiao-Nan; He, Yong; Shao, Yong-Ni

    2014-05-01

    Identification of early blight on tomato leaves by using hyperspectral imaging technique based on different effective wavelengths selection methods (successive projections algorithm, SPA; x-loading weights, x-LW; gram-schmidt orthogonaliza-tion, GSO) was studied in the present paper. Hyperspectral images of seventy healthy and seventy infected tomato leaves were obtained by hyperspectral imaging system across the wavelength range of 380-1023 nm. Reflectance of all pixels in region of interest (ROI) was extracted by ENVI 4. 7 software. Least squares-support vector machine (LS-SVM) model was established based on the full spectral wavelengths. It obtained an excellent result with the highest identification accuracy (100%) in both calibration and prediction sets. Then, EW-LS-SVM and EW-LDA models were established based on the selected wavelengths suggested by SPA, x-LW and GSO, respectively. The results showed that all of the EW-LS-SVM and EW-LDA models performed well with the identification accuracy of 100% in EW-LS-SVM model and 100%, 100% and 97. 83% in EW-LDA model, respectively. Moreover, the number of input wavelengths of SPA-LS-SVM, x-LW-LS-SVM and GSO-LS-SVM models were four (492, 550, 633 and 680 nm), three (631, 719 and 747 nm) and two (533 and 657 nm), respectively. Fewer input variables were beneficial for the development of identification instrument. It demonstrated that it is feasible to identify early blight on tomato leaves by using hyperspectral imaging, and SPA, x-LW and GSO were effective wavelengths selection methods.

  15. Rapid determination of biogenic amines in cooked beef using hyperspectral imaging with sparse representation algorithm

    NASA Astrophysics Data System (ADS)

    Yang, Dong; Lu, Anxiang; Ren, Dong; Wang, Jihua

    2017-11-01

    This study explored the feasibility of rapid detection of biogenic amines (BAs) in cooked beef during the storage process using hyperspectral imaging technique combined with sparse representation (SR) algorithm. The hyperspectral images of samples were collected in the two spectral ranges of 400-1000 nm and 1000-1800 nm, separately. The spectral data were reduced dimensionality by SR and principal component analysis (PCA) algorithms, and then integrated the least square support vector machine (LS-SVM) to build the SR-LS-SVM and PC-LS-SVM models for the prediction of BAs values in cooked beef. The results showed that the SR-LS-SVM model exhibited the best predictive ability with determination coefficients (RP2) of 0.943 and root mean square errors (RMSEP) of 1.206 in the range of 400-1000 nm of prediction set. The SR and PCA algorithms were further combined to establish the best SR-PC-LS-SVM model for BAs prediction, which had high RP2of 0.969 and low RMSEP of 1.039 in the region of 400-1000 nm. The visual map of the BAs was generated using the best SR-PC-LS-SVM model with imaging process algorithms, which could be used to observe the changes of BAs in cooked beef more intuitively. The study demonstrated that hyperspectral imaging technique combined with sparse representation were able to detect effectively the BAs values in cooked beef during storage and the built SR-PC-LS-SVM model had a potential for rapid and accurate determination of freshness indexes in other meat and meat products.

  16. A RLS-SVM Aided Fusion Methodology for INS during GPS Outages

    PubMed Central

    Yao, Yiqing; Xu, Xiaosu

    2017-01-01

    In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide the pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than share the same weight in the traditional LS-SVM, the proposed algorithm allocates various weights for different data, which makes the system immune to the outliers. Field test data was collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, which outperforms the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics. PMID:28245549

  17. A RLS-SVM Aided Fusion Methodology for INS during GPS Outages.

    PubMed

    Yao, Yiqing; Xu, Xiaosu

    2017-02-24

    In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide the pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than share the same weight in the traditional LS-SVM, the proposed algorithm allocates various weights for different data, which makes the system immune to the outliers. Field test data was collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, which outperforms the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics.

  18. A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis

    NASA Astrophysics Data System (ADS)

    Khawaja, Taimoor Saleem

    A high-belief low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear non-Gaussian systems. The methodology assumes the availability of real-time process measurements, definition of a set of fault indicators and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVM machines are founded on the principle of Structural Risk Minimization (SRM) which tends to find a good trade-off between low empirical risk and small capacity. The key features in SVM are the use of non-linear kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector is suggested based on the LS-SVM machines. The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate "possibly" non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines, (c) Uncertainty representation and management using Bayesian Inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.

  19. Novel Hybrid of LS-SVM and Kalman Filter for GPS/INS Integration

    NASA Astrophysics Data System (ADS)

    Xu, Zhenkai; Li, Yong; Rizos, Chris; Xu, Xiaosu

    Integration of Global Positioning System (GPS) and Inertial Navigation System (INS) technologies can overcome the drawbacks of the individual systems. One of the advantages is that the integrated solution can provide continuous navigation capability even during GPS outages. However, bridging the GPS outages is still a challenge when Micro-Electro-Mechanical System (MEMS) inertial sensors are used. Methods being currently explored by the research community include applying vehicle motion constraints, optimal smoother, and artificial intelligence (AI) techniques. In the research area of AI, the neural network (NN) approach has been extensively utilised up to the present. In an NN-based integrated system, a Kalman filter (KF) estimates position, velocity and attitude errors, as well as the inertial sensor errors, to output navigation solutions while GPS signals are available. At the same time, an NN is trained to map the vehicle dynamics with corresponding KF states, and to correct INS measurements when GPS measurements are unavailable. To achieve good performance it is critical to select suitable quality and an optimal number of samples for the NN. This is sometimes too rigorous a requirement which limits real world application of NN-based methods.The support vector machine (SVM) approach is based on the structural risk minimisation principle, instead of the minimised empirical error principle that is commonly implemented in an NN. The SVM can avoid local minimisation and over-fitting problems in an NN, and therefore potentially can achieve a higher level of global performance. This paper focuses on the least squares support vector machine (LS-SVM), which can solve highly nonlinear and noisy black-box modelling problems. This paper explores the application of the LS-SVM to aid the GPS/INS integrated system, especially during GPS outages. The paper describes the principles of the LS-SVM and of the KF hybrid method, and introduces the LS-SVM regression algorithm. Field test data is processed to evaluate the performance of the proposed approach.

  20. Testing of the Support Vector Machine for Binary-Class Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew

    2011-01-01

    The Support Vector Machine is a powerful algorithm, useful in classifying data in to species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SMV as a method for classification. From trial to trial, SVM produces consistent results

  1. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    PubMed

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  2. Working set selection using functional gain for LS-SVM.

    PubMed

    Bo, Liefeng; Jiao, Licheng; Wang, Ling

    2007-09-01

    The efficiency of sequential minimal optimization (SMO) depends strongly on the working set selection. This letter shows how the improvement of SMO in each iteration, named the functional gain (FG), is used to select the working set for least squares support vector machine (LS-SVM). We prove the convergence of the proposed method and give some theoretical support for its performance. Empirical comparisons demonstrate that our method is superior to the maximum violating pair (MVP) working set selection.

  3. Spectral feature extraction of EEG signals and pattern recognition during mental tasks of 2-D cursor movements for BCI using SVM and ANN.

    PubMed

    Bascil, M Serdar; Tesneli, Ahmet Y; Temurtas, Feyzullah

    2016-09-01

    Brain computer interface (BCI) is a new communication way between man and machine. It identifies mental task patterns stored in electroencephalogram (EEG). So, it extracts brain electrical activities recorded by EEG and transforms them machine control commands. The main goal of BCI is to make available assistive environmental devices for paralyzed people such as computers and makes their life easier. This study deals with feature extraction and mental task pattern recognition on 2-D cursor control from EEG as offline analysis approach. The hemispherical power density changes are computed and compared on alpha-beta frequency bands with only mental imagination of cursor movements. First of all, power spectral density (PSD) features of EEG signals are extracted and high dimensional data reduced by principle component analysis (PCA) and independent component analysis (ICA) which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM) which are linear and least squares (LS-SVM) and three different artificial neural network (ANN) structures which are learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN) and mental task patterns are successfully identified via k-fold cross validation technique.

  4. Design of a multiple kernel learning algorithm for LS-SVM by convex programming.

    PubMed

    Jian, Ling; Xia, Zhonghang; Liang, Xijun; Gao, Chuanhou

    2011-06-01

    As a kernel based method, the performance of least squares support vector machine (LS-SVM) depends on the selection of the kernel as well as the regularization parameter (Duan, Keerthi, & Poo, 2003). Cross-validation is efficient in selecting a single kernel and the regularization parameter; however, it suffers from heavy computational cost and is not flexible to deal with multiple kernels. In this paper, we address the issue of multiple kernel learning for LS-SVM by formulating it as semidefinite programming (SDP). Furthermore, we show that the regularization parameter can be optimized in a unified framework with the kernel, which leads to an automatic process for model selection. Extensive experimental validations are performed and analyzed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  5. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    PubMed Central

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862

  6. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    PubMed

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.

  7. An effective parameter optimization technique for vibration flow field characterization of PP melts via LS-SVM combined with SALS in an electromagnetism dynamic extruder

    NASA Astrophysics Data System (ADS)

    Xian, Guangming

    2018-03-01

    A method for predicting the optimal vibration field parameters by least square support vector machine (LS-SVM) is presented in this paper. One convenient and commonly used technique for characterizing the the vibration flow field of polymer melts films is small angle light scattering (SALS) in a visualized slit die of the electromagnetism dynamic extruder. The optimal value of vibration vibration frequency, vibration amplitude, and the maximum light intensity projection area can be obtained by using LS-SVM for prediction. For illustrating this method and show its validity, the flowing material is used with polypropylene (PP) and fifteen samples are tested at the rotation speed of screw at 36rpm. This paper first describes the apparatus of SALS to perform the experiments, then gives the theoretical basis of this new method, and detail the experimental results for parameter prediction of vibration flow field. It is demonstrated that it is possible to use the method of SALS and obtain detailed information on optimal parameter of vibration flow field of PP melts by LS-SVM.

  8. Lamb Wave Damage Quantification Using GA-Based LS-SVM.

    PubMed

    Sun, Fuqiang; Wang, Ning; He, Jingjing; Guan, Xuefei; Yang, Jinsong

    2017-06-12

    Lamb waves have been reported to be an efficient tool for non-destructive evaluations (NDE) for various application scenarios. However, accurate and reliable damage quantification using the Lamb wave method is still a practical challenge, due to the complex underlying mechanism of Lamb wave propagation and damage detection. This paper presents a Lamb wave damage quantification method using a least square support vector machine (LS-SVM) and a genetic algorithm (GA). Three damage sensitive features, namely, normalized amplitude, phase change, and correlation coefficient, were proposed to describe changes of Lamb wave characteristics caused by damage. In view of commonly used data-driven methods, the GA-based LS-SVM model using the proposed three damage sensitive features was implemented to evaluate the crack size. The GA method was adopted to optimize the model parameters. The results of GA-based LS-SVM were validated using coupon test data and lap joint component test data with naturally developed fatigue cracks. Cases of different loading and manufacturer were also included to further verify the robustness of the proposed method for crack quantification.

  9. Lamb Wave Damage Quantification Using GA-Based LS-SVM

    PubMed Central

    Sun, Fuqiang; Wang, Ning; He, Jingjing; Guan, Xuefei; Yang, Jinsong

    2017-01-01

    Lamb waves have been reported to be an efficient tool for non-destructive evaluations (NDE) for various application scenarios. However, accurate and reliable damage quantification using the Lamb wave method is still a practical challenge, due to the complex underlying mechanism of Lamb wave propagation and damage detection. This paper presents a Lamb wave damage quantification method using a least square support vector machine (LS-SVM) and a genetic algorithm (GA). Three damage sensitive features, namely, normalized amplitude, phase change, and correlation coefficient, were proposed to describe changes of Lamb wave characteristics caused by damage. In view of commonly used data-driven methods, the GA-based LS-SVM model using the proposed three damage sensitive features was implemented to evaluate the crack size. The GA method was adopted to optimize the model parameters. The results of GA-based LS-SVM were validated using coupon test data and lap joint component test data with naturally developed fatigue cracks. Cases of different loading and manufacturer were also included to further verify the robustness of the proposed method for crack quantification. PMID:28773003

  10. Automated Diagnosis of Glaucoma Using Empirical Wavelet Transform and Correntropy Features Extracted From Fundus Images.

    PubMed

    Maheshwari, Shishir; Pachori, Ram Bilas; Acharya, U Rajendra

    2017-05-01

    Glaucoma is an ocular disorder caused due to increased fluid pressure in the optic nerve. It damages the optic nerve and subsequently causes loss of vision. The available scanning methods are Heidelberg retinal tomography, scanning laser polarimetry, and optical coherence tomography. These methods are expensive and require experienced clinicians to use them. So, there is a need to diagnose glaucoma accurately with low cost. Hence, in this paper, we have presented a new methodology for an automated diagnosis of glaucoma using digital fundus images based on empirical wavelet transform (EWT). The EWT is used to decompose the image, and correntropy features are obtained from decomposed EWT components. These extracted features are ranked based on t value feature selection algorithm. Then, these features are used for the classification of normal and glaucoma images using least-squares support vector machine (LS-SVM) classifier. The LS-SVM is employed for classification with radial basis function, Morlet wavelet, and Mexican-hat wavelet kernels. The classification accuracy of the proposed method is 98.33% and 96.67% using threefold and tenfold cross validation, respectively.

  11. [Rapid determination of COD in aquaculture water based on LS-SVM with ultraviolet/visible spectroscopy].

    PubMed

    Liu, Xue-Mei; Zhang, Hai-Liang

    2014-10-01

    Ultraviolet/visible (UV/Vis) spectroscopy was studied for the rapid determination of chemical oxygen demand (COD), which was an indicator to measure the concentration of organic matter in aquaculture water. In order to reduce the influence of the absolute noises of the spectra, the extracted 135 absorbance spectra were preprocessed by Savitzky-Golay smoothing (SG), EMD, and wavelet transform (WT) methods. The preprocessed spectra were then used to select latent variables (LVs) by partial least squares (PLS) methods. Partial least squares (PLS) was used to build models with the full spectra, and back- propagation neural network (BPNN) and least square support vector machine (LS-SVM) were applied to build models with the selected LVs. The overall results showed that BPNN and LS-SVM models performed better than PLS models, and the LS-SVM models with LVs based on WT preprocessed spectra obtained the best results with the determination coefficient (r2) and RMSE being 0. 83 and 14. 78 mg · L(-1) for calibration set, and 0.82 and 14.82 mg · L(-1) for the prediction set respectively. The method showed the best performance in LS-SVM model. The results indicated that it was feasible to use UV/Vis with LVs which were obtained by PLS method, combined with LS-SVM calibration could be applied to the rapid and accurate determination of COD in aquaculture water. Moreover, this study laid the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.

  12. Eddy current characterization of small cracks using least square support vector machine

    NASA Astrophysics Data System (ADS)

    Chelabi, M.; Hacib, T.; Le Bihan, Y.; Ikhlef, N.; Boughedda, H.; Mekideche, M. R.

    2016-04-01

    Eddy current (EC) sensors are used for non-destructive testing since they are able to probe conductive materials. Despite being a conventional technique for defect detection and localization, the main weakness of this technique is that defect characterization, of the exact determination of the shape and dimension, is still a question to be answered. In this work, we demonstrate the capability of small crack sizing using signals acquired from an EC sensor. We report our effort to develop a systematic approach to estimate the size of rectangular and thin defects (length and depth) in a conductive plate. The achieved approach by the novel combination of a finite element method (FEM) with a statistical learning method is called least square support vector machines (LS-SVM). First, we use the FEM to design the forward problem. Next, an algorithm is used to find an adaptive database. Finally, the LS-SVM is used to solve the inverse problems, creating polynomial functions able to approximate the correlation between the crack dimension and the signal picked up from the EC sensor. Several methods are used to find the parameters of the LS-SVM. In this study, the particle swarm optimization (PSO) and genetic algorithm (GA) are proposed for tuning the LS-SVM. The results of the design and the inversions were compared to both simulated and experimental data, with accuracy experimentally verified. These suggested results prove the applicability of the presented approach.

  13. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a few number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds single optimum hyperplane, the main objective Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71,4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.

  14. [Identification of Pummelo Cultivars Based on Hyperspectral Imaging Technology].

    PubMed

    Li, Xun-lan; Yi, Shi-lai; He, Shao-lan; Lü, Qiang; Xie, Rang-jin; Zheng, Yong-qiang; Deng, Lie

    2015-09-01

    Existing methods for the identification of pummelo cultivars are usually time-consuming and costly, and are therefore inconvenient to be used in cases that a rapid identification is needed. This research was aimed at identifying different pummelo cultivars by hyperspectral imaging technology which can achieve a rapid and highly sensitive measurement. A total of 240 leaf samples, 60 for each of the four cultivars were investigated. Samples were divided into two groups such as calibration set (48 samples of each cultivar) and validation set (12 samples of each cultivar) by a Kennard-Stone-based algorithm. Hyperspectral images of both adaxial and abaxial surfaces of each leaf were obtained, and were segmented into a region of interest (ROI) using a simple threshold. Spectra of leaf samples were extracted from ROI. To remove the absolute noises of the spectra, only the date of spectral range 400~1000 nm was used for analysis. Multiplicative scatter correction (MSC) and standard normal variable (SNV) were utilized for data preprocessing. Principal component analysis (PCA) was used to extract the best principal components, and successive projections algorithm (SPA) was used to extract the effective wavelengths. Least squares support vector machine (LS-SVM) was used to obtain the discrimination model of the four different pummelo cultivars. To find out the optimal values of σ2 and γ which were important parameters in LS-SVM modeling, Grid-search technique and Cross-Validation were applied. The first 10 and 11 principal components were extracted by PCA for the hyperspectral data of adaxial surface and abaxial surface, respectively. There were 31 and 21 effective wavelengths selected by SPA based on the hyperspectral data of adaxial surface and abaxial surface, respectively. The best principal components and the effective wavelengths were used as inputs of LS-SVM models, and then the PCA-LS-SVM model and the SPA-LS-SVM model were built. The results showed that 99.46% and 98.44% of identification accuracy was achieved in the calibration set for the PCA-LS-SVM model and the SPA-LS-SVM model, respectively, and a 95.83% of identification accuracy was achieved in the validation set for both the PCA-LS-SVM and the SPA- LS-SVM models, which were built based on the hyperspectral data of adaxial surface. Comparatively, the results of the PCA-LS-SVM and the SPA-LS-SVM models built based on the hyperspectral data of abaxial surface both achieved identification accuracies of 100% for both calibration set and validation set. The overall results demonstrated that use of hyperspectral data of adaxial and abaxial leaf surfaces coupled with the use of PCA-LS-SVM and the SPA-LS-SVM could achieve an accurate identification of pummelo cultivars. It was feasible to use hyperspectral imaging technology to identify different pummelo cultivars, and hyperspectral imaging technology provided an alternate way of rapid identification of pummelo cultivars. Moreover, the results in this paper demonstrated that the data from the abaxial surface of leaf was more sensitive in identifying pummelo cultivars. This study provided a new method for to the fast discrimination of pummelo cultivars.

  15. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    PubMed

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  16. A method of neighbor classes based SVM classification for optical printed Chinese character recognition.

    PubMed

    Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

    2013-01-01

    In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.

  17. An online semi-supervised brain-computer interface.

    PubMed

    Gu, Zhenghui; Yu, Zhuliang; Shen, Zhifang; Li, Yuanqing

    2013-09-01

    Practical brain-computer interface (BCI) systems should require only low training effort for the user, and the algorithms used to classify the intent of the user should be computationally efficient. However, due to inter- and intra-subject variations in EEG signal, intermittent training/calibration is often unavoidable. In this paper, we present an online semi-supervised P300 BCI speller system. After a short initial training (around or less than 1 min in our experiments), the system is switched to a mode where the user can input characters through selective attention. In this mode, a self-training least squares support vector machine (LS-SVM) classifier is gradually enhanced in back end with the unlabeled EEG data collected online after every character input. In this way, the classifier is gradually enhanced. Even though the user may experience some errors in input at the beginning due to the small initial training dataset, the accuracy approaches that of fully supervised method in a few minutes. The algorithm based on LS-SVM and its sequential update has low computational complexity; thus, it is suitable for online applications. The effectiveness of the algorithm has been validated through data analysis on BCI Competition III dataset II (P300 speller BCI data). The performance of the online system was evaluated through experimental results on eight healthy subjects, where all of them achieved the spelling accuracy of 85 % or above within an average online semi-supervised learning time of around 3 min.

  18. [Determination of soluble solids content in Nanfeng Mandarin by Vis/NIR spectroscopy and UVE-ICA-LS-SVM].

    PubMed

    Sun, Tong; Xu, Wen-Li; Hu, Tian; Liu, Mu-Hua

    2013-12-01

    The objective of the present research was to assess soluble solids content (SSC) of Nanfeng mandarin by visible/near infrared (Vis/NIR) spectroscopy combined with new variable selection method, simplify prediction model and improve the performance of prediction model for SSC of Nanfeng mandarin. A total of 300 Nanfeng mandarin samples were used, the numbers of Nanfeng mandarin samples in calibration, validation and prediction sets were 150, 75 and 75, respectively. Vis/NIR spectra of Nanfeng mandarin samples were acquired by a QualitySpec spectrometer in the wavelength range of 350-1000 nm. Uninformative variables elimination (UVE) was used to eliminate wavelength variables that had few information of SSC, then independent component analysis (ICA) was used to extract independent components (ICs) from spectra that eliminated uninformative wavelength variables. At last, least squares support vector machine (LS-SVM) was used to develop calibration models for SSC of Nanfeng mandarin using extracted ICs, and 75 prediction samples that had not been used for model development were used to evaluate the performance of SSC model of Nanfeng mandarin. The results indicate t hat Vis/NIR spectroscopy combinedwith UVE-ICA-LS-SVM is suitable for assessing SSC o f Nanfeng mandarin, and t he precision o f prediction ishigh. UVE--ICA is an effective method to eliminate uninformative wavelength variables, extract important spectral information, simplify prediction model and improve the performance of prediction model. The SSC model developed by UVE-ICA-LS-SVM is superior to that developed by PLS, PCA-LS-SVM or ICA-LS-SVM, and the coefficient of determination and root mean square error in calibration, validation and prediction sets were 0.978, 0.230%, 0.965, 0.301% and 0.967, 0.292%, respectively.

  19. SVM and SVM Ensembles in Breast Cancer Prediction.

    PubMed

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  20. SVM and SVM Ensembles in Breast Cancer Prediction

    PubMed Central

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807

  1. Prediction of BP reactivity to talking using hybrid soft computing approaches.

    PubMed

    Kaur, Gurmanik; Arora, Ajat Shatru; Jain, Vijender Kumar

    2014-01-01

    High blood pressure (BP) is associated with an increased risk of cardiovascular diseases. Therefore, optimal precision in measurement of BP is appropriate in clinical and research studies. In this work, anthropometric characteristics including age, height, weight, body mass index (BMI), and arm circumference (AC) were used as independent predictor variables for the prediction of BP reactivity to talking. Principal component analysis (PCA) was fused with artificial neural network (ANN), adaptive neurofuzzy inference system (ANFIS), and least square-support vector machine (LS-SVM) model to remove the multicollinearity effect among anthropometric predictor variables. The statistical tests in terms of coefficient of determination (R (2)), root mean square error (RMSE), and mean absolute percentage error (MAPE) revealed that PCA based LS-SVM (PCA-LS-SVM) model produced a more efficient prediction of BP reactivity as compared to other models. This assessment presents the importance and advantages posed by PCA fused prediction models for prediction of biological variables.

  2. A Method of Neighbor Classes Based SVM Classification for Optical Printed Chinese Character Recognition

    PubMed Central

    Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

    2013-01-01

    In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777

  3. Multi-view L2-SVM and its multi-view core vector machine.

    PubMed

    Huang, Chengquan; Chung, Fu-lai; Wang, Shitong

    2016-03-01

    In this paper, a novel L2-SVM based classifier Multi-view L2-SVM is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has the flexibility like μ-SVC in the sense that the number of the yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views through imposing the consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine GCVM, the proposed Multi-view L2-SVM classifier is extended into its GCVM version MvCVM which can realize its fast training on large scale multi-view datasets, with its asymptotic linear time complexity with the sample size and its space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. A Hybrid Hierarchical Approach for Brain Tissue Segmentation by Combining Brain Atlas and Least Square Support Vector Machine

    PubMed Central

    Kasiri, Keyvan; Kazemi, Kamran; Dehghani, Mohammad Javad; Helfroush, Mohammad Sadegh

    2013-01-01

    In this paper, we present a new semi-automatic brain tissue segmentation method based on a hybrid hierarchical approach that combines a brain atlas as a priori information and a least-square support vector machine (LS-SVM). The method consists of three steps. In the first two steps, the skull is removed and the cerebrospinal fluid (CSF) is extracted. These two steps are performed using the toolbox FMRIB's automated segmentation tool integrated in the FSL software (FSL-FAST) developed in Oxford Centre for functional MRI of the brain (FMRIB). Then, in the third step, the LS-SVM is used to segment grey matter (GM) and white matter (WM). The training samples for LS-SVM are selected from the registered brain atlas. The voxel intensities and spatial positions are selected as the two feature groups for training and test. SVM as a powerful discriminator is able to handle nonlinear classification problems; however, it cannot provide posterior probability. Thus, we use a sigmoid function to map the SVM output into probabilities. The proposed method is used to segment CSF, GM and WM from the simulated magnetic resonance imaging (MRI) using Brainweb MRI simulator and real data provided by Internet Brain Segmentation Repository. The semi-automatically segmented brain tissues were evaluated by comparing to the corresponding ground truth. The Dice and Jaccard similarity coefficients, sensitivity and specificity were calculated for the quantitative validation of the results. The quantitative results show that the proposed method segments brain tissues accurately with respect to corresponding ground truth. PMID:24696800

  5. The construction of support vector machine classifier using the firefly algorithm.

    PubMed

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy.

  6. The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

    PubMed Central

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy. PMID:25802511

  7. Classification of brain tumours using short echo time 1H MR spectra

    NASA Astrophysics Data System (ADS)

    Devos, A.; Lukas, L.; Suykens, J. A. K.; Vanhamme, L.; Tate, A. R.; Howe, F. A.; Majós, C.; Moreno-Torres, A.; van der Graaf, M.; Arús, C.; Van Huffel, S.

    2004-09-01

    The purpose was to objectively compare the application of several techniques and the use of several input features for brain tumour classification using Magnetic Resonance Spectroscopy (MRS). Short echo time 1H MRS signals from patients with glioblastomas ( n = 87), meningiomas ( n = 57), metastases ( n = 39), and astrocytomas grade II ( n = 22) were provided by six centres in the European Union funded INTERPRET project. Linear discriminant analysis, least squares support vector machines (LS-SVM) with a linear kernel and LS-SVM with radial basis function kernel were applied and evaluated over 100 stratified random splittings of the dataset into training and test sets. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of binary classifiers, while the percentage of correct classifications was used to evaluate the multiclass classifiers. The influence of several factors on the classification performance has been tested: L2- vs. water normalization, magnitude vs. real spectra and baseline correction. The effect of input feature reduction was also investigated by using only the selected frequency regions containing the most discriminatory information, and peak integrated values. Using L2-normalized complete spectra the automated binary classifiers reached a mean test AUC of more than 0.95, except for glioblastomas vs. metastases. Similar results were obtained for all classification techniques and input features except for water normalized spectra, where classification performance was lower. This indicates that data acquisition and processing can be simplified for classification purposes, excluding the need for separate water signal acquisition, baseline correction or phasing.

  8. New bandwidth selection criterion for Kernel PCA: approach to dimensionality reduction and classification problems.

    PubMed

    Thomas, Minta; De Brabanter, Kris; De Moor, Bart

    2014-05-10

    DNA microarrays are potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main input to this class predictors are high dimensional data with many variables and few observations. Dimensionality reduction of these features set significantly speeds up the prediction task. Feature selection and feature transformation methods are well known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performances (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy with an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider its application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lesser time complexity.

  9. The application of continuous wavelet transform and least squares support vector machine for the simultaneous quantitative spectrophotometric determination of Myricetin, Kaempferol and Quercetin as flavonoids in pharmaceutical plants

    NASA Astrophysics Data System (ADS)

    Sohrabi, Mahmoud Reza; Darabi, Golnaz

    2016-01-01

    Flavonoids are γ-benzopyrone derivatives, which are highly regarded in these researchers for their antioxidant property. In this study, two new signals processing methods been coupled with UV spectroscopy for spectral resolution and simultaneous quantitative determination of Myricetin, Kaempferol and Quercetin as flavonoids in Laurel, St. John's Wort and Green Tea without the need for any previous separation procedure. The developed methods are continuous wavelet transform (CWT) and least squares support vector machine (LS-SVM) methods integrated with UV spectroscopy individually. Different wavelet families were tested by CWT method and finally the Daubechies wavelet family (Db4) for Myricetin and the Gaussian wavelet families for Kaempferol (Gaus3) and Quercetin (Gaus7) were selected and applied for simultaneous analysis under the optimal conditions. The LS-SVM was applied to build the flavonoids prediction model based on absorption spectra. The root mean square errors for prediction (RMSEP) of Myricetin, Kaempferol and Quercetin were 0.0552, 0.0275 and 0.0374, respectively. The developed methods were validated by the analysis of the various synthetic mixtures associated with a well- known flavonoid contents. Mean recovery values of Myricetin, Kaempferol and Quercetin, in CWT method were 100.123, 100.253, 100.439 and in LS-SVM method were 99.94, 99.81 and 99.682, respectively. The results achieved by analyzing the real samples from the CWT and LS-SVM methods were compared to the HPLC reference method and the results were very close to the reference method. Meanwhile, the obtained results of the one-way ANOVA (analysis of variance) test revealed that there was no significant difference between the suggested methods.

  10. The application of continuous wavelet transform and least squares support vector machine for the simultaneous quantitative spectrophotometric determination of Myricetin, Kaempferol and Quercetin as flavonoids in pharmaceutical plants.

    PubMed

    Sohrabi, Mahmoud Reza; Darabi, Golnaz

    2016-01-05

    Flavonoids are γ-benzopyrone derivatives, which are highly regarded in these researchers for their antioxidant property. In this study, two new signals processing methods been coupled with UV spectroscopy for spectral resolution and simultaneous quantitative determination of Myricetin, Kaempferol and Quercetin as flavonoids in Laurel, St. John's Wort and Green Tea without the need for any previous separation procedure. The developed methods are continuous wavelet transform (CWT) and least squares support vector machine (LS-SVM) methods integrated with UV spectroscopy individually. Different wavelet families were tested by CWT method and finally the Daubechies wavelet family (Db4) for Myricetin and the Gaussian wavelet families for Kaempferol (Gaus3) and Quercetin (Gaus7) were selected and applied for simultaneous analysis under the optimal conditions. The LS-SVM was applied to build the flavonoids prediction model based on absorption spectra. The root mean square errors for prediction (RMSEP) of Myricetin, Kaempferol and Quercetin were 0.0552, 0.0275 and 0.0374, respectively. The developed methods were validated by the analysis of the various synthetic mixtures associated with a well- known flavonoid contents. Mean recovery values of Myricetin, Kaempferol and Quercetin, in CWT method were 100.123, 100.253, 100.439 and in LS-SVM method were 99.94, 99.81 and 99.682, respectively. The results achieved by analyzing the real samples from the CWT and LS-SVM methods were compared to the HPLC reference method and the results were very close to the reference method. Meanwhile, the obtained results of the one-way ANOVA (analysis of variance) test revealed that there was no significant difference between the suggested methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Prediction of pH of cola beverage using Vis/NIR spectroscopy and least squares-support vector machine

    NASA Astrophysics Data System (ADS)

    Liu, Fei; He, Yong

    2008-02-01

    Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared and 225 samples (45 samples for each variety) were selected for the calibration set, while 75 samples (15 samples for each variety) for the validation set. The smoothing way of Savitzky-Golay and standard normal variate (SNV) followed by first-derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs) which were used as the inputs of least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with radial basis function (RBF) kernel function and a two-step grid search technique were applied to build the regression model with a comparison of PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, while 0.975, 0.031 and 4.697x10 -3 for LS-SVM, respectively. Both methods obtained a satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way for the prediction of pH of cola beverages.

  12. Relevance Vector Machine and Support Vector Machine Classifier Analysis of Scanning Laser Polarimetry Retinal Nerve Fiber Layer Measurements

    PubMed Central

    Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.

    2010-01-01

    Purpose To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation =−0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation =−5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set and areas under the receiver operating characteristic (AUROC) curve for the classification of eyes in the test set were generated. AUROC curve results from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. Also reported was the AUROC curve for the GDx VCC software-generated nerve fiber indicator (NFI). Results The AUROC curves for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced dimensional data sets). AUROC curves for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC curve for the NFI was 0.87. Conclusions Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes. RVM may be preferable to SVM, because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898

  13. Determination of Hemicellulose, Cellulose and Lignin in Moso Bamboo by Near Infrared Spectroscopy

    PubMed Central

    Li, Xiaoli; Sun, Chanjun; Zhou, Binxiong; He, Yong

    2015-01-01

    The contents of hemicellulose, cellulose and lignin are important for moso bamboo processing in biomass energy industry. The feasibility of using near infrared (NIR) spectroscopy for rapid determination of hemicellulose, cellulose and lignin was investigated in this study. Initially, the linear relationship between bamboo components and their NIR spectroscopy was established. Subsequently, successive projections algorithm (SPA) was used to detect characteristic wavelengths for establishing the convenient models. For hemicellulose, cellulose and lignin, 22, 22 and 20 characteristic wavelengths were obtained, respectively. Nonlinear determination models were subsequently built by an artificial neural network (ANN) and a least-squares support vector machine (LS-SVM) based on characteristic wavelengths. The LS-SVM models for predicting hemicellulose, cellulose and lignin all obtained excellent results with high determination coefficients of 0.921, 0.909 and 0.892 respectively. These results demonstrated that NIR spectroscopy combined with SPA-LS-SVM is a useful, nondestructive tool for the determinations of hemicellulose, cellulose and lignin in moso bamboo. PMID:26601657

  14. [Determination of calcium and magnesium in tobacco by near-infrared spectroscopy and least squares-support vector machine].

    PubMed

    Tian, Kuang-da; Qiu, Kai-xian; Li, Zu-hong; Lü, Ya-qiong; Zhang, Qiu-ju; Xiong, Yan-mei; Min, Shun-geng

    2014-12-01

    The purpose of the present paper is to determine calcium and magnesium in tobacco using NIR combined with least squares-support vector machine (LS-SVM). Five hundred ground and dried tobacco samples from Qujing city, Yunnan province, China, were surveyed by a MATRIX-I spectrometer (Bruker Optics, Bremen, Germany). At the beginning of data processing, outliers of samples were eliminated for stability of the model. The rest 487 samples were divided into several calibration sets and validation sets according to a hybrid modeling strategy. Monte-Carlo cross validation was used to choose the best spectral preprocess method from multiplicative scatter correction (MSC), standard normal variate transformation (SNV), S-G smoothing, 1st derivative, etc., and their combinations. To optimize parameters of LS-SVM model, the multilayer grid search and 10-fold cross validation were applied. The final LS-SVM models with the optimizing parameters were trained by the calibration set and accessed by 287 validation samples picked by Kennard-Stone method. For the quantitative model of calcium in tobacco, Savitzky-Golay FIR smoothing with frame size 21 showed the best performance. The regularization parameter λ of LS-SVM was e16.11, while the bandwidth of the RBF kernel σ2 was e8.42. The determination coefficient for prediction (Rc(2)) was 0.9755 and the determination coefficient for prediction (Rp(2)) was 0.9422, better than the performance of PLS model (Rc(2)=0.9593, Rp(2)=0.9344). For the quantitative analysis of magnesium, SNV made the regression model more precise than other preprocess. The optimized λ was e15.25 and σ2 was e6.32. Rc(2) and Rp(2) were 0.9961 and 0.9301, respectively, better than PLS model (Rc(2)=0.9716, Rp(2)=0.8924). After modeling, the whole progress of NIR scan and data analysis for one sample was within tens of seconds. The overall results show that NIR spectroscopy combined with LS-SVM can be efficiently utilized for rapid and accurate analysis of calcium and magnesium in tobacco.

  15. A PCA aided cross-covariance scheme for discriminative feature extraction from EEG signals.

    PubMed

    Zarei, Roozbeh; He, Jing; Siuly, Siuly; Zhang, Yanchun

    2017-07-01

    Feature extraction of EEG signals plays a significant role in Brain-computer interface (BCI) as it can significantly affect the performance and the computational time of the system. The main aim of the current work is to introduce an innovative algorithm for acquiring reliable discriminating features from EEG signals to improve classification performances and to reduce the time complexity. This study develops a robust feature extraction method combining the principal component analysis (PCA) and the cross-covariance technique (CCOV) for the extraction of discriminatory information from the mental states based on EEG signals in BCI applications. We apply the correlation based variable selection method with the best first search on the extracted features to identify the best feature set for characterizing the distribution of mental state signals. To verify the robustness of the proposed feature extraction method, three machine learning techniques: multilayer perceptron neural networks (MLP), least square support vector machine (LS-SVM), and logistic regression (LR) are employed on the obtained features. The proposed methods are evaluated on two publicly available datasets. Furthermore, we evaluate the performance of the proposed methods by comparing it with some recently reported algorithms. The experimental results show that all three classifiers achieve high performance (above 99% overall classification accuracy) for the proposed feature set. Among these classifiers, the MLP and LS-SVM methods yield the best performance for the obtained feature. The average sensitivity, specificity and classification accuracy for these two classifiers are same, which are 99.32%, 100%, and 99.66%, respectively for the BCI competition dataset IVa and 100%, 100%, and 100%, for the BCI competition dataset IVb. The results also indicate the proposed methods outperform the most recently reported methods by at least 0.25% average accuracy improvement in dataset IVa. The execution time results show that the proposed method has less time complexity after feature selection. The proposed feature extraction method is very effective for getting representatives information from mental states EEG signals in BCI applications and reducing the computational complexity of classifiers by reducing the number of extracted features. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Research on Classification of Chinese Text Data Based on SVM

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  17. Support vector machine as a binary classifier for automated object detection in remotely sensed data

    NASA Astrophysics Data System (ADS)

    Wardaya, P. D.

    2014-02-01

    In the present paper, author proposes the application of Support Vector Machine (SVM) for the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than the other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic nature as a binary classifier where it classifies two classes namely, object and background. The algorithm aims at effectively detecting an object from its background with the minimum training data. The synthetic image containing noises is used for algorithm testing. Furthermore, it is implemented to perform remote sensing image analysis such as identification of Island vegetation, water body, and oil spill from the satellite imagery. It is indicated that SVM provides the fast and accurate analysis with the acceptable result.

  18. Vision based nutrient deficiency classification in maize plants using multi class support vector machines

    NASA Astrophysics Data System (ADS)

    Leena, N.; Saju, K. K.

    2018-04-01

    Nutritional deficiencies in plants are a major concern for farmers as it affects productivity and thus profit. The work aims to classify nutritional deficiencies in maize plant in a non-destructive mannerusing image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies like nitrogen, phosphorous and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.

  19. [Identification of varieties of textile fibers by using Vis/NIR infrared spectroscopy technique].

    PubMed

    Wu, Gui-Fang; He, Yong

    2010-02-01

    The aim of the present paper was to provide new insight into Vis/NIR spectroscopic analysis of textile fibers. In order to achieve rapid identification of the varieties of fibers, the authors selected 5 kinds of fibers of cotton, flax, wool, silk and tencel to do a study with Vis/NIR spectroscopy. Firstly, the spectra of each kind of fiber were scanned by spectrometer, and principal component analysis (PCA) method was used to analyze the characteristics of the pattern of Vis/NIR spectra. Principal component scores scatter plot (PC1 x PC2 x PC3) of fiber indicated the classification effect of five varieties of fibers. The former 6 principal components (PCs) were selected according to the quantity and size of PCs. The PCA classification model was optimized by using the least-squares support vector machines (LS-SVM) method. The authors used the 6 PCs extracted by PCA as the inputs of LS-SVM, and PCA-LS-SVM model was built to achieve varieties validation as well as mathematical model building and optimization analysis. Two hundred samples (40 samples for each variety of fibers) of five varieties of fibers were used for calibration of PCA-LS-SVM model, and the other 50 samples (10 samples for each variety of fibers) were used for validation. The result of validation showed that Vis/NIR spectroscopy technique based on PCA-LS-SVM had a powerful classification capability. It provides a new method for identifying varieties of fibers rapidly and real time, so it has important significance for protecting the rights of consumers, ensuring the quality of textiles, and implementing rationalization production and transaction of textile materials and its production.

  20. Classifying low-grade and high-grade bladder cancer using label-free serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi

    2018-03-01

    This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.

  1. Hyperspectral Imaging for Predicting the Internal Quality of Kiwifruits Based on Variable Selection Algorithms and Chemometric Models.

    PubMed

    Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong

    2017-08-10

    We investigated the feasibility and potentiality of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging, combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), successive projections algorithm (SPA) and genetic algorithm-partial least square (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) were developed to predict quality attributes quantitatively using effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (R pre  = 0.9812, RPD = 5.17) and SSC (R pre  = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (R pre  = 0.9070, RPD = 2.60). Image processing algorithms were developed to transfer the predictive model in every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has the potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.

  2. Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Shao, Yongni; Xie, Chuanqi; Jiang, Linjun; Shi, Jiahui; Zhu, Jiajin; He, Yong

    2015-04-01

    Visible/near infrared spectroscopy (Vis/NIR) based on sensitive wavelengths (SWs) and chemometrics was proposed to discriminate different tomatoes bred by spaceflight mutagenesis from their leafs or fruits (green or mature). The tomato breeds were mutant M1, M2 and their parent. Partial least squares (PLS) analysis and least squares-support vector machine (LS-SVM) were implemented for calibration models. PLS analysis was implemented for calibration models with different wavebands including the visible region (400-700 nm) and the near infrared region (700-1000 nm). The best PLS models were achieved in the visible region for the leaf and green fruit samples and in the near infrared region for the mature fruit samples. Furthermore, different latent variables (4-8 LVs for leafs, 5-9 LVs for green fruits, and 4-9 LVs for mature fruits) were used as inputs of LS-SVM to develop the LV-LS-SVM models with the grid search technique and radial basis function (RBF) kernel. The optimal LV-LS-SVM models were achieved with six LVs for the leaf samples, seven LVs for green fruits, and six LVs for mature fruits, respectively, and they outperformed the PLS models. Moreover, independent component analysis (ICA) was executed to select several SWs based on loading weights. The optimal LS-SVM model was achieved with SWs of 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for the leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for the green fruit samples; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for the mature fruit samples. All of them had better performance than PLS and LV-LS-SVM, with the parameters of correlation coefficient (rp), root mean square error of prediction (RMSEP) and bias of 0.9792, 0.2632 and 0.0901 based on leaf discrimination, 0.9837, 0.2783 and 0.1758 based on green fruit discrimination, 0.9804, 0.2215 and -0.0035 based on mature fruit discrimination, respectively. The overall results indicated that ICA was an effective way for the selection of SWs, and the Vis/NIR combined with LS-SVM models had the capability to predict the different breeds (mutant M1, mutant M2 and their parent) of tomatoes from leafs and fruits.

  3. Non-destructive evaluation of bacteria-infected watermelon seeds using visible/near-infrared hyperspectral imaging.

    PubMed

    Lee, Hoonsoo; Kim, Moon S; Song, Yu-Rim; Oh, Chang-Sik; Lim, Hyoun-Sub; Lee, Wang-Hee; Kang, Jum-Soon; Cho, Byoung-Kwan

    2017-03-01

    There is a need to minimize economic damage by sorting infected seeds from healthy seeds before seeding. However, current methods of detecting infected seeds, such as seedling grow-out, enzyme-linked immunosorbent assays, the polymerase chain reaction (PCR) and the real-time PCR have a critical drawbacks in that they are time-consuming, labor-intensive and destructive procedures. The present study aimed to evaluate the potential of visible/near-infrared (Vis/NIR) hyperspectral imaging system for detecting bacteria-infected watermelon seeds. A hyperspectral Vis/NIR reflectance imaging system (spectral region of 400-1000 nm) was constructed to obtain hyperspectral reflectance images for 336 bacteria-infected watermelon seeds, which were then subjected to partial least square discriminant analysis (PLS-DA) and a least-squares support vector machine (LS-SVM) to classify bacteria-infected watermelon seeds from healthy watermelon seeds. The developed system detected bacteria-infected watermelon seeds with an accuracy > 90% (PLS-DA: 91.7%, LS-SVM: 90.5%), suggesting that the Vis/NIR hyperspectral imaging system is effective for quarantining bacteria-infected watermelon seeds. The results of the present study show that it is possible to use the Vis/NIR hyperspectral imaging system for detecting bacteria-infected watermelon seeds. © 2016 Society of Chemical Industry. © 2016 Society of Chemical Industry.

  4. Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

    NASA Astrophysics Data System (ADS)

    Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

    2017-10-01

    Crop maps are essential inputs for the agricultural planning done at various governmental and agribusinesses agencies. Remote sensing offers timely and costs efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, a RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are of 83, 82 and 83% respectively. Adding vegetation indices to the analysis result in the classification accuracy of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.

  5. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    PubMed

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) is proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of matrix inversion, thus resulting to suffer from high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets as well as inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.

  6. Machinery Bearing Fault Diagnosis Using Variational Mode Decomposition and Support Vector Machine as a Classifier

    NASA Astrophysics Data System (ADS)

    Rama Krishna, K.; Ramachandran, K. I.

    2018-02-01

    Crack propagation is a major cause of failure in rotating machines. It adversely affects the productivity, safety, and the machining quality. Hence, detecting the crack’s severity accurately is imperative for the predictive maintenance of such machines. Fault diagnosis is an established concept in identifying the faults, for observing the non-linear behaviour of the vibration signals at various operating conditions. In this work, we find the classification efficiencies for both original and the reconstructed vibrational signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode functional components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. All the statistical features from the original signals and reconstructed signals are found out in feature extraction process individually. A few statistical parameters are selected in feature selection process and are classified using the SVM classifier. The obtained results show the best parameters and appropriate kernel in SVM classifier for detecting the faults in bearings. Hence, we conclude that better results were obtained by VMD and SVM process over normal process using SVM. This is owing to denoising and filtering the raw vibrational signals.

  7. Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.

    PubMed

    Subasi, Abdulhamit

    2013-06-01

    Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Patient classification as an outlier detection problem: An application of the One-Class Support Vector Machine

    PubMed Central

    Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael

    2011-01-01

    Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However these approaches focus on finding group differences and are not applicable to situations where one is interested in accessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features the OC-SVM classified 52% of patients as outliers. However among the patients classified as outliers 70% did not respond to treatment and among those classified as non-outliers 89% responded to treatment. In addition 89% of the healthy controls were classified as non-outliers. PMID:21723950

  9. Support vector machines-based fault diagnosis for turbo-pump rotor

    NASA Astrophysics Data System (ADS)

    Yuan, Sheng-Fa; Chu, Fu-Lei

    2006-05-01

    Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.

  10. Power line identification of millimeter wave radar based on PCA-GS-SVM

    NASA Astrophysics Data System (ADS)

    Fang, Fang; Zhang, Guifeng; Cheng, Yansheng

    2017-12-01

    Aiming at the problem that the existing detection method can not effectively solve the security of UAV's ultra low altitude flight caused by power line, a power line recognition method based on grid search (GS) and the principal component analysis and support vector machine (PCA-SVM) is proposed. Firstly, the candidate line of Hough transform is reduced by PCA, and the main feature of candidate line is extracted. Then, upport vector machine (SVM is) optimized by grid search method (GS). Finally, using support vector machine classifier optimized parameters to classify the candidate line. MATLAB simulation results show that this method can effectively identify the power line and noise, and has high recognition accuracy and algorithm efficiency.

  11. Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.

    PubMed

    Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad

    2018-01-01

    Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.

  12. Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks

    DTIC Science & Technology

    2014-03-27

    intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File

  13. Lamb wave based damage detection using Matching Pursuit and Support Vector Machine classifier

    NASA Astrophysics Data System (ADS)

    Agarwal, Sushant; Mitra, Mira

    2014-03-01

    In this paper, the suitability of using Matching Pursuit (MP) and Support Vector Machine (SVM) for damage detection using Lamb wave response of thin aluminium plate is explored. Lamb wave response of thin aluminium plate with or without damage is simulated using finite element. Simulations are carried out at different frequencies for various kinds of damage. The procedure is divided into two parts - signal processing and machine learning. Firstly, MP is used for denoising and to maintain the sparsity of the dataset. In this study, MP is extended by using a combination of time-frequency functions as the dictionary and is deployed in two stages. Selection of a particular type of atoms lead to extraction of important features while maintaining the sparsity of the waveform. The resultant waveform is then passed as input data for SVM classifier. SVM is used to detect the location of the potential damage from the reduced data. The study demonstrates that SVM is a robust classifier in presence of noise and more efficient as compared to Artificial Neural Network (ANN). Out-of-sample data is used for the validation of the trained and tested classifier. Trained classifiers are found successful in detection of the damage with more than 95% detection rate.

  14. Tuning support vector machines for minimax and Neyman-Pearson classification.

    PubMed

    Davenport, Mark A; Baraniuk, Richard G; Scott, Clayton D

    2010-10-01

    This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.

  15. An implementation of support vector machine on sentiment classification of movie reviews

    NASA Astrophysics Data System (ADS)

    Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.

    2018-03-01

    With technological advances, all information about movie is available on the internet. If the information is processed properly, it will get the quality of the information. This research proposes to the classify sentiments on movie review documents. This research uses Support Vector Machine (SVM) method because it can classify high dimensional data in accordance with the data used in this research in the form of text. Support Vector Machine is a popular machine learning technique for text classification because it can classify by learning from a collection of documents that have been classified previously and can provide good result. Based on number of datasets, the 90-10 composition has the best result that is 85.6%. Based on SVM kernel, kernel linear with constant 1 has the best result that is 84.9%

  16. Discrimination of tomatoes bred by spaceflight mutagenesis using visible/near infrared spectroscopy and chemometrics.

    PubMed

    Shao, Yongni; Xie, Chuanqi; Jiang, Linjun; Shi, Jiahui; Zhu, Jiajin; He, Yong

    2015-04-05

    Visible/near infrared spectroscopy (Vis/NIR) based on sensitive wavelengths (SWs) and chemometrics was proposed to discriminate different tomatoes bred by spaceflight mutagenesis from their leafs or fruits (green or mature). The tomato breeds were mutant M1, M2 and their parent. Partial least squares (PLS) analysis and least squares-support vector machine (LS-SVM) were implemented for calibration models. PLS analysis was implemented for calibration models with different wavebands including the visible region (400-700 nm) and the near infrared region (700-1000 nm). The best PLS models were achieved in the visible region for the leaf and green fruit samples and in the near infrared region for the mature fruit samples. Furthermore, different latent variables (4-8 LVs for leafs, 5-9 LVs for green fruits, and 4-9 LVs for mature fruits) were used as inputs of LS-SVM to develop the LV-LS-SVM models with the grid search technique and radial basis function (RBF) kernel. The optimal LV-LS-SVM models were achieved with six LVs for the leaf samples, seven LVs for green fruits, and six LVs for mature fruits, respectively, and they outperformed the PLS models. Moreover, independent component analysis (ICA) was executed to select several SWs based on loading weights. The optimal LS-SVM model was achieved with SWs of 550-560 nm, 562-574 nm, 670-680 nm and 705-71 5 nm for the leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for the green fruit samples; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for the mature fruit samples. All of them had better performance than PLS and LV-LS-SVM, with the parameters of correlation coefficient (rp), root mean square error of prediction (RMSEP) and bias of 0.9792, 0.2632 and 0.0901 based on leaf discrimination, 0.9837, 0.2783 and 0.1758 based on green fruit discrimination, 0.9804, 0.2215 and -0.0035 based on mature fruit discrimination, respectively. The overall results indicated that ICA was an effective way for the selection of SWs, and the Vis/NIR combined with LS-SVM models had the capability to predict the different breeds (mutant M1, mutant M2 and their parent) of tomatoes from leafs and fruits. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Generalized SMO algorithm for SVM-based multitask learning.

    PubMed

    Cai, Feng; Cherkassky, Vladimir

    2012-06-01

    Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.

  18. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA

    PubMed Central

    Ma, Xiaoqi

    2015-01-01

    A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods have been chosen as the comparing methods. The experimental results show that the proposed method can improve the classifier performance and cost less time. PMID:26543867

  19. Construction accident narrative classification: An evaluation of text mining techniques.

    PubMed

    Goh, Yang Miang; Ubeynarayana, C U

    2017-11-01

    Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in

  1. Detection of Fungus Infection on Petals of Rapeseed (Brassica napus L.) Using NIR Hyperspectral Imaging

    NASA Astrophysics Data System (ADS)

    Zhao, Yan-Ru; Yu, Ke-Qiang; Li, Xiaoli; He, Yong

    2016-12-01

    Infected petals are often regarded as the source for the spread of fungi Sclerotinia sclerotiorum in all growing process of rapeseed (Brassica napus L.) plants. This research aimed to detect fungal infection of rapeseed petals by applying hyperspectral imaging in the spectral region of 874-1734 nm coupled with chemometrics. Reflectance was extracted from regions of interest (ROIs) in the hyperspectral image of each sample. Firstly, principal component analysis (PCA) was applied to conduct a cluster analysis with the first several principal components (PCs). Then, two methods including X-loadings of PCA and random frog (RF) algorithm were used and compared for optimizing wavebands selection. Least squares-support vector machine (LS-SVM) methodology was employed to establish discriminative models based on the optimal and full wavebands. Finally, area under the receiver operating characteristics curve (AUC) was utilized to evaluate classification performance of these LS-SVM models. It was found that LS-SVM based on the combination of all optimal wavebands had the best performance with AUC of 0.929. These results were promising and demonstrated the potential of applying hyperspectral imaging in fungus infection detection on rapeseed petals.

  2. A comparative study of clonal selection algorithm for effluent removal forecasting in septic sludge treatment plant.

    PubMed

    Chun, Ting Sie; Malek, M A; Ismail, Amelia Ritahani

    2015-01-01

    The development of effluent removal prediction is crucial in providing a planning tool necessary for the future development and the construction of a septic sludge treatment plant (SSTP), especially in the developing countries. In order to investigate the expected functionality of the required standard, the prediction of the effluent quality, namely biological oxygen demand, chemical oxygen demand and total suspended solid of an SSTP was modelled using an artificial intelligence approach. In this paper, we adopt the clonal selection algorithm (CSA) to set up a prediction model, with a well-established method - namely the least-square support vector machine (LS-SVM) as a baseline model. The test results of the case study showed that the prediction of the CSA-based SSTP model worked well and provided model performance as satisfactory as the LS-SVM model. The CSA approach shows that fewer control and training parameters are required for model simulation as compared with the LS-SVM approach. The ability of a CSA approach in resolving limited data samples, non-linear sample function and multidimensional pattern recognition makes it a powerful tool in modelling the prediction of effluent removals in an SSTP.

  3. The prediction of the building precision in the Laser Engineered Net Shaping process using advanced networks

    NASA Astrophysics Data System (ADS)

    Lu, Z. L.; Li, D. C.; Lu, B. H.; Zhang, A. F.; Zhu, G. X.; Pi, G.

    2010-05-01

    Laser Engineered Net Shaping (LENS) is an advanced manufacturing technology, but it is difficult to control the depositing height (DH) of the prototype because there are many technology parameters influencing the forming process. The effect of main parameters (laser power, scanning speed and powder feeding rate) on the DH of single track is firstly analyzed, and then it shows that there is the complex nonlinear intrinsic relationship between them. In order to predict the DH, the back propagation (BP) based network improved with Adaptive learning rate and Momentum coefficient (AM) algorithm, and the least square support vector machine (LS-SVM) network are both adopted. The mapping relationship between above parameters and the DH is constructed according to training samples collected by LENS experiments, and then their generalization ability, function-approximating ability and real-time are contrastively investigated. The results show that although the predicted result by the BP-AM approximates the experimental result, above performance index of the LS-SVM are better than those of the BP-AM. Finally, high-definition thin-walled parts of AISI316L are successfully fabricated. Hence, the LS-SVM network is more suitable for the prediction of the DH.

  4. Scoliosis curve type classification using kernel machine from 3D trunk image

    NASA Astrophysics Data System (ADS)

    Adankon, Mathias M.; Dansereau, Jean; Parent, Stefan; Labelle, Hubert; Cheriet, Farida

    2012-03-01

    Adolescent idiopathic scoliosis (AIS) is a deformity of the spine manifested by asymmetry and deformities of the external surface of the trunk. Classification of scoliosis deformities according to curve type is used to plan management of scoliosis patients. Currently, scoliosis curve type is determined based on X-ray exam. However, cumulative exposure to X-rays radiation significantly increases the risk for certain cancer. In this paper, we propose a robust system that can classify the scoliosis curve type from non invasive acquisition of 3D trunk surface of the patients. The 3D image of the trunk is divided into patches and local geometric descriptors characterizing the surface of the back are computed from each patch and forming the features. We perform the reduction of the dimensionality by using Principal Component Analysis and 53 components were retained. In this work a multi-class classifier is built with Least-squares support vector machine (LS-SVM) which is a kernel classifier. For this study, a new kernel was designed in order to achieve a robust classifier in comparison with polynomial and Gaussian kernel. The proposed system was validated using data of 103 patients with different scoliosis curve types diagnosed and classified by an orthopedic surgeon from the X-ray images. The average rate of successful classification was 93.3% with a better rate of prediction for the major thoracic and lumbar/thoracolumbar types.

  5. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features

    PubMed Central

    Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-01-01

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282

  6. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features.

    PubMed

    Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-07-18

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated.A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using leave-one-out cross validation (LOOCV) strategy. Besides, the influences of parameter selection on the classifying performances were investigated. We found that support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with synthetic minority over-sampling technique (SMOTE), the highest classifying accuracy of 0.945 or 0.961 for LGG and HGG or grade II, III and IV gliomas was achieved. Application of Recursive Feature Elimination (RFE) attribute selection strategy further improved the classifying accuracies. Besides, the performances of LibSVM, SMO, IBk classifiers were influenced by some key parameters such as kernel type, c, gama, K, etc. SVM is a promising tool in developing automated preoperative glioma grading system, especially when being combined with RFE strategy. Model parameters should be considered in glioma grading model optimization.

  7. In Situ Measurement of Some Soil Properties in Paddy Soil Using Visible and Near-Infrared Spectroscopy

    PubMed Central

    Wenjun, Ji; Zhou, Shi; Jingyi, Huang; Shuo, Li

    2014-01-01

    In situ measurements with visible and near-infrared spectroscopy (vis-NIR) provide an efficient way for acquiring soil information of paddy soils in the short time gap between the harvest and following rotation. The aim of this study was to evaluate its feasibility to predict a series of soil properties including organic matter (OM), organic carbon (OC), total nitrogen (TN), available nitrogen (AN), available phosphorus (AP), available potassium (AK) and pH of paddy soils in Zhejiang province, China. Firstly, the linear partial least squares regression (PLSR) was performed on the in situ spectra and the predictions were compared to those with laboratory-based recorded spectra. Then, the non-linear least-square support vector machine (LS-SVM) algorithm was carried out aiming to extract more useful information from the in situ spectra and improve predictions. Results show that in terms of OC, OM, TN, AN and pH, (i) the predictions were worse using in situ spectra compared to laboratory-based spectra with PLSR algorithm (ii) the prediction accuracy using LS-SVM (R2>0.75, RPD>1.90) was obviously improved with in situ vis-NIR spectra compared to PLSR algorithm, and comparable or even better than results generated using laboratory-based spectra with PLSR; (iii) in terms of AP and AK, poor predictions were obtained with in situ spectra (R2<0.5, RPD<1.50) either using PLSR or LS-SVM. The results highlight the use of LS-SVM for in situ vis-NIR spectroscopic estimation of soil properties of paddy soils. PMID:25153132

  8. Machine learning-based quantitative texture analysis of CT images of small renal masses: Differentiation of angiomyolipoma without visible fat from renal cell carcinoma.

    PubMed

    Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei

    2018-04-01

    To evaluate the diagnostic performance of machine-learning based quantitative texture analysis of CT images to differentiate small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institutional retrospective study included 58 patients with pathologically proven small renal mass (17 in AMLwvf and 41 in RCC groups). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation in preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied to select features preliminarily. Then support vector machine with recursive feature elimination (SVM-RFE) and synthetic minority oversampling technique (SMOTE) were adopted to establish discriminative classifiers, and the performance of classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and had good interobserver agreement. An optimal feature subset including 11 features was further selected by the SVM-RFE method. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating between small AMLwvf and RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate the accurate differentiation of small AMLwvf from RCC. • Although conventional CT is useful for diagnosis of SRMs, it has limitations. • Machine-learning based CT texture analysis facilitate differentiation of small AMLwvf from RCC. • The highest accuracy of SVM-RFE+SMOTE classifier reached 93.9 %. • Texture analysis combined with machine-learning methods might spare unnecessary surgery for AMLwvf.

  9. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2013-09-30

    N0001411WX21394 Steve W. Martin SPAWAR Systems Center Pacific 53366 Front St. San Diego, CA 92152-6551 phone: (619) 553-9882 email: Steve.W.Martin...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, particularly S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in

  10. A linear-RBF multikernel SVM to classify big text corpora.

    PubMed

    Romero, R; Iglesias, E L; Borrajo, L

    2015-01-01

    Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.

  11. Optimal classification for the diagnosis of duchenne muscular dystrophy images using support vector machines.

    PubMed

    Zhang, Ming-Huan; Ma, Jun-Shan; Shen, Ying; Chen, Ying

    2016-09-01

    This study aimed to investigate the optimal support vector machines (SVM)-based classifier of duchenne muscular dystrophy (DMD) magnetic resonance imaging (MRI) images. T1-weighted (T1W) and T2-weighted (T2W) images of the 15 boys with DMD and 15 normal controls were obtained. Textural features of the images were extracted and wavelet decomposed, and then, principal features were selected. Scale transform was then performed for MRI images. Afterward, SVM-based classifiers of MRI images were analyzed based on the radical basis function and decomposition levels. The cost (C) parameter and kernel parameter [Formula: see text] were used for classification. Then, the optimal SVM-based classifier, expressed as [Formula: see text]), was identified by performance evaluation (sensitivity, specificity and accuracy). Eight of 12 textural features were selected as principal features (eigenvalues [Formula: see text]). The 16 SVM-based classifiers were obtained using combination of (C, [Formula: see text]), and those with lower C and [Formula: see text] values showed higher performances, especially classifier of [Formula: see text]). The SVM-based classifiers of T1W images showed higher performance than T1W images at the same decomposition level. The T1W images in classifier of [Formula: see text]) at level 2 decomposition showed the highest performance of all, and its overall correct sensitivity, specificity, and accuracy reached 96.9, 97.3, and 97.1 %, respectively. The T1W images in SVM-based classifier [Formula: see text] at level 2 decomposition showed the highest performance of all, demonstrating that it was the optimal classification for the diagnosis of DMD.

  12. Development of a ten-signature classifier using a support vector machine integrated approach to subdivide the M1 stage into M1a and M1b stages of nasopharyngeal carcinoma with synchronous metastases to better predict patients' survival.

    PubMed

    Jiang, Rou; You, Rui; Pei, Xiao-Qing; Zou, Xiong; Zhang, Meng-Xia; Wang, Tong-Min; Sun, Rui; Luo, Dong-Hua; Huang, Pei-Yu; Chen, Qiu-Yan; Hua, Yi-Jun; Tang, Lin-Quan; Guo, Ling; Mo, Hao-Yuan; Qian, Chao-Nan; Mai, Hai-Qiang; Hong, Ming-Huang; Cai, Hong-Min; Chen, Ming-Yuan

    2016-01-19

    The aim of this study was to develop a prognostic classifier and subdivided the M1 stage for nasopharyngeal carcinoma patients with synchronous metastases (mNPC). A retrospective cohort of 347 mNPC patients was recruited between January 2000 and December 2010. Thirty hematological markers and 11 clinical characteristics were collected, and the association of these factors with overall survival (OS) was evaluated. Advanced machine learning schemes of a support vector machine (SVM) were used to select a subset of highly informative factors and to construct a prognostic model (mNPC-SVM). The mNPC-SVM classifier identified ten informative variables, including three clinical indexes and seven hematological markers. The median survival time for low-risk patients (M1a) as identified by the mNPC-SVM classifier was 38.0 months, and survival time was dramatically reduced to 13.8 months for high-risk patients (M1b) (P < 0.001). Multivariate adjustment using prognostic factors revealed that the mNPC-SVM classifier remained a powerful predictor of OS (M1a vs. M1b, hazard ratio, 3.45; 95% CI, 2.59 to 4.60, P < 0.001). Moreover, combination treatment of systemic chemotherapy and loco-regional radiotherapy was associated with significantly better survival outcomes than chemotherapy alone (the 5-year OS, 47.0% vs. 10.0%, P < 0.001) in the M1a subgroup but not in the M1b subgroup (12.0% vs. 3.0%, P = 0.101). These findings were validated by a separate cohort. In conclusion, the newly developed mNPC-SVM classifier led to more precise risk definitions that offer a promising subdivision of the M1 stage and individualized selection for future therapeutic regimens in mNPC patients.

  13. Machine learning approach to automatic exudate detection in retinal images from diabetic patients

    NASA Astrophysics Data System (ADS)

    Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin

    2010-01-01

    Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.

  14. Wavelet SVM in Reproducing Kernel Hilbert Space for hyperspectral remote sensing image classification

    NASA Astrophysics Data System (ADS)

    Du, Peijun; Tan, Kun; Xing, Xiaoshi

    2010-12-01

    Combining Support Vector Machine (SVM) with wavelet analysis, we constructed wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM is faced with the bottleneck of kernel parameter selection which further results in time-consuming and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to experiment the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier can obtain the highest accuracy when using the Coiflet Kernel function in wavelet transform. In contrast with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and SVM classifier using Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space is capable of improving classification accuracy obviously.

  15. Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.

    PubMed

    Li, Qiang; Gu, Yu; Jia, Jing

    2017-01-30

    Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.

  16. Steganalysis using logistic regression

    NASA Astrophysics Data System (ADS)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  17. [Near infrared spectroscopy study on water content in turbine oil].

    PubMed

    Chen, Bin; Liu, Ge; Zhang, Xian-Ming

    2013-11-01

    Near infrared (NIR) spectroscopy combined with successive projections algorithm (SPA) was investigated for determination of water content in turbine oil. Through the 57 samples of different water content in turbine oil scanned applying near infrared (NIR) spectroscopy, with the water content in the turbine oil of 0-0.156%, different pretreatment methods such as the original spectra, first derivative spectra and differential polynomial least squares fitting algorithm Savitzky-Golay (SG), and successive projections algorithm (SPA) were applied for the extraction of effective wavelengths, the correlation coefficient (R) and root mean square error (RMSE) were used as the model evaluation indices, accordingly water content in turbine oil was investigated. The results indicated that the original spectra with different water content in turbine oil were pretreated by the performance of first derivative + SG pretreatments, then the selected effective wavelengths were used as the inputs of least square support vector machine (LS-SVM). A total of 16 variables selected by SPA were employed to construct the model of SPA and least square support vector machine (SPA-LS-SVM). There is 9 as The correlation coefficient was 0.975 9 and the root of mean square error of validation set was 2.655 8 x 10(-3) using the model, and it is feasible to determine the water content in oil using near infrared spectroscopy and SPA-LS-SVM, and an excellent prediction precision was obtained. This study supplied a new and alternative approach to the further application of near infrared spectroscopy in on-line monitoring of contamination such as water content in oil.

  18. Generative Models for Similarity-based Classification

    DTIC Science & Technology

    2007-01-01

    NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed

  19. Evaluating the diagnostic utility of applying a machine learning algorithm to diffusion tensor MRI measures in individuals with major depressive disorder.

    PubMed

    Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G

    2017-06-30

    Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n =25) and healthy controls (n =25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy values (FA). The study revealed three main findings: 1) SVM applied to DTI derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  20. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems.

    PubMed

    Cho, Ming-Yuan; Hoang, Thi Thom

    2017-01-01

    Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.

  1. Analysis of miRNA expression profile based on SVM algorithm

    NASA Astrophysics Data System (ADS)

    Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian

    2018-05-01

    Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.

  2. Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    NASA Astrophysics Data System (ADS)

    Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.

    2017-02-01

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and large availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameters selection are reported, making it impossible to reproduce study analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.

  3. Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards

    2013-01-01

    Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers scalability. However, Least Squares Support VectorMachines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters are still major problems. We address these problems by introducing an O(n log n) approximate l-foldmore » cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm s computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method s effectiveness at selecting hyperparameters on real world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.« less

  4. Progressive Classification Using Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri; Kocurek, Michael

    2009-01-01

    An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.

  5. Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers

    NASA Astrophysics Data System (ADS)

    Sanal Kumar, K. P.; Bhavani, R., Dr.

    2017-08-01

    Egocentric vision is a unique perspective in computer vision which is human centric. The recognition of egocentric actions is a challenging task which helps in assisting elderly people, disabled patients and so on. In this work, life logging activity videos are taken as input. There are 2 categories, first one is the top level and second one is second level. Here, the recognition is done using the features like Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. The features are fused together and it acts as a single feature. The extracted features are reduced using Principal Component Analysis (PCA). The features that are reduced are provided as input to the classifiers like Support Vector Machine (SVM), k nearest neighbor (kNN) and combined Support Vector Machine (SVM) and k Nearest Neighbor (kNN) (combined SVMkNN). These classifiers are evaluated and the combined SVMkNN provided better results than other classifiers in the literature.

  6. A low cost implementation of multi-parameter patient monitor using intersection kernel support vector machine classifier

    NASA Astrophysics Data System (ADS)

    Mohan, Dhanya; Kumar, C. Santhosh

    2016-03-01

    Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate and oxygen saturation (S pO2) as input parameters were developed to monitor the condition of patients, with minimum human resource utilization. The Support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression is used for the realization of MPMs. For making MPMs cost effective, we experiment on the hardware implementation of the MPM using support vector machine classifier. The training of the system is done using the matlab environment and the detection of the alarm/noalarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM has outperformed the best known MPM using radial basis function kernel by an absoute improvement of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model was developed based on the improved performance system using Verilog Hardware Description Language and was implemented on Altera cyclone-II development board.

  7. Sonomyography Analysis on Thickness of Skeletal Muscle During Dynamic Contraction Induced by Neuromuscular Electrical Stimulation: A Pilot Study.

    PubMed

    Qiu, Shuang; Feng, Jing; Xu, Jiapeng; Xu, Rui; Zhao, Xin; Zhou, Peng; Qi, Hongzhi; Zhang, Lixin; Ming, Dong

    2017-01-01

    Neuromuscular electrical stimulation (NMES) that stimulates skeletal muscles to induce contractions has been widely applied to restore functions of paralyzed muscles. However, the architectural changes of stimulated muscles induced by NMES are still not well understood. The present study applies sonomyography (SMG) to evaluate muscle architecture under NMES-induced and voluntary movements. The quadriceps muscles of seven healthy subjects were tested for eight cycles during an extension exercise of the knee joint with/without NMES, and SMG and the knee joint angle were recorded during the process of knee extension. A least squares support vector machine (LS-SVM) LS-SVM model was developed and trained using the data sets of six cycles collected under NMES, while the remaining data was used to test. Muscle thickness changes were extracted from ultrasound images and compared between NMES-induced and voluntary contractions, and LS-SVM was used to model a relationship between dynamical knee joint angles and SMG signals. Muscle thickness showed to be significantly correlated with joint angle in NMES-induced contractions, and a significant negative correlation was observed between Vastus intermedius (VI) thickness and rectus femoris (RF) thickness. In addition, there was a significant difference between voluntary and NMES-induced contractions . The LS-SVM model based on RF thickness and knee joint angle provided superior performance compared with the model based on VI thickness and knee joint angle or total thickness and knee joint angle. This suggests that a strong relation exists between the RF thickness and knee joint angle. These results provided direct evidence for the potential application of RF thickness in optimizing NMES system as well as measuring muscle state under NMES.

  8. Data on Support Vector Machines (SVM) model to forecast photovoltaic power.

    PubMed

    Malvoni, M; De Giorgi, M G; Congedo, P M

    2016-12-01

    The data concern the photovoltaic (PV) power, forecasted by a hybrid model that considers weather variations and applies a technique to reduce the input data size, as presented in the paper entitled "Photovoltaic forecast based on hybrid pca-lssvm using dimensionality reducted data" (M. Malvoni, M.G. De Giorgi, P.M. Congedo, 2015) [1]. The quadratic Renyi entropy criteria together with the principal component analysis (PCA) are applied to the Least Squares Support Vector Machines (LS-SVM) to predict the PV power in the day-ahead time frame. The data here shared represent the proposed approach results. Hourly PV power predictions for 1,3,6,12, 24 ahead hours and for different data reduction sizes are provided in Supplementary material.

  9. Extraction and classification of 3D objects from volumetric CT data

    NASA Astrophysics Data System (ADS)

    Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.

    2016-05-01

    We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection System (EDS) using our multistage Segmentation Carving (SC) followed by Support Vector Machine (SVM) classifier. The multi-stage Segmentation and Carving (SC) step extracts all suspect 3-D objects. The feature vector is then constructed for all extracted objects and the feature vector is classified by the Support Vector Machine (SVM) previously learned using a set of ground truth threat and benign objects. The learned SVM classifier has shown to be effective in classification of different types of threat materials. The proposed ATD algorithm robustly deals with CT data that are prone to artifacts due to scatter, beam hardening as well as other systematic idiosyncrasies of the CT data. Furthermore, the proposed ATD algorithm is amenable for including newly emerging threat materials as well as for accommodating data from newly developing sensor technologies. Efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristics (ROC) curve that relates Probability of Detection (PD) as a function of Probability of False Alarm (PFA). The tests performed using CT data of passenger bags shows excellent performance characteristics.

  10. Robust LS-SVM-based adaptive constrained control for a class of uncertain nonlinear systems with time-varying predefined performance

    NASA Astrophysics Data System (ADS)

    Luo, Jianjun; Wei, Caisheng; Dai, Honghua; Yuan, Jianping

    2018-03-01

    This paper focuses on robust adaptive control for a class of uncertain nonlinear systems subject to input saturation and external disturbance with guaranteed predefined tracking performance. To reduce the limitations of classical predefined performance control method in the presence of unknown initial tracking errors, a novel predefined performance function with time-varying design parameters is first proposed. Then, aiming at reducing the complexity of nonlinear approximations, only two least-square-support-vector-machine-based (LS-SVM-based) approximators with two design parameters are required through norm form transformation of the original system. Further, a novel LS-SVM-based adaptive constrained control scheme is developed under the time-vary predefined performance using backstepping technique. Wherein, to avoid the tedious analysis and repeated differentiations of virtual control laws in the backstepping technique, a simple and robust finite-time-convergent differentiator is devised to only extract its first-order derivative at each step in the presence of external disturbance. In this sense, the inherent demerit of backstepping technique-;explosion of terms; brought by the recursive virtual controller design is conquered. Moreover, an auxiliary system is designed to compensate the control saturation. Finally, three groups of numerical simulations are employed to validate the effectiveness of the newly developed differentiator and the proposed adaptive constrained control scheme.

  11. Firmness prediction in Prunus persica 'Calrico' peaches by visible/short-wave near infrared spectroscopy and acoustic measurements using optimised linear and non-linear chemometric models.

    PubMed

    Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio

    2015-08-15

    In this work, near infrared spectroscopy (NIR) and an acoustic measure (AWETA) (two non-destructive methods) were applied in Prunus persica fruit 'Calrico' (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. Also, a mutual-information-based variable selection method, seeking to find the most significant variables to produce optimal accuracy of the regression models, was applied to a joint set of variables (NIR wavelengths and AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R(2)) for PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction in comparison with separate NIR and AWETA predictions. The three variables selected by the variable selection method (AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R(2) values 0.76 and 0.77, PLS and LS-SVM. These results indicated that the proposed mutual-information-based variable selection algorithm was a powerful tool for the selection of the most relevant variables. © 2014 Society of Chemical Industry.

  12. [Characteristic wavelengths selection of soluble solids content of pear based on NIR spectral and LS-SVM].

    PubMed

    Fan, Shu-xiang; Huang, Wen-qian; Li, Jiang-bo; Zhao, Chun-jiang; Zhang, Bao-hua

    2014-08-01

    To improve the precision and robustness of the NIR model of the soluble solid content (SSC) on pear. The total number of 160 pears was for the calibration (n=120) and prediction (n=40). Different spectral pretreatment methods, including standard normal variate (SNV) and multiplicative scatter correction (MSC) were used before further analysis. A combination of genetic algorithm (GA) and successive projections algorithm (SPA) was proposed to select most effective wavelengths after uninformative variable elimination (UVE) from original spectra, SNV pretreated spectra and MSC pretreated spectra respectively. The selected variables were used as the inputs of least squares-support vector machine (LS-SVM) model to build models for de- termining the SSC of pear. The results indicated that LS-SVM model built using SNVE-UVE-GA-SPA on 30 characteristic wavelengths selected from full-spectrum which had 3112 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for prediction sets were 0.956, 0.271 for SSC. The model is reliable and the predicted result is effective. The method can meet the requirement of quick measuring SSC of pear and might be important for the development of portable instruments and online monitoring.

  13. Mapping membrane activity in undiscovered peptide sequence space using machine learning

    PubMed Central

    Fulan, Benjamin M.; Wong, Gerard C. L.

    2016-01-01

    There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate ⍺-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its “antimicrobialness”) and its ⍺-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide’s minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find a unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences. PMID:27849600

  14. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.

  15. Knee Joint Vibration Signal Analysis with Matching Pursuit Decomposition and Dynamic Weighted Classifier Fusion

    PubMed Central

    Cai, Suxian; Yang, Shanshan; Zheng, Fang; Lu, Meng; Wu, Yunfeng; Krishnan, Sridhar

    2013-01-01

    Analysis of knee joint vibration (VAG) signals can provide quantitative indices for detection of knee joint pathology at an early stage. In addition to the statistical features developed in the related previous studies, we extracted two separable features, that is, the number of atoms derived from the wavelet matching pursuit decomposition and the number of significant signal turns detected with the fixed threshold in the time domain. To perform a better classification over the data set of 89 VAG signals, we applied a novel classifier fusion system based on the dynamic weighted fusion (DWF) method to ameliorate the classification performance. For comparison, a single leastsquares support vector machine (LS-SVM) and the Bagging ensemble were used for the classification task as well. The results in terms of overall accuracy in percentage and area under the receiver operating characteristic curve obtained with the DWF-based classifier fusion method reached 88.76% and 0.9515, respectively, which demonstrated the effectiveness and superiority of the DWF method with two distinct features for the VAG signal analysis. PMID:23573175

  16. Tackling Missing Data in Community Health Studies Using Additive LS-SVM Classifier.

    PubMed

    Wang, Guanjin; Deng, Zhaohong; Choi, Kup-Sze

    2018-03-01

    Missing data is a common issue in community health and epidemiological studies. Direct removal of samples with missing data can lead to reduced sample size and information bias, which deteriorates the significance of the results. While data imputation methods are available to deal with missing data, they are limited in performance and could introduce noises into the dataset. Instead of data imputation, a novel method based on additive least square support vector machine (LS-SVM) is proposed in this paper for predictive modeling when the input features of the model contain missing data. The method also determines simultaneously the influence of the features with missing values on the classification accuracy using the fast leave-one-out cross-validation strategy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people using health data collected in the community. The dataset involves demographics, socioeconomic status, health history, and the outcomes of health assessments of 444 community-dwelling elderly people, with 5% to 60% of data missing in some of the input features. The QOL is measured using a standard questionnaire of the World Health Organization. Results show that the proposed method outperforms four conventional methods for handling missing data-case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation, with the average QOL prediction accuracy reaching 0.7418. It is potentially a promising technique for tackling missing data in community health research and other applications.

  17. Development of Noninvasive Classification Methods for Different Roasting Degrees of Coffee Beans Using Hyperspectral Imaging

    PubMed Central

    Chu, Bingquan; Yu, Keqiang; Zhao, Yanru

    2018-01-01

    This study aimed to develop an approach for quickly and noninvasively differentiating the roasting degrees of coffee beans using hyperspectral imaging (HSI). The qualitative properties of seven roasting degrees of coffee beans (unroasted, light, moderately light, light medium, medium, moderately dark, and dark) were assayed, including moisture, crude fat, trigonelline, chlorogenic acid, and caffeine contents. These properties were influenced greatly by the respective roasting degree. Their hyperspectral images (874–1734 nm) were collected using a hyperspectral reflectance imaging system. The spectra of the regions of interest were manually extracted from the HSI images. Then, principal components analysis was employed to compress the spectral data and select the optimal wavelengths based on loading weight analysis. Meanwhile, the random frog (RF) methodology and the successive projections algorithm were also adopted to pick effective wavelengths from the spectral data. Finally, least squares support vector machine (LS-SVM) was utilized to establish discriminative models using spectral reflectance and corresponding labeled classes for each degree of roast sample. The results showed that the LS-SVM model, established by the RF selecting method, with eight wavelengths performed very well, achieving an overall classification accuracy of 90.30%. In conclusion, HSI was illustrated as a potential technique for noninvasively classifying the roasting degrees of coffee beans and might have an important application for the development of nondestructive, real-time, and portable sensors to monitor the roasting process of coffee beans. PMID:29671781

  18. Development of Noninvasive Classification Methods for Different Roasting Degrees of Coffee Beans Using Hyperspectral Imaging.

    PubMed

    Chu, Bingquan; Yu, Keqiang; Zhao, Yanru; He, Yong

    2018-04-19

    This study aimed to develop an approach for quickly and noninvasively differentiating the roasting degrees of coffee beans using hyperspectral imaging (HSI). The qualitative properties of seven roasting degrees of coffee beans (unroasted, light, moderately light, light medium, medium, moderately dark, and dark) were assayed, including moisture, crude fat, trigonelline, chlorogenic acid, and caffeine contents. These properties were influenced greatly by the respective roasting degree. Their hyperspectral images (874⁻1734 nm) were collected using a hyperspectral reflectance imaging system. The spectra of the regions of interest were manually extracted from the HSI images. Then, principal components analysis was employed to compress the spectral data and select the optimal wavelengths based on loading weight analysis. Meanwhile, the random frog (RF) methodology and the successive projections algorithm were also adopted to pick effective wavelengths from the spectral data. Finally, least squares support vector machine (LS-SVM) was utilized to establish discriminative models using spectral reflectance and corresponding labeled classes for each degree of roast sample. The results showed that the LS-SVM model, established by the RF selecting method, with eight wavelengths performed very well, achieving an overall classification accuracy of 90.30%. In conclusion, HSI was illustrated as a potential technique for noninvasively classifying the roasting degrees of coffee beans and might have an important application for the development of nondestructive, real-time, and portable sensors to monitor the roasting process of coffee beans.

  19. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    NASA Astrophysics Data System (ADS)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics was used as feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers as well as useŕs and produceŕs accuracy.

  20. Automated discrimination of dementia spectrum disorders using extreme learning machine and structural T1 MRI features.

    PubMed

    Jongin Kim; Boreom Lee

    2017-07-01

    The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals of the neuroscience and clinical fields. In this study, we performed extreme learning machine (ELM) classifier to discriminate the AD, mild cognitive impairment (MCI) from normal control (NC). We compared the performance of ELM with that of a linear kernel support vector machine (SVM) for 718 structural MRI images from Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal control, MCI converter (MCI-C), MCI non-converter (MCI-NC), and AD. We employed SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. In this study, we found that the RFE-SVM feature selection approach in combination with ELM shows the superior classification accuracy to that of linear kernel SVM for structural T1 MRI data.

  1. Successive Projections Algorithm-Multivariable Linear Regression Classifier for the Detection of Contaminants on Chicken Carcasses in Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.

    2017-07-01

    During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor consuming and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model of the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA , and the optimal threshold T = 1 was obtained from the receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares supported vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.

  2. Spectrophotometric determination of ternary mixtures of thiamin, riboflavin and pyridoxal in pharmaceutical and human plasma by least-squares support vector machines.

    PubMed

    Niazi, Ali; Zolgharnein, Javad; Afiuni-Zadeh, Somaie

    2007-11-01

    Ternary mixtures of thiamin, riboflavin and pyridoxal have been simultaneously determined in synthetic and real samples by applications of spectrophotometric and least-squares support vector machines. The calibration graphs were linear in the ranges of 1.0 - 20.0, 1.0 - 10.0 and 1.0 - 20.0 microg ml(-1) with detection limits of 0.6, 0.5 and 0.7 microg ml(-1) for thiamin, riboflavin and pyridoxal, respectively. The experimental calibration matrix was designed with 21 mixtures of these chemicals. The concentrations were varied between calibration graph concentrations of vitamins. The simultaneous determination of these vitamin mixtures by using spectrophotometric methods is a difficult problem, due to spectral interferences. The partial least squares (PLS) modeling and least-squares support vector machines were used for the multivariate calibration of the spectrophotometric data. An excellent model was built using LS-SVM, with low prediction errors and superior performance in relation to PLS. The root mean square errors of prediction (RMSEP) for thiamin, riboflavin and pyridoxal with PLS and LS-SVM were 0.6926, 0.3755, 0.4322 and 0.0421, 0.0318, 0.0457, respectively. The proposed method was satisfactorily applied to the rapid simultaneous determination of thiamin, riboflavin and pyridoxal in commercial pharmaceutical preparations and human plasma samples.

  3. Combined empirical mode decomposition and texture features for skin lesion classification using quadratic support vector machine.

    PubMed

    Wahba, Maram A; Ashour, Amira S; Napoleon, Sameh A; Abd Elnaby, Mustafa M; Guo, Yanhui

    2017-12-01

    Basal cell carcinoma is one of the most common malignant skin lesions. Automated lesion identification and classification using image processing techniques is highly required to reduce the diagnosis errors. In this study, a novel technique is applied to classify skin lesion images into two classes, namely the malignant Basal cell carcinoma and the benign nevus. A hybrid combination of bi-dimensional empirical mode decomposition and gray-level difference method features is proposed after hair removal. The combined features are further classified using quadratic support vector machine (Q-SVM). The proposed system has achieved outstanding performance of 100% accuracy, sensitivity and specificity compared to other support vector machine procedures as well as with different extracted features. Basal Cell Carcinoma is effectively classified using Q-SVM with the proposed combined features.

  4. Support vector machines classifiers of physical activities in preschoolers

    USDA-ARS?s Scientific Manuscript database

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  5. Fabric wrinkle characterization and classification using modified wavelet coefficients and optimized support-vector-machine classifier

    USDA-ARS?s Scientific Manuscript database

    This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...

  6. SVM classifier on chip for melanoma detection.

    PubMed

    Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

    2017-07-01

    Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.

  7. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.

  8. An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms.

    PubMed

    Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L

    2013-12-01

    The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k nearest neighbor (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as baseline. To determine the best input features and classifier parameters, we used genetic algorithms and a 10-fold cross-validation using the average area under the ROC curve (AUC). In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed a feature selection on the original FOT parameters. This selection did not cause any significant improvement in accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed a feature selection on the cross products of the FOT parameters. This selection resulted in a further increase in AUC (KNN=SVM=0.91), which allows for high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  9. A comparative study of the svm and k-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals

    PubMed Central

    2014-01-01

    Background Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. Results The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database. PMID:24970564

  10. A comparative study of the SVM and K-nn machine learning algorithms for the diagnosis of respiratory pathologies using pulmonary acoustic signals.

    PubMed

    Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian

    2014-06-27

    Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosis respiratory pathologies using respiratory sounds from R.A.L.E database. The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier for the discrimination of pulmonary acoustic signals from pathological and normal subjects obtained from the RALE database.

  11. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study

    PubMed Central

    Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom

    2016-01-01

    The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex. PMID:27500640

  12. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study.

    PubMed

    Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom

    2016-01-01

    The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex.

  13. Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao

    2014-09-01

    This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.

  14. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2011-09-30

    Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR Systems Center Pacific...APPROACH Odontocete click detection and classification. A multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier

  15. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.

  16. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306

  17. Implementation of support vector machine for classification of speech marked hijaiyah letters based on Mel frequency cepstrum coefficient feature extraction

    NASA Astrophysics Data System (ADS)

    Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari

    2018-03-01

    Support Vector Machine or commonly called SVM is one method that can be used to process the classification of a data. SVM classifies data from 2 different classes with hyperplane. In this study, the system was built using SVM to develop Arabic Speech Recognition. In the development of the system, there are 2 kinds of speakers that have been tested that is dependent speakers and independent speakers. The results from this system is an accuracy of 85.32% for speaker dependent and 61.16% for independent speakers.

  18. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.

  19. Comparative Analysis of Hybrid Models for Prediction of BP Reactivity to Crossed Legs.

    PubMed

    Kaur, Gurmanik; Arora, Ajat Shatru; Jain, Vijender Kumar

    2017-01-01

    Crossing the legs at the knees, during BP measurement, is one of the several physiological stimuli that considerably influence the accuracy of BP measurements. Therefore, it is paramount to develop an appropriate prediction model for interpreting influence of crossed legs on BP. This research work described the use of principal component analysis- (PCA-) fused forward stepwise regression (FSWR), artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), and least squares support vector machine (LS-SVM) models for prediction of BP reactivity to crossed legs among the normotensive and hypertensive participants. The evaluation of the performance of the proposed prediction models using appropriate statistical indices showed that the PCA-based LS-SVM (PCA-LS-SVM) model has the highest prediction accuracy with coefficient of determination ( R 2 ) = 93.16%, root mean square error (RMSE) = 0.27, and mean absolute percentage error (MAPE) = 5.71 for SBP prediction in normotensive subjects. Furthermore, R 2  = 96.46%, RMSE = 0.19, and MAPE = 1.76 for SBP prediction and R 2  = 95.44%, RMSE = 0.21, and MAPE = 2.78 for DBP prediction in hypertensive subjects using the PCA-LSSVM model. This assessment presents the importance and advantages posed by hybrid computing models for the prediction of variables in biomedical research studies.

  20. Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

    PubMed

    Sørensen, Lauge; Nielsen, Mads

    2018-05-15

    The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Human action recognition with group lasso regularized-support vector machine

    NASA Astrophysics Data System (ADS)

    Luo, Huiwu; Lu, Huanzhang; Wu, Yabei; Zhao, Fei

    2016-05-01

    The bag-of-visual-words (BOVW) and Fisher kernel are two popular models in human action recognition, and support vector machine (SVM) is the most commonly used classifier for the two models. We show two kinds of group structures in the feature representation constructed by BOVW and Fisher kernel, respectively, since the structural information of feature representation can be seen as a prior for the classifier and can improve the performance of the classifier, which has been verified in several areas. However, the standard SVM employs L2-norm regularization in its learning procedure, which penalizes each variable individually and cannot express the structural information of feature representation. We replace the L2-norm regularization with group lasso regularization in standard SVM, and a group lasso regularized-support vector machine (GLRSVM) is proposed. Then, we embed the group structural information of feature representation into GLRSVM. Finally, we introduce an algorithm to solve the optimization problem of GLRSVM by alternating directions method of multipliers. The experiments evaluated on KTH, YouTube, and Hollywood2 datasets show that our method achieves promising results and improves the state-of-the-art methods on KTH and YouTube datasets.

  2. [MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].

    PubMed

    Chen, Zhiru; Hong, Wenxue

    2016-02-01

    Considering the low accuracy of prediction in the positive samples and poor overall classification effects caused by unbalanced sample data of MicroRNA (miRNA) target, we proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm in this paper, an under-sampling based on the ensemble learning algorithm. The algorithm adopts SVM as learning algorithm and AdaBoost as integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates the abnormal ones in negative samples with robust sample weights smoothing mechanism so as to avoid over-learning. Finally, the prediction of miRNA target integrated classifier is achieved with the combination of multiple weak classifiers through the voting mechanism. The experiment revealed that the SVM-IUSW, compared with other algorithms on unbalanced dataset collection, could not only improve the accuracy of positive targets and the overall effect of classification, but also enhance the generalization ability of miRNA target classifier.

  3. Onboard Classifiers for Science Event Detection on a Remote Sensing Spacecraft

    NASA Technical Reports Server (NTRS)

    Castano, Rebecca; Mazzoni, Dominic; Tang, Nghia; Greeley, Ron; Doggett, Thomas; Cichy, Ben; Chien, Steve; Davies, Ashley

    2006-01-01

    Typically, data collected by a spacecraft is downlinked to Earth and pre-processed before any analysis is performed. We have developed classifiers that can be used onboard a spacecraft to identify high priority data for downlink to Earth, providing a method for maximizing the use of a potentially bandwidth limited downlink channel. Onboard analysis can also enable rapid reaction to dynamic events, such as flooding, volcanic eruptions or sea ice break-up. Four classifiers were developed to identify cryosphere events using hyperspectral images. These classifiers include a manually constructed classifier, a Support Vector Machine (SVM), a Decision Tree and a classifier derived by searching over combinations of thresholded band ratios. Each of the classifiers was designed to run in the computationally constrained operating environment of the spacecraft. A set of scenes was hand-labeled to provide training and testing data. Performance results on the test data indicate that the SVM and manual classifiers outperformed the Decision Tree and band-ratio classifiers with the SVM yielding slightly better classifications than the manual classifier.

  4. Use of Machine Learning Classifiers and Sensor Data to Detect Neurological Deficit in Stroke Patients.

    PubMed

    Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk

    2017-04-18

    The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approach were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improves the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.

  5. Multimodal data and machine learning for surgery outcome prediction in complicated cases of mesial temporal lobe epilepsy.

    PubMed

    Memarian, Negar; Kim, Sally; Dewar, Sandra; Engel, Jerome; Staba, Richard J

    2015-09-01

    This study sought to predict postsurgical seizure freedom from pre-operative diagnostic test results and clinical information using a rapid automated approach, based on supervised learning methods in patients with drug-resistant focal seizures suspected to begin in temporal lobe. We applied machine learning, specifically a combination of mutual information-based feature selection and supervised learning classifiers on multimodal data, to predict surgery outcome retrospectively in 20 presurgical patients (13 female; mean age±SD, in years 33±9.7 for females, and 35.3±9.4 for males) who were diagnosed with mesial temporal lobe epilepsy (MTLE) and subsequently underwent standard anteromesial temporal lobectomy. The main advantage of the present work over previous studies is the inclusion of the extent of ipsilateral neocortical gray matter atrophy and spatiotemporal properties of depth electrode-recorded seizures as training features for individual patient surgery planning. A maximum relevance minimum redundancy (mRMR) feature selector identified the following features as the most informative predictors of postsurgical seizure freedom in this study's sample of patients: family history of epilepsy, ictal EEG onset pattern (positive correlation with seizure freedom), MRI-based gray matter thickness reduction in the hemisphere ipsilateral to seizure onset, proportion of seizures that first appeared in ipsilateral amygdala to total seizures, age, epilepsy duration, delay in the spread of ipsilateral ictal discharges from site of onset, gender, and number of electrode contacts at seizure onset (negative correlation with seizure freedom). Using these features in combination with a least square support vector machine (LS-SVM) classifier compared to other commonly used classifiers resulted in very high surgical outcome prediction accuracy (95%). Supervised machine learning using multimodal compared to unimodal data accurately predicted postsurgical outcome in patients with atypical MTLE. Published by Elsevier Ltd.

  6. SVM-based feature extraction and classification of aflatoxin contaminated corn using fluorescence hyperspectral data

    USDA-ARS?s Scientific Manuscript database

    Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...

  7. Wire connector classification with machine vision and a novel hybrid SVM

    NASA Astrophysics Data System (ADS)

    Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.

    2018-04-01

    A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 different known styles of connectors plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one layer was semi-supervised and the other layer was supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs). Two methods were tested. The difference between the methods was that the order of the layers was switched. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs. The method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see if the hybrid approach works with other applications that have an unknown class requirement.

  8. Study of support vector machine and serum surface-enhanced Raman spectroscopy for noninvasive esophageal cancer detection

    NASA Astrophysics Data System (ADS)

    Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao

    2013-02-01

    The ability of combining serum surface-enhanced Raman spectroscopy (SERS) with support vector machine (SVM) for improving classification esophageal cancer patients from normal volunteers is investigated. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one group from patients with pathologically confirmed esophageal cancer (n=30) and the other group from healthy volunteers (n=31). Principal components analysis (PCA), conventional SVM (C-SVM) and conventional SVM combination with PCA (PCA-SVM) methods are implemented to classify the same spectral dataset. Results show that a diagnostic accuracy of 77.0% is acquired for PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained for C-SVM and PCA-SVM methods based on radial basis functions (RBF) models. The results prove that RBF SVM models are superior to PCA algorithm in classification serum SERS spectra. The study demonstrates that serum SERS in combination with SVM technique has great potential to provide an effective and accurate diagnostic schema for noninvasive detection of esophageal cancer.

  9. Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.

    PubMed

    Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel

    2017-01-01

    Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of a cerebral glucose hypometabolism involving occipito-temporal cortex and cerebellum have been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and with varying degrees of severity depending on the cognitive profile of patients. Aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients between healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole-population was divided into two groups; a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM) and a SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including a SVM to classify patients between healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.

  10. Hyperspectral imaging for predicting the allicin and soluble solid content of garlic with variable selection algorithms and chemometric models.

    PubMed

    Rahman, Anisur; Faqeerzada, Mohammad A; Cho, Byoung-Kwan

    2018-03-14

    Allicin and soluble solid content (SSC) in garlic is the responsible for its pungent flavor and odor. However, current conventional methods such as the use of high-pressure liquid chromatography and a refractometer have critical drawbacks in that they are time-consuming, labor-intensive and destructive procedures. The present study aimed to predict allicin and SSC in garlic using hyperspectral imaging in combination with variable selection algorithms and calibration models. Hyperspectral images of 100 garlic cloves were acquired that covered two spectral ranges, from which the mean spectra of each clove were extracted. The calibration models included partial least squares (PLS) and least squares-support vector machine (LS-SVM) regression, as well as different spectral pre-processing techniques, from which the highest performing spectral preprocessing technique and spectral range were selected. Then, variable selection methods, such as regression coefficients, variable importance in projection (VIP) and the successive projections algorithm (SPA), were evaluated for the selection of effective wavelengths (EWs). Furthermore, PLS and LS-SVM regression methods were applied to quantitatively predict the quality attributes of garlic using the selected EWs. Of the established models, the SPA-LS-SVM model obtained an Rpred2 of 0.90 and standard error of prediction (SEP) of 1.01% for SSC prediction, whereas the VIP-LS-SVM model produced the best result with an Rpred2 of 0.83 and SEP of 0.19 mg g -1 for allicin prediction in the range 1000-1700 nm. Furthermore, chemical images of garlic were developed using the best predictive model to facilitate visualization of the spatial distributions of allicin and SSC. The present study clearly demonstrates that hyperspectral imaging combined with an appropriate chemometrics method can potentially be employed as a fast, non-invasive method to predict the allicin and SSC in garlic. © 2018 Society of Chemical Industry. © 2018 Society of Chemical Industry.

  11. Multi-phase classification by a least-squares support vector machine approach in tomography images of geological samples

    NASA Astrophysics Data System (ADS)

    Khan, Faisal; Enzmann, Frieder; Kersten, Michael

    2016-03-01

    Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data or from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM, an algorithm for pixel-based multi-phase classification) approach. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to classify successfully three different more or less complex multi-phase rock core samples.

  12. Face recognition using total margin-based adaptive fuzzy support vector machines.

    PubMed

    Liu, Yi-Hung; Chen, Yen-Ting

    2007-01-01

    This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to the face recognition. The proposed TAF-SVM not only solves the overfitting problem resulted from the outlier with the approach of fuzzification of the penalty, but also corrects the skew of the optimal separating hyperplane due to the very imbalanced data sets by using different cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. Those three functions are embodied into the traditional SVM so that the TAF-SVM is proposed and reformulated in both linear and nonlinear cases. By using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of the face-recognition accuracy. The results also indicate that the proposed TAF-SVM can achieve smaller error variances than SVM over a number of tests such that better recognition stability can be obtained.

  13. Breast Cancer Recognition Using a Novel Hybrid Intelligent Method

    PubMed Central

    Addeh, Jalil; Ebrahimzadeh, Ata

    2012-01-01

    Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. This paper presents a novel hybrid intelligent method for recognition of breast cancer tumors. The proposed method includes three main modules: the feature extraction module, the classifier module, and the optimization module. In the feature extraction module, fuzzy features are proposed as the efficient characteristic of the patterns. In the classifier module, because of the promising generalization capability of support vector machines (SVM), a SVM-based classifier is proposed. In support vector machine training, the hyperparameters have very important roles for its recognition accuracy. Therefore, in the optimization module, the bees algorithm (BA) is proposed for selecting appropriate parameters of the classifier. The proposed system is tested on Wisconsin Breast Cancer database and simulation results show that the recommended system has a high accuracy. PMID:23626945

  14. An expert support system for breast cancer diagnosis using color wavelet features.

    PubMed

    Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J

    2012-10-01

    Breast cancer diagnosis can be done through the pathologic assessments of breast tissue samples such as core needle biopsy technique. The result of analysis on this sample by pathologist is crucial for breast cancer patient. In this paper, nucleus of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet on HSV color domain and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis using Support Vector Machine (SVM) classifier algorithm. The ability of properly trained SVM is to correctly classify patterns and make them particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as Naïves Bayes classifier and Artificial Neural Network. The overall accuracy of the proposed method using SVM classifier will be further useful for automation in cancer diagnosis.

  15. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913

  16. Machine learning classifiers for glaucoma diagnosis based on classification of retinal nerve fibre layer thickness parameters measured by Stratus OCT.

    PubMed

    Bizios, Dimitrios; Heijl, Anders; Hougaard, Jesper Leth; Bengtsson, Boel

    2010-02-01

    To compare the performance of two machine learning classifiers (MLCs), artificial neural networks (ANNs) and support vector machines (SVMs), with input based on retinal nerve fibre layer thickness (RNFLT) measurements by optical coherence tomography (OCT), on the diagnosis of glaucoma, and to assess the effects of different input parameters. We analysed Stratus OCT data from 90 healthy persons and 62 glaucoma patients. Performance of MLCs was compared using conventional OCT RNFLT parameters plus novel parameters such as minimum RNFLT values, 10th and 90th percentiles of measured RNFLT, and transformations of A-scan measurements. For each input parameter and MLC, the area under the receiver operating characteristic curve (AROC) was calculated. There were no statistically significant differences between ANNs and SVMs. The best AROCs for both ANN (0.982, 95%CI: 0.966-0.999) and SVM (0.989, 95% CI: 0.979-1.0) were based on input of transformed A-scan measurements. Our SVM trained on this input performed better than ANNs or SVMs trained on any of the single RNFLT parameters (p < or = 0.038). The performance of ANNs and SVMs trained on minimum thickness values and the 10th and 90th percentiles were at least as good as ANNs and SVMs with input based on the conventional RNFLT parameters. No differences between ANN and SVM were observed in this study. Both MLCs performed very well, with similar diagnostic performance. Input parameters have a larger impact on diagnostic performance than the type of machine classifier. Our results suggest that parameters based on transformed A-scan thickness measurements of the RNFL processed by machine classifiers can improve OCT-based glaucoma diagnosis.

  17. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

    PubMed

    Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.

  18. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483

  19. Automated detection of heart ailments from 12-lead ECG using complex wavelet sub-band bi-spectrum features.

    PubMed

    Tripathy, Rajesh Kumar; Dandapat, Samarendra

    2017-04-01

    The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques.

  20. Prediction of Flood Warning in Taiwan Using Nonlinear SVM with Simulated Annealing Algorithm

    NASA Astrophysics Data System (ADS)

    Lee, C.

    2013-12-01

    The issue of the floods is important in Taiwan. It is because the narrow and high topography of the island make lots of rivers steep in Taiwan. The tropical depression likes typhoon always causes rivers to flood. Prediction of river flow under the extreme rainfall circumstances is important for government to announce the warning of flood. Every time typhoon passed through Taiwan, there were always floods along some rivers. The warning is classified to three levels according to the warning water levels in Taiwan. The propose of this study is to predict the level of floods warning from the information of precipitation, rainfall duration and slope of riverbed. To classify the level of floods warning by the above-mentioned information and modeling the problems, a machine learning model, nonlinear Support vector machine (SVM), is formulated to classify the level of floods warning. In addition, simulated annealing (SA), a probabilistic heuristic algorithm, is used to determine the optimal parameter of the SVM model. A case study of flooding-trend rivers of different gradients in Taiwan is conducted. The contribution of this SVM model with simulated annealing is capable of making efficient announcement for flood warning and keeping the danger of flood from residents along the rivers.

  1. Use of Vis/NIRS for the determination of sugar content of cola soft drinks based on chemometric methods

    NASA Astrophysics Data System (ADS)

    Liu, Fei; He, Yong

    2008-03-01

    Three different chemometric methods were performed for the determination of sugar content of cola soft drinks using visible and near infrared spectroscopy (Vis/NIRS). Four varieties of colas were prepared and 180 samples (45 samples for each variety) were selected for the calibration set, while 60 samples (15 samples for each variety) for the validation set. The smoothing way of Savitzky-Golay, standard normal variate (SNV) and Savitzky-Golay first derivative transformation were applied for the pre-processing of spectral data. The first eleven principal components (PCs) extracted by partial least squares (PLS) analysis were employed as the inputs of BP neural network (BPNN) and least squares-support vector machine (LS-SVM) model. Then the BPNN model with the optimal structural parameters and LS-SVM model with radial basis function (RBF) kernel were applied to build the regression model with a comparison of PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias for prediction were 0.971, 1.259 and -0.335 for PLS, 0.986, 0.763, and -0.042 for BPNN, while 0.978, 0.995 and -0.227 for LS-SVM, respectively. All the three methods supplied a high and satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be utilized as a high precision way for the determination of sugar content of cola soft drinks.

  2. Feasibility in multispectral imaging for predicting the content of bioactive compounds in intact tomato fruit.

    PubMed

    Liu, Changhong; Liu, Wei; Chen, Wei; Yang, Jianbo; Zheng, Lei

    2015-04-15

    Tomato is an important health-stimulating fruit because of the antioxidant properties of its main bioactive compounds, dominantly lycopene and phenolic compounds. Nowadays, product differentiation in the fruit market requires an accurate evaluation of these value-added compounds. An experiment was conducted to simultaneously and non-destructively measure lycopene and phenolic compounds content in intact tomatoes using multispectral imaging combined with chemometric methods. Partial least squares (PLS), least squares-support vector machines (LS-SVM) and back propagation neural network (BPNN) were applied to develop quantitative models. Compared with PLS and LS-SVM, BPNN model considerably improved the performance with coefficient of determination in prediction (RP(2))=0.938 and 0.965, residual predictive deviation (RPD)=4.590 and 9.335 for lycopene and total phenolics content prediction, respectively. It is concluded that multispectral imaging is an attractive alternative to the standard methods for determination of bioactive compounds content in intact tomatoes, providing a useful platform for infield fruit sorting/grading. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Multi-sensor information fusion method for vibration fault diagnosis of rolling bearing

    NASA Astrophysics Data System (ADS)

    Jiao, Jing; Yue, Jianhai; Pei, Di

    2017-10-01

    Bearing is a key element in high-speed electric multiple unit (EMU) and any defect of it can cause huge malfunctioning of EMU under high operation speed. This paper presents a new method for bearing fault diagnosis based on least square support vector machine (LS-SVM) in feature-level fusion and Dempster-Shafer (D-S) evidence theory in decision-level fusion which were used to solve the problems about low detection accuracy, difficulty in extracting sensitive characteristics and unstable diagnosis system of single-sensor in rolling bearing fault diagnosis. Wavelet de-nosing technique was used for removing the signal noises. LS-SVM was used to make pattern recognition of the bearing vibration signal, and then fusion process was made according to the D-S evidence theory, so as to realize recognition of bearing fault. The results indicated that the data fusion method improved the performance of the intelligent approach in rolling bearing fault detection significantly. Moreover, the results showed that this method can efficiently improve the accuracy of fault diagnosis.

  4. Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung

    Clinical decision support system (CDSS) provides knowledge and specific information for clinicians to enhance diagnostic efficiency and improving healthcare quality. An appropriate CDSS can highly elevate patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, it is critical to determine suitable combination of SVM parameters regarding classification performance. Genetic algorithm (GA) can find optimal solution within an acceptable time, and is faster than greedy algorithm with exhaustive searching strategy. By taking the advantage of GA in quickly selecting the salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which is different from the traditional method with GA used for feature selection and SVM for classification, was used to design CDSSs for prediction of successful ventilation weaning, diagnosis of patients with severe obstructive sleep apnea, and discrimination of different cell types form Pap smear. The results show that IGS is better than methods using SVM alone or linear discriminator.

  5. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  6. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  7. Classifying social anxiety disorder using multivoxel pattern analyses of brain function and structure☆

    PubMed Central

    Frick, Andreas; Gingnell, Malin; Marquand, Andre F.; Howner, Katarina; Fischer, Håkan; Kristiansson, Marianne; Williams, Steven C.R.; Fredrikson, Mats; Furmark, Tomas

    2014-01-01

    Functional neuroimaging of social anxiety disorder (SAD) support altered neural activation to threat-provoking stimuli focally in the fear network, while structural differences are distributed over the temporal and frontal cortices as well as limbic structures. Previous neuroimaging studies have investigated the brain at the voxel level using mass-univariate methods which do not enable detection of more complex patterns of activity and structural alterations that may separate SAD from healthy individuals. Support vector machine (SVM) is a supervised machine learning method that capitalizes on brain activation and structural patterns to classify individuals. The aim of this study was to investigate if it is possible to discriminate SAD patients (n = 14) from healthy controls (n = 12) using SVM based on (1) functional magnetic resonance imaging during fearful face processing and (2) regional gray matter volume. Whole brain and region of interest (fear network) SVM analyses were performed for both modalities. For functional scans, significant classifications were obtained both at whole brain level and when restricting the analysis to the fear network while gray matter SVM analyses correctly classified participants only when using the whole brain search volume. These results support that SAD is characterized by aberrant neural activation to affective stimuli in the fear network, while disorder-related alterations in regional gray matter volume are more diffusely distributed over the whole brain. SVM may thus be useful for identifying imaging biomarkers of SAD. PMID:24239689

  8. An SVM-Based Classifier for Estimating the State of Various Rotating Components in Agro-Industrial Machinery with a Vibration Signal Acquired from a Single Point on the Machine Chassis

    PubMed Central

    Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor

    2014-01-01

    The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels. PMID:25372618

  9. An SVM-based classifier for estimating the state of various rotating components in agro-industrial machinery with a vibration signal acquired from a single point on the machine chassis.

    PubMed

    Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor

    2014-11-03

    The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels.

  10. A Novel Bearing Multi-Fault Diagnosis Approach Based on Weighted Permutation Entropy and an Improved SVM Ensemble Classifier.

    PubMed

    Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang

    2018-06-14

    Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.

  11. An ensemble of SVM classifiers based on gene pairs.

    PubMed

    Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin

    2013-07-01

    In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Comparison of SVM RBF-NN and DT for crop and weed identification based on spectral measurement over corn fields

    USDA-ARS?s Scientific Manuscript database

    It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...

  13. lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

    PubMed

    Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

    2015-01-01

    Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.

  14. Differentiation of several interstitial lung disease patterns in HRCT images using support vector machine: role of databases on performance

    NASA Astrophysics Data System (ADS)

    Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan

    2016-03-01

    Interstitial lung disease (ILD) is complicated group of pulmonary disorders. High Resolution Computed Tomography (HRCT) considered to be best imaging technique for analysis of different pulmonary disorders. HRCT findings can be categorised in several patterns viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal etc. based on their texture like appearance. Clinician often find it difficult to diagnosis these pattern because of their complex nature. In such scenario computer-aided diagnosis system could help clinician to identify patterns. Several approaches had been proposed for classification of ILD patterns. This includes computation of textural feature and training /testing of classifier such as artificial neural network (ANN), support vector machine (SVM) etc. In this paper, wavelet features are calculated from two different ILD database, publically available MedGIFT ILD database and private ILD database, followed by performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that average classification accuracy by SVM is greater than ANN where trained and tested on same database. Investigation continued further to test variation in accuracy of classifier when training and testing is performed with alternate database and training and testing of classifier with database formed by merging samples from same class from two individual databases. The average classification accuracy drops when two independent databases used for training and testing respectively. There is significant improvement in average accuracy when classifiers are trained and tested with merged database. It infers dependency of classification accuracy on training data. It is observed that SVM outperforms ANN when same database is used for training and testing.

  15. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    NASA Astrophysics Data System (ADS)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  16. Use of a machine learning algorithm to classify expertise: analysis of hand motion patterns during a simulated surgical task.

    PubMed

    Watson, Robert A

    2014-08-01

    To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.

  17. QSAR study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using LS-SVM and GRNN based on principal components.

    PubMed

    Shahlaei, Mohsen; Sabet, Razieh; Ziari, Maryam Bahman; Moeinifard, Behzad; Fassihi, Afshin; Karbakhsh, Reza

    2010-10-01

    Quantitative relationships between molecular structure and methionine aminopeptidase-2 inhibitory activity of a series of cytotoxic anthranilic acid sulfonamide derivatives were discovered. We have demonstrated the detailed application of two efficient nonlinear methods for evaluation of quantitative structure-activity relationships of the studied compounds. Components produced by principal component analysis as input of developed nonlinear models were used. The performance of the developed models namely PC-GRNN and PC-LS-SVM were tested by several validation methods. The resulted PC-LS-SVM model had a high statistical quality (R(2)=0.91 and R(CV)(2)=0.81) for predicting the cytotoxic activity of the compounds. Comparison between predictability of PC-GRNN and PC-LS-SVM indicates that later method has higher ability to predict the activity of the studied molecules. Copyright (c) 2010 Elsevier Masson SAS. All rights reserved.

  18. Applying six classifiers to airborne hyperspectral imagery for detecting giant reed

    USDA-ARS?s Scientific Manuscript database

    This study evaluated and compared six different image classifiers, including minimum distance (MD), Mahalanobis distance (MAHD), maximum likelihood (ML), spectral angle mapper (SAM), mixture tuned matched filtering (MTMF) and support vector machine (SVM), for detecting and mapping giant reed (Arundo...

  19. Data mining for the analysis of hippocampal zones in Alzheimer's disease

    NASA Astrophysics Data System (ADS)

    Ovando Vázquez, Cesaré M.

    2012-02-01

    In this work, a methodology to classify people with Alzheimer's Disease (AD), Healthy Controls (HC) and people with Mild Cognitive Impairment (MCI) is presented. This methodology consists of an ensemble of Support Vector Machines (SVM) with the hippocampal boxes (HB) as input data, these hippocampal zones are taken from Magnetic Resonance (MRI) and Positron Emission Tomography (PET) images. Two ways of constructing this ensemble are presented, the first consists of linear SVM models and the second of non-linear SVM models. Results demonstrate that the linear models classify HBs more accurately than the non-linear models between HC and MCI and that there are no differences between HC and AD.

  20. Support vector machine classification and characterization of age-related reorganization of functional brain networks

    PubMed Central

    Meier, Timothy B.; Desphande, Alok S.; Vergun, Svyatoslav; Nair, Veena A.; Song, Jie; Biswal, Bharat B.; Meyerand, Mary E.; Birn, Rasmus M.; Prabhakaran, Vivek

    2012-01-01

    Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5 mm3 radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual’s three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. PMID:22227886

  1. Support vector machine classification and characterization of age-related reorganization of functional brain networks.

    PubMed

    Meier, Timothy B; Desphande, Alok S; Vergun, Svyatoslav; Nair, Veena A; Song, Jie; Biswal, Bharat B; Meyerand, Mary E; Birn, Rasmus M; Prabhakaran, Vivek

    2012-03-01

    Most of what is known about the reorganization of functional brain networks that accompanies normal aging is based on neuroimaging studies in which participants perform specific tasks. In these studies, reorganization is defined by the differences in task activation between young and old adults. However, task activation differences could be the result of differences in task performance, strategy, or motivation, and not necessarily reflect reorganization. Resting-state fMRI provides a method of investigating functional brain networks without such confounds. Here, a support vector machine (SVM) classifier was used in an attempt to differentiate older adults from younger adults based on their resting-state functional connectivity. In addition, the information used by the SVM was investigated to see what functional connections best differentiated younger adult brains from older adult brains. Three separate resting-state scans from 26 younger adults (18-35 yrs) and 26 older adults (55-85) were obtained from the International Consortium for Brain Mapping (ICBM) dataset made publically available in the 1000 Functional Connectomes project www.nitrc.org/projects/fcon_1000. 100 seed-regions from four functional networks with 5mm(3) radius were defined based on a recent study using machine learning classifiers on adolescent brains. Time-series for every seed-region were averaged and three matrices of z-transformed correlation coefficients were created for each subject corresponding to each individual's three resting-state scans. SVM was then applied using leave-one-out cross-validation. The SVM classifier was 84% accurate in classifying older and younger adult brains. The majority of the connections used by the classifier to distinguish subjects by age came from seed-regions belonging to the sensorimotor and cingulo-opercular networks. These results suggest that age-related decreases in positive correlations within the cingulo-opercular and default networks, and decreases in negative correlations between the default and sensorimotor networks, are the distinguishing characteristics of age-related reorganization. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. A Support Vector Machine-Based Gender Identification Using Speech Signal

    NASA Astrophysics Data System (ADS)

    Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk

    We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.

  3. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms

  4. Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment

    NASA Astrophysics Data System (ADS)

    Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty

    2017-12-01

    Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.

  5. Deep neural mapping support vector machines.

    PubMed

    Li, Yujian; Zhang, Ting

    2017-09-01

    The choice of kernel has an important effect on the performance of a support vector machine (SVM). The effect could be reduced by NEUROSVM, an architecture using multilayer perceptron for feature extraction and SVM for classification. In binary classification, a general linear kernel NEUROSVM can be theoretically simplified as an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained together with a virtual ordinary output layer by backpropagation, then with the output of its last hidden layer taken as input of the SVM classifier for further training separately. By taking the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, where the kernel mapping is indeed an explicit function expressed as a sub-network, different from an implicit function induced by a kernel function traditionally. Moreover, we exploit a two-stage procedure of contrastive divergence learning and gradient descent for DNMSVM to jointly training an adaptive kernel mapping instead of a kernel function, without requirement of kernel tricks. As a whole of the sub-network and the SVM classifier, the joint training of DNMSVM is done by using gradient descent to optimize the objective function with the sub-network layer-wise pre-trained via contrastive divergence learning of restricted Boltzmann machines. Compared to the separate training of NEUROSVM, the joint training is a new algorithm for DNMSVM to have advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the kernel of radial basis function), demonstrating its effectiveness. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Heidelberg Retina Tomograph 3 machine learning classifiers for glaucoma detection

    PubMed Central

    Townsend, K A; Wollstein, G; Danks, D; Sung, K R; Ishikawa, H; Kagemann, L; Gabriele, M L; Schuman, J S

    2010-01-01

    Aims To assess performance of classifiers trained on Heidelberg Retina Tomograph 3 (HRT3) parameters for discriminating between healthy and glaucomatous eyes. Methods Classifiers were trained using HRT3 parameters from 60 healthy subjects and 140 glaucomatous subjects. The classifiers were trained on all 95 variables and smaller sets created with backward elimination. Seven types of classifiers, including Support Vector Machines with radial basis (SVM-radial), and Recursive Partitioning and Regression Trees (RPART), were trained on the parameters. The area under the ROC curve (AUC) was calculated for classifiers, individual parameters and HRT3 glaucoma probability scores (GPS). Classifier AUCs and leave-one-out accuracy were compared with the highest individual parameter and GPS AUCs and accuracies. Results The highest AUC and accuracy for an individual parameter were 0.848 and 0.79, for vertical cup/disc ratio (vC/D). For GPS, global GPS performed best with AUC 0.829 and accuracy 0.78. SVM-radial with all parameters showed significant improvement over global GPS and vC/ D with AUC 0.916 and accuracy 0.85. RPART with all parameters provided significant improvement over global GPS with AUC 0.899 and significant improvement over global GPS and vC/D with accuracy 0.875. Conclusions Machine learning classifiers of HRT3 data provide significant enhancement over current methods for detection of glaucoma. PMID:18523087

  7. [Study on application of SVM in prediction of coronary heart disease].

    PubMed

    Zhu, Yue; Wu, Jianghua; Fang, Ying

    2013-12-01

    Base on the data of blood pressure, plasma lipid, Glu and UA by physical test, Support Vector Machine (SVM) was applied to identify coronary heart disease (CHD) in patients and non-CHD individuals in south China population for guide of further prevention and treatment of the disease. Firstly, the SVM classifier was built using radial basis kernel function, liner kernel function and polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict the CHD. By comparison with those from artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression method and non-optimized SVM, the overall results of our calculation demonstrated that the classification performance of optimized RBF-SVM model could be superior to other classifier algorithm with higher accuracy rate, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. So, it is well concluded that SVM could be used as a valid method for assisting diagnosis of CHD.

  8. [Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].

    PubMed

    Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong

    2013-03-01

    Model selection for support vector machine (SVM) involving kernel and the margin parameter values selection is usually time-consuming, impacts training efficiency of SVM model and final classification accuracies of SVM hyperspectral remote sensing image classifier greatly. Firstly, based on combinatorial optimization theory and cross-validation method, artificial immune clonal selection algorithm is introduced to the optimal selection of SVM (CSSVM) kernel parameter a and margin parameter C to improve the training efficiency of SVM model. Then an experiment of classifying AVIRIS in India Pine site of USA was performed for testing the novel CSSVM, as well as a traditional SVM classifier with general Grid Searching cross-validation method (GSSVM) for comparison. And then, evaluation indexes including SVM model training time, classification overall accuracy (OA) and Kappa index of both CSSVM and GSSVM were all analyzed quantitatively. It is demonstrated that OA of CSSVM on test samples and whole image are 85.1% and 81.58, the differences from that of GSSVM are both within 0.08% respectively; And Kappa indexes reach 0.8213 and 0.7728, the differences from that of GSSVM are both within 0.001; While the ratio of model training time of CSSVM and GSSVM is between 1/6 and 1/10. Therefore, CSSVM is fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.

  9. Hidden Markov Model and Support Vector Machine based decoding of finger movements using Electrocorticography

    PubMed Central

    Wissel, Tobias; Pfeiffer, Tim; Frysch, Robert; Knight, Robert T.; Chang, Edward F.; Hinrichs, Hermann; Rieger, Jochem W.; Rose, Georg

    2013-01-01

    Objective Support Vector Machines (SVM) have developed into a gold standard for accurate classification in Brain-Computer-Interfaces (BCI). The choice of the most appropriate classifier for a particular application depends on several characteristics in addition to decoding accuracy. Here we investigate the implementation of Hidden Markov Models (HMM)for online BCIs and discuss strategies to improve their performance. Approach We compare the SVM, serving as a reference, and HMMs for classifying discrete finger movements obtained from the Electrocorticograms of four subjects doing a finger tapping experiment. The classifier decisions are based on a subset of low-frequency time domain and high gamma oscillation features. Main results We show that decoding optimization between the two approaches is due to the way features are extracted and selected and less dependent on the classifier. An additional gain in HMM performance of up to 6% was obtained by introducing model constraints. Comparable accuracies of up to 90% were achieved with both SVM and HMM with the high gamma cortical response providing the most important decoding information for both techniques. Significance We discuss technical HMM characteristics and adaptations in the context of the presented data as well as for general BCI applications. Our findings suggest that HMMs and their characteristics are promising for efficient online brain-computer interfaces. PMID:24045504

  10. The employment of Support Vector Machine to classify high and low performance archers based on bio-physiological variables

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair

    2018-04-01

    The present study employs a machine learning algorithm namely support vector machine (SVM) to classify high and low potential archers from a collection of bio-physiological variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. The bio-physiological variables namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calories intake, were measured prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means clustered the archers into high (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy with a classification accuracy of 94% in comparison the other tested models. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected bio-physiological variables examined.

  11. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China

    PubMed Central

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-01-01

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides. PMID:27187430

  12. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China.

    PubMed

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-05-11

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.

  13. Identifying saltcedar with hyperspectral data and support vector machines

    USDA-ARS?s Scientific Manuscript database

    Saltcedar (Tamarix spp.) are a group of dense phreatophytic shrubs and trees that are invasive to riparian areas throughout the United States. This study determined the feasibility of using hyperspectral data and a support vector machine (SVM) classifier to discriminate saltcedar from other cover t...

  14. Automated detection of heart ailments from 12-lead ECG using complex wavelet sub-band bi-spectrum features

    PubMed Central

    Dandapat, Samarendra

    2017-01-01

    The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques. PMID:28894589

  15. Solution Path for Pin-SVM Classifiers With Positive and Negative $\\tau $ Values.

    PubMed

    Huang, Xiaolin; Shi, Lei; Suykens, Johan A K

    2017-07-01

    Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ . Its value is related to the quantile level and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ . We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ . In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.

  16. Classification of Stellar Spectra with Fuzzy Minimum Within-Class Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Zhong-bao, Liu; Wen-ai, Song; Jing, Zhang; Wen-juan, Zhao

    2017-06-01

    Classification is one of the important tasks in astronomy, especially in spectra analysis. Support Vector Machine (SVM) is a typical classification method, which is widely used in spectra classification. Although it performs well in practice, its classification accuracies can not be greatly improved because of two limitations. One is it does not take the distribution of the classes into consideration. The other is it is sensitive to noise. In order to solve the above problems, inspired by the maximization of the Fisher's Discriminant Analysis (FDA) and the SVM separability constraints, fuzzy minimum within-class support vector machine (FMWSVM) is proposed in this paper. In FMWSVM, the distribution of the classes is reflected by the within-class scatter in FDA and the fuzzy membership function is introduced to decrease the influence of the noise. The comparative experiments with SVM on the SDSS datasets verify the effectiveness of the proposed classifier FMWSVM.

  17. Beyond where to how: a machine learning approach for sensing mobility contexts using smartphone sensors.

    PubMed

    Guinness, Robert E

    2015-04-28

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.

  18. Beyond Where to How: A Machine Learning Approach for Sensing Mobility Contexts Using Smartphone Sensors †

    PubMed Central

    Guinness, Robert E.

    2015-01-01

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWLand IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060

  19. Applying machine-learning techniques to Twitter data for automatic hazard-event classification.

    NASA Astrophysics Data System (ADS)

    Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.

    2017-12-01

    The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real-time geological hazards/phenomenon, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related with hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining between "aurora-event" or "no-aurora-event" categories, we compared two state-of-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by Scikit-Learn library using our training dataset to build the SVM classifier. The results shown that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework.Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer. The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used TensorFlow framework for applying CNN classifier to the same collection of tweets.In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real-time.

  20. LBP and SIFT based facial expression recognition

    NASA Astrophysics Data System (ADS)

    Sumer, Omer; Gunes, Ece O.

    2015-02-01

    This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem and seven classes; happiness, anger, sadness, disgust, surprise, fear and comtempt are classified. Using SIFT feature vectors and linear SVM, 93.1% mean accuracy is acquired on CK+ database. On the other hand, the performance of LBP-based classifier with linear SVM is reported on SFEW using strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.

  1. Comparative Study of SVM Methods Combined with Voxel Selection for Object Category Classification on fMRI Data

    PubMed Central

    Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li

    2011-01-01

    Background Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Methodology/Principal Findings Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. Conclusions/Significance The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice. PMID:21359184

  2. Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data.

    PubMed

    Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li

    2011-02-16

    Support vector machine (SVM) has been widely used as accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear one. Here, a more effective non-linear SVM using radial basis function (RBF) kernel is compared with linear SVM. Different from traditional studies which focused either merely on the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes on classification accuracy and time-consuming. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relative low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relative high dimensional space, linear SVM performed better than its counterpart; (2) Considering the classification accuracy and time-consuming holistically, linear SVM with relative more voxels as features and RBF SVM with small set of voxels (after PCA) could achieve the better accuracy and cost shorter time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy was concerned, RBF SVM with appropriate small voxels and linear SVM with relative more voxels were two suggested solutions; if users concerned more about the computational time, RBF SVM with relative small set of voxels when part of the principal components were kept as features was a better choice.

  3. Classifying Lower Extremity Muscle Fatigue during Walking using Machine Learning and Inertial Sensors

    PubMed Central

    Zhang, Jian; Lockhart, Thurmon E.; Soangra, Rahul

    2013-01-01

    Fatigue in lower extremity musculature is associated with decline in postural stability, motor performance and alters normal walking patterns in human subjects. Automated recognition of lower extremity muscle fatigue condition may be advantageous in early detection of fall and injury risks. Supervised machine learning methods such as Support Vector Machines (SVM) have been previously used for classifying healthy and pathological gait patterns and also for separating old and young gait patterns. In this study we explore the classification potential of SVM in recognition of gait patterns utilizing an inertial measurement unit associated with lower extremity muscular fatigue. Both kinematic and kinetic gait patterns of 17 participants (29±11 years) were recorded and analyzed in normal and fatigued state of walking. Lower extremities were fatigued by performance of a squatting exercise until the participants reached 60% of their baseline maximal voluntary exertion level. Feature selection methods were used to classify fatigue and no-fatigue conditions based on temporal and frequency information of the signals. Additionally, influences of three different kernel schemes (i.e., linear, polynomial, and radial basis function) were investigated for SVM classification. The results indicated that lower extremity muscle fatigue condition influenced gait and loading responses. In terms of the SVM classification results, an accuracy of 96% was reached in distinguishing the two gait patterns (fatigue and no-fatigue) within the same subject using the kinematic, time and frequency domain features. It is also found that linear kernel and RBF kernel were equally good to identify intra-individual fatigue characteristics. These results suggest that intra-subject fatigue classification using gait patterns from an inertial sensor holds considerable potential in identifying “at-risk” gait due to muscle fatigue. PMID:24081829

  4. Combining deep residual neural network features with supervised machine learning algorithms to classify diverse food image datasets.

    PubMed

    McAllister, Patrick; Zheng, Huiru; Bond, Raymond; Moorhead, Anne

    2018-04-01

    Obesity is increasing worldwide and can cause many chronic conditions such as type-2 diabetes, heart disease, sleep apnea, and some cancers. Monitoring dietary intake through food logging is a key method to maintain a healthy lifestyle to prevent and manage obesity. Computer vision methods have been applied to food logging to automate image classification for monitoring dietary intake. In this work we applied pretrained ResNet-152 and GoogleNet convolutional neural networks (CNNs), initially trained using ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset with MatConvNet package, to extract features from food image datasets; Food 5K, Food-11, RawFooT-DB, and Food-101. Deep features were extracted from CNNs and used to train machine learning classifiers including artificial neural network (ANN), support vector machine (SVM), Random Forest, and Naive Bayes. Results show that using ResNet-152 deep features with SVM with RBF kernel can accurately detect food items with 99.4% accuracy using Food-5K validation food image dataset and 98.8% with Food-5K evaluation dataset using ANN, SVM-RBF, and Random Forest classifiers. Trained with ResNet-152 features, ANN can achieve 91.34%, 99.28% when applied to Food-11 and RawFooT-DB food image datasets respectively and SVM with RBF kernel can achieve 64.98% with Food-101 image dataset. From this research it is clear that using deep CNN features can be used efficiently for diverse food item image classification. The work presented in this research shows that pretrained ResNet-152 features provide sufficient generalisation power when applied to a range of food image classification tasks. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. Quantifying surgical complexity with machine learning: looking beyond patient factors to improve surgical models.

    PubMed

    Van Esbroeck, Alexander; Rubinfeld, Ilan; Hall, Bruce; Syed, Zeeshan

    2014-11-01

    To investigate the use of machine learning to empirically determine the risk of individual surgical procedures and to improve surgical models with this information. American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) data from 2005 to 2009 were used to train support vector machine (SVM) classifiers to learn the relationship between textual constructs in current procedural terminology (CPT) descriptions and mortality, morbidity, Clavien 4 complications, and surgical-site infections (SSI) within 30 days of surgery. The procedural risk scores produced by the SVM classifiers were validated on data from 2010 in univariate and multivariate analyses. The procedural risk scores produced by the SVM classifiers achieved moderate-to-high levels of discrimination in univariate analyses (area under receiver operating characteristic curve: 0.871 for mortality, 0.789 for morbidity, 0.791 for SSI, 0.845 for Clavien 4 complications). Addition of these scores also substantially improved multivariate models comprising patient factors and previously proposed correlates of procedural risk (net reclassification improvement and integrated discrimination improvement: 0.54 and 0.001 for mortality, 0.46 and 0.011 for morbidity, 0.68 and 0.022 for SSI, 0.44 and 0.001 for Clavien 4 complications; P < .05 for all comparisons). Similar improvements were noted in discrimination and calibration for other statistical measures, and in subcohorts comprising patients with general or vascular surgery. Machine learning provides clinically useful estimates of surgical risk for individual procedures. This information can be measured in an entirely data-driven manner and substantially improves multifactorial models to predict postoperative complications. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. DCS-SVM: a novel semi-automated method for human brain MR image segmentation.

    PubMed

    Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi

    2017-11-27

    In this paper, a novel method is proposed which appropriately segments magnetic resonance (MR) brain images into three main tissues. This paper proposes an extension of our previous work in which we suggested a combination of multiple classifiers (CMC)-based methods named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. This idea is used here and a novel method is developed that tries to use more complex and accurate classifiers like support vector machine (SVM) in the ensemble. This work is challenging because the CMC-based methods are time consuming, especially on huge datasets like three-dimensional (3D) brain MR images. Moreover, SVM is a powerful method that is used for modeling datasets with complex feature space, but it also has huge computational cost for big datasets, especially those with strong interclass variability problems and with more than two classes such as 3D brain images; therefore, we cannot use SVM in DCS-DLTLTI. Therefore, we propose a novel approach named "DCS-SVM" to use SVM in DCS-DLTLTI to improve the accuracy of segmentation results. The proposed method is applied on well-known datasets of the Internet Brain Segmentation Repository (IBSR) and promising results are obtained.

  7. Support Vector Machine-Based Prediction of Local Tumor Control After Stereotactic Body Radiation Therapy for Early-Stage Non-Small Cell Lung Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klement, Rainer J., E-mail: rainer_klement@gmx.de; Department of Radiotherapy and Radiation Oncology, Leopoldina Hospital, Schweinfurt; Allgäuer, Michael

    2014-03-01

    Background: Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. Methods and Materials: We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performancemore » by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Results: Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED{sub ISO}) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED{sub ISO} and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED{sub ISO}, age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. Conclusions: These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.« less

  8. Support vector machine-based prediction of local tumor control after stereotactic body radiation therapy for early-stage non-small cell lung cancer.

    PubMed

    Klement, Rainer J; Allgäuer, Michael; Appold, Steffen; Dieckmann, Karin; Ernst, Iris; Ganswindt, Ute; Holy, Richard; Nestle, Ursula; Nevinny-Stickel, Meinhard; Semrau, Sabine; Sterzing, Florian; Wittig, Andrea; Andratschke, Nicolaus; Guckenberger, Matthias

    2014-03-01

    Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.

    PubMed

    Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing

    2015-01-01

    Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of a SVM classifier depends strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset- the Wisconsin Diagnostic Breast Cancer (WDBC), revealed the different accuracy of a SVM classification in terms of the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called as SVM-RFE(C) with correlation filter was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first demonstrated outperforming SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experiment results show that the classifier constructed with SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.

  10. A SVM-based method for sentiment analysis in Persian language

    NASA Astrophysics Data System (ADS)

    Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana

    2013-03-01

    Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often represent their opinions and experiences on the web with written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews make it difficult to give an unbiased evaluation to a product. In this paper, standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian Movie reviews to automatically classify user reviews as positive or negative and performance of these two classifiers is compared with each other in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by interaction between the classification models and the feature options. The SVM classifier achieves as well as or better accuracy than naive Bayes in Persian movie. Unigrams are proved better features than bigrams and trigrams in capturing Persian sentiment orientation.

  11. Rapid discrimination of pork in Halal and non-Halal Chinese ham sausages by Fourier transform infrared (FTIR) spectroscopy and chemometrics.

    PubMed

    Xu, L; Cai, C B; Cui, H F; Ye, Z H; Yu, X P

    2012-12-01

    Rapid discrimination of pork in Halal and non-Halal Chinese ham sausages was developed by Fourier transform infrared (FTIR) spectrometry combined with chemometrics. Transmittance spectra ranging from 400 to 4000 cm⁻¹ of 73 Halal and 78 non-Halal Chinese ham sausages were measured. Sample preparation involved finely grinding of samples and formation of KBr disks (under 10 MPa for 5 min). The influence of data preprocessing methods including smoothing, taking derivatives and standard normal variate (SNV) on partial least squares discriminant analysis (PLSDA) and least squares support vector machine (LS-SVM) was investigated. The results indicate removal of spectral background and baseline plays an important role in discrimination. Taking derivatives, SNV can improve classification accuracy and reduce the complexity of PLSDA. Possibly due to the loss of detailed high-frequency spectral information, smoothing degrades the model performance. For the best models, the sensitivity and specificity was 0.913 and 0.929 for PLSDA with SNV spectra, 0.957 and 0.929 for LS-SVM with second derivative spectra, respectively. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Variety identification of brown sugar using short-wave near infrared spectroscopy and multivariate calibration

    NASA Astrophysics Data System (ADS)

    Yang, Haiqing; Wu, Di; He, Yong

    2007-11-01

    Near-infrared spectroscopy (NIRS) with the characteristics of high speed, non-destructiveness, high precision and reliable detection data, etc. is a pollution-free, rapid, quantitative and qualitative analysis method. A new approach for variety discrimination of brown sugars using short-wave NIR spectroscopy (800-1050nm) was developed in this work. The relationship between the absorbance spectra and brown sugar varieties was established. The spectral data were compressed by the principal component analysis (PCA). The resulting features can be visualized in principal component (PC) space, which can lead to discovery of structures correlative with the different class of spectral samples. It appears to provide a reasonable variety clustering of brown sugars. The 2-D PCs plot obtained using the first two PCs can be used for the pattern recognition. Least-squares support vector machines (LS-SVM) was applied to solve the multivariate calibration problems in a relatively fast way. The work has shown that short-wave NIR spectroscopy technique is available for the brand identification of brown sugar, and LS-SVM has the better identification ability than PLS when the calibration set is small.

  13. A measurement fusion method for nonlinear system identification using a cooperative learning algorithm.

    PubMed

    Xia, Youshen; Kamel, Mohamed S

    2007-06-01

    Identification of a general nonlinear noisy system viewed as an estimation of a predictor function is studied in this article. A measurement fusion method for the predictor function estimate is proposed. In the proposed scheme, observed data are first fused by using an optimal fusion technique, and then the optimal fused data are incorporated in a nonlinear function estimator based on a robust least squares support vector machine (LS-SVM). A cooperative learning algorithm is proposed to implement the proposed measurement fusion method. Compared with related identification methods, the proposed method can minimize both the approximation error and the noise error. The performance analysis shows that the proposed optimal measurement fusion function estimate has a smaller mean square error than the LS-SVM function estimate. Moreover, the proposed cooperative learning algorithm can converge globally to the optimal measurement fusion function estimate. Finally, the proposed measurement fusion method is applied to ARMA signal and spatial temporal signal modeling. Experimental results show that the proposed measurement fusion method can provide a more accurate model.

  14. Machine learning algorithms to classify spinal muscular atrophy subtypes.

    PubMed

    Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B

    2012-07-24

    The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.

  15. Detection of Life Threatening Ventricular Arrhythmia Using Digital Taylor Fourier Transform.

    PubMed

    Tripathy, Rajesh K; Zamora-Mendez, Alejandro; de la O Serna, José A; Paternina, Mario R Arrieta; Arrieta, Juan G; Naik, Ganesh R

    2018-01-01

    Accurate detection and classification of life-threatening ventricular arrhythmia episodes such as ventricular fibrillation (VF) and rapid ventricular tachycardia (VT) from electrocardiogram (ECG) is a challenging problem for patient monitoring and defibrillation therapy. This paper introduces a novel method for detection and classification of life-threatening ventricular arrhythmia episodes. The ECG signal is decomposed into various oscillatory modes using digital Taylor-Fourier transform (DTFT). The magnitude feature and a novel phase feature namely the phase difference (PD) are evaluated from the mode Taylor-Fourier coefficients of ECG signal. The least square support vector machine (LS-SVM) classifier with linear and radial basis function (RBF) kernels is employed for detection and classification of VT vs. VF, non-shock vs. shock and VF vs. non-VF arrhythmia episodes. The accuracy, sensitivity, and specificity values obtained using the proposed method are 89.81, 86.38, and 93.97%, respectively for the classification of Non-VF and VF episodes. Comparison with the performance of the state-of-the-art features demonstrate the advantages of the proposition.

  16. Detection of Life Threatening Ventricular Arrhythmia Using Digital Taylor Fourier Transform

    PubMed Central

    Tripathy, Rajesh K.; Zamora-Mendez, Alejandro; de la O Serna, José A.; Paternina, Mario R. Arrieta; Arrieta, Juan G.; Naik, Ganesh R.

    2018-01-01

    Accurate detection and classification of life-threatening ventricular arrhythmia episodes such as ventricular fibrillation (VF) and rapid ventricular tachycardia (VT) from electrocardiogram (ECG) is a challenging problem for patient monitoring and defibrillation therapy. This paper introduces a novel method for detection and classification of life-threatening ventricular arrhythmia episodes. The ECG signal is decomposed into various oscillatory modes using digital Taylor-Fourier transform (DTFT). The magnitude feature and a novel phase feature namely the phase difference (PD) are evaluated from the mode Taylor-Fourier coefficients of ECG signal. The least square support vector machine (LS-SVM) classifier with linear and radial basis function (RBF) kernels is employed for detection and classification of VT vs. VF, non-shock vs. shock and VF vs. non-VF arrhythmia episodes. The accuracy, sensitivity, and specificity values obtained using the proposed method are 89.81, 86.38, and 93.97%, respectively for the classification of Non-VF and VF episodes. Comparison with the performance of the state-of-the-art features demonstrate the advantages of the proposition.

  17. Support vector machine for day ahead electricity price forecasting

    NASA Astrophysics Data System (ADS)

    Razak, Intan Azmira binti Wan Abdul; Abidin, Izham bin Zainal; Siah, Yap Keem; Rahman, Titik Khawa binti Abdul; Lada, M. Y.; Ramani, Anis Niza binti; Nasir, M. N. M.; Ahmad, Arfah binti

    2015-05-01

    Electricity price forecasting has become an important part of power system operation and planning. In a pool- based electric energy market, producers submit selling bids consisting in energy blocks and their corresponding minimum selling prices to the market operator. Meanwhile, consumers submit buying bids consisting in energy blocks and their corresponding maximum buying prices to the market operator. Hence, both producers and consumers use day ahead price forecasts to derive their respective bidding strategies to the electricity market yet reduce the cost of electricity. However, forecasting electricity prices is a complex task because price series is a non-stationary and highly volatile series. Many factors cause for price spikes such as volatility in load and fuel price as well as power import to and export from outside the market through long term contract. This paper introduces an approach of machine learning algorithm for day ahead electricity price forecasting with Least Square Support Vector Machine (LS-SVM). Previous day data of Hourly Ontario Electricity Price (HOEP), generation's price and demand from Ontario power market are used as the inputs for training data. The simulation is held using LSSVMlab in Matlab with the training and testing data of 2004. SVM that widely used for classification and regression has great generalization ability with structured risk minimization principle rather than empirical risk minimization. Moreover, same parameter settings in trained SVM give same results that absolutely reduce simulation process compared to other techniques such as neural network and time series. The mean absolute percentage error (MAPE) for the proposed model shows that SVM performs well compared to neural network.

  18. Novelty Detection Classifiers in Weed Mapping: Silybum marianum Detection on UAV Multispectral Images.

    PubMed

    Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios

    2017-09-01

    In the present study, the detection and mapping of Silybum marianum (L.) Gaertn. weed using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed wing unmanned aerial vehicle (UAV) was employed for obtaining high-resolution images. Four novelty detection classifiers were used to identify S. marianum between other vegetation in a field. The classifiers were One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders and One Class Principal Component Analysis (OC-PCA). As input features to the novelty detection classifiers, the three spectral bands and texture were used. The S. marianum identification accuracy using OC-SVM reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.

  19. Combatting nonlinear phase noise in coherent optical systems with an optimized decision processor based on machine learning

    NASA Astrophysics Data System (ADS)

    Wang, Danshi; Zhang, Min; Cai, Zhongle; Cui, Yue; Li, Ze; Han, Huanhuan; Fu, Meixia; Luo, Bin

    2016-06-01

    An effective machine learning algorithm, the support vector machine (SVM), is presented in the context of a coherent optical transmission system. As a classifier, the SVM can create nonlinear decision boundaries to mitigate the distortions caused by nonlinear phase noise (NLPN). Without any prior information or heuristic assumptions, the SVM can learn and capture the link properties from only a few training data. Compared with the maximum likelihood estimation (MLE) algorithm, a lower bit-error rate (BER) is achieved by the SVM for a given launch power; moreover, the launch power dynamic range (LPDR) is increased by 3.3 dBm for 8 phase-shift keying (8 PSK), 1.2 dBm for QPSK, and 0.3 dBm for BPSK. The maximum transmission distance corresponding to a BER of 1 ×10-3 is increased by 480 km for the case of 8 PSK. The larger launch power range and longer transmission distance improve the tolerance to amplitude and phase noise, which demonstrates the feasibility of the SVM in digital signal processing for M-PSK formats. Meanwhile, in order to apply the SVM method to 16 quadratic amplitude modulation (16 QAM) detection, we propose a parameter optimization scheme. By utilizing a cross-validation and grid-search techniques, the optimal parameters of SVM can be selected, thus leading to the LPDR improvement by 2.8 dBm. Additionally, we demonstrate that the SVM is also effective in combating the laser phase noise combined with the inphase and quadrature (I/Q) modulator imperfections, but the improvement is insignificant for the linear noise and separate I/Q imbalance. The computational complexity of SVM is also discussed. The relatively low complexity makes it possible for SVM to implement the real-time processing.

  20. Fingerprint recognition of alien invasive weeds based on the texture character and machine learning

    NASA Astrophysics Data System (ADS)

    Yu, Jia-Jia; Li, Xiao-Li; He, Yong; Xu, Zheng-Hao

    2008-11-01

    Multi-spectral imaging technique based on texture analysis and machine learning was proposed to discriminate alien invasive weeds with similar outline but different categories. The objectives of this study were to investigate the feasibility of using Multi-spectral imaging, especially the near-infrared (NIR) channel (800 nm+/-10 nm) to find the weeds' fingerprints, and validate the performance with specific eigenvalues by co-occurrence matrix. Veronica polita Pries, Veronica persica Poir, longtube ground ivy, Laminum amplexicaule Linn. were selected in this study, which perform different effect in field, and are alien invasive species in China. 307 weed leaves' images were randomly selected for the calibration set, while the remaining 207 samples for the prediction set. All images were pretreated by Wallis filter to adjust the noise by uneven lighting. Gray level co-occurrence matrix was applied to extract the texture character, which shows density, randomness correlation, contrast and homogeneity of texture with different algorithms. Three channels (green channel by 550 nm+/-10 nm, red channel by 650 nm+/-10 nm and NIR channel by 800 nm+/-10 nm) were respectively calculated to get the eigenvalues.Least-squares support vector machines (LS-SVM) was applied to discriminate the categories of weeds by the eigenvalues from co-occurrence matrix. Finally, recognition ratio of 83.35% by NIR channel was obtained, better than the results by green channel (76.67%) and red channel (69.46%). The prediction results of 81.35% indicated that the selected eigenvalues reflected the main characteristics of weeds' fingerprint based on multi-spectral (especially by NIR channel) and LS-SVM model.

  1. Incremental Support Vector Machine Framework for Visual Sensor Networks

    NASA Astrophysics Data System (ADS)

    Awad, Mariette; Jiang, Xianhua; Motai, Yuichi

    2006-12-01

    Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of least square SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and an online learning phase during which the cluster head performs an ensemble of model aggregations based on the sensor nodes inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single camera sensing especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows an adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system which makes it even more attractive for distributed sensor networks communication.

  2. Age and gender estimation using Region-SIFT and multi-layered SVM

    NASA Astrophysics Data System (ADS)

    Kim, Hyunduk; Lee, Sang-Heon; Sohn, Myoung-Kyu; Hwang, Byunghun

    2018-04-01

    In this paper, we propose an age and gender estimation framework using the region-SIFT feature and multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark based face alignment. The second step is the feature extraction step. In this step, we introduce the region-SIFT feature extraction method based on facial landmarks. First, we define sub-regions of the face. We then extract SIFT features from each sub-region. In order to reduce the dimensions of features we employ a Principal Component Analysis (PCA) and a Linear Discriminant Analysis (LDA). Finally, we classify age and gender using a multi-layered Support Vector Machines (SVM) for efficient classification. Rather than performing gender estimation and age estimation independently, the use of the multi-layered SVM can improve the classification rate by constructing a classifier that estimate the age according to gender. Moreover, we collect a dataset of face images, called by DGIST_C, from the internet. A performance evaluation of proposed method was performed with the FERET database, CACD database, and DGIST_C database. The experimental results demonstrate that the proposed approach classifies age and performs gender estimation very efficiently and accurately.

  3. Efficient HIK SVM learning for image classification.

    PubMed

    Wu, Jianxin

    2012-10-01

    Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.

  4. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-01-01

    Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145

  5. A feasibility study of automatic lung nodule detection in chest digital tomosynthesis with machine learning based on support vector machine

    NASA Astrophysics Data System (ADS)

    Lee, Donghoon; Kim, Ye-seul; Choi, Sunghoon; Lee, Haenghwa; Jo, Byungdu; Choi, Seungyeon; Shin, Jungwook; Kim, Hee-Joung

    2017-03-01

    The chest digital tomosynthesis(CDT) is recently developed medical device that has several advantage for diagnosing lung disease. For example, CDT provides depth information with relatively low radiation dose compared to computed tomography (CT). However, a major problem with CDT is the image artifacts associated with data incompleteness resulting from limited angle data acquisition in CDT geometry. For this reason, the sensitivity of lung disease was not clear compared to CT. In this study, to improve sensitivity of lung disease detection in CDT, we developed computer aided diagnosis (CAD) systems based on machine learning. For design CAD systems, we used 100 cases of lung nodules cropped images and 100 cases of normal lesion cropped images acquired by lung man phantoms and proto type CDT. We used machine learning techniques based on support vector machine and Gabor filter. The Gabor filter was used for extracting characteristics of lung nodules and we compared performance of feature extraction of Gabor filter with various scale and orientation parameters. We used 3, 4, 5 scales and 4, 6, 8 orientations. After extracting features, support vector machine (SVM) was used for classifying feature of lesions. The linear, polynomial and Gaussian kernels of SVM were compared to decide the best SVM conditions for CDT reconstruction images. The results of CAD system with machine learning showed the capability of automatically lung lesion detection. Furthermore detection performance was the best when Gabor filter with 5 scale and 8 orientation and SVM with Gaussian kernel were used. In conclusion, our suggested CAD system showed improving sensitivity of lung lesion detection in CDT and decide Gabor filter and SVM conditions to achieve higher detection performance of our developed CAD system for CDT.

  6. Soft Computing Application in Fault Detection of Induction Motor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konar, P.; Puhan, P. S.; Chattopadhyay, P. Dr.

    2010-10-26

    The paper investigates the effectiveness of different patter classifier like Feed Forward Back Propagation (FFBPN), Radial Basis Function (RBF) and Support Vector Machine (SVM) for detection of bearing faults in Induction Motor. The steady state motor current with Park's Transformation has been used for discrimination of inner race and outer race bearing defects. The RBF neural network shows very encouraging results for multi-class classification problems and is hoped to set up a base for incipient fault detection of induction motor. SVM is also found to be a very good fault classifier which is highly competitive with RBF.

  7. Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences.

    PubMed

    An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming

    2016-01-01

    We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.

  8. Classification of burn wounds using support vector machines

    NASA Astrophysics Data System (ADS)

    Acha, Begona; Serrano, Carmen; Palencia, Sergio; Murillo, Juan Jose

    2004-05-01

    The purpose of this work is to improve a previous method developed by the authors for the classification of burn wounds into their depths. The inputs of the system are color and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. Our previous work consisted in segmenting the burn wound from the rest of the image and classifying the burn into its depth. In this paper we focus on the classification problem only. We already proposed to use a Fuzzy-ARTMAP neural network (NN). However, we may take advantage of new powerful classification tools such as Support Vector Machines (SVM). We apply the five-folded cross validation scheme to divide the database into training and validating sets. Then, we apply a feature selection method for each classifier, which will give us the set of features that yields the smallest classification error for each classifier. Features used to classify are first-order statistical parameters extracted from the L*, u* and v* color components of the image. The feature selection algorithms used are the Sequential Forward Selection (SFS) and the Sequential Backward Selection (SBS) methods. As data of the problem faced here are not linearly separable, the SVM was trained using some different kernels. The validating process shows that the SVM method, when using a Gaussian kernel of variance 1, outperforms classification results obtained with the rest of the classifiers, yielding an error classification rate of 0.7% whereas the Fuzzy-ARTMAP NN attained 1.6 %.

  9. Terminator Detection by Support Vector Machine Utilizing aStochastic Context-Free Grammar

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Francis-Lyon, Patricia; Cristianini, Nello; Holbrook, Stephen

    2006-12-30

    A 2-stage detector was designed to find rho-independent transcription terminators in the Escherichia coli genome. The detector includes a Stochastic Context Free Grammar (SCFG) component and a Support Vector Machine (SVM) component. To find terminators, the SCFG searches the intergenic regions of nucleotide sequence for local matches to a terminator grammar that was designed and trained utilizing examples of known terminators. The grammar selects sequences that are the best candidates for terminators and assigns them a prefix, stem-loop, suffix structure using the Cocke-Younger-Kasaami (CYK) algorithm, modified to incorporate energy affects of base pairing. The parameters from this inferred structure aremore » passed to the SVM classifier, which distinguishes terminators from non-terminators that score high according to the terminator grammar. The SVM was trained with negative examples drawn from intergenic sequences that include both featureless and RNA gene regions (which were assigned prefix, stem-loop, suffix structure by the SCFG), so that it successfully distinguishes terminators from either of these. The classifier was found to be 96.4% successful during testing.« less

  10. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

    PubMed

    Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

    2016-01-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  11. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    NASA Astrophysics Data System (ADS)

    Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

    2013-01-01

    Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.

  12. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2014-09-30

    floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...APPROACH Odontocete click detection and classification. A multi-class support vector machine (SVM) classifier was previously developed ( Jarvis ...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, particularly S. Jarvis , is improving the SVM

  13. New support vector machine-based method for microRNA target prediction.

    PubMed

    Li, L; Gao, Q; Mao, X; Cao, Y

    2014-06-09

    MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.

  14. Supervised learning methods for pathological arterial pulse wave differentiation: A SVM and neural networks approach.

    PubMed

    Paiva, Joana S; Cardoso, João; Pereira, Tânia

    2018-01-01

    The main goal of this study was to develop an automatic method based on supervised learning methods, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed by signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and a F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with SVM classifier using a different number of features from the original set available. The comparison between SVM and NN allowed reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Fuzzy support vector machine for microarray imbalanced data classification

    NASA Astrophysics Data System (ADS)

    Ladayya, Faroh; Purnami, Santi Wulan; Irhamah

    2017-11-01

    DNA microarrays are data containing gene expression with small sample sizes and high number of features. Furthermore, imbalanced classes is a common problem in microarray data. This occurs when a dataset is dominated by a class which have significantly more instances than the other minority classes. Therefore, it is needed a classification method that solve the problem of high dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples in the same importance thus the results is bias for minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method apply a fuzzy membership to each input point and reformulate the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy membership so FSVM can pay more attention to the samples with larger fuzzy membership. Given DNA microarray data is a high dimensional data with a very large number of features, it is necessary to do feature selection first using Fast Correlation based Filter (FCBF). In this study will be analyzed by SVM, FSVM and both methods by applying FCBF and get the classification performance of them. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.

  16. Semi-supervised vibration-based classification and condition monitoring of compressors

    NASA Astrophysics Data System (ADS)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  17. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

    PubMed

    Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan

    2016-08-01

    This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. Multisource image fusion method using support value transform.

    PubMed

    Zheng, Sheng; Shi, Wen-Zhong; Liu, Jian; Zhu, Guang-Xi; Tian, Jin-Wen

    2007-07-01

    With the development of numerous imaging sensors, many images can be simultaneously pictured by various sensors. However, there are many scenarios where no one sensor can give the complete picture. Image fusion is an important approach to solve this problem and produces a single image which preserves all relevant information from a set of different sensors. In this paper, we proposed a new image fusion method using the support value transform, which uses the support value to represent the salient features of image. This is based on the fact that, in support vector machines (SVMs), the data with larger support values have a physical meaning in the sense that they reveal relative more importance of the data points for contributing to the SVM model. The mapped least squares SVM (mapped LS-SVM) is used to efficiently compute the support values of image. The support value analysis is developed by using a series of multiscale support value filters, which are obtained by filling zeros in the basic support value filter deduced from the mapped LS-SVM to match the resolution of the desired level. Compared with the widely used image fusion methods, such as the Laplacian pyramid, discrete wavelet transform methods, the proposed method is an undecimated transform-based approach. The fusion experiments are undertaken on multisource images. The results demonstrate that the proposed approach is effective and is superior to the conventional image fusion methods in terms of the pertained quantitative fusion evaluation indexes, such as quality of visual information (Q(AB/F)), the mutual information, etc.

  19. Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology.

    PubMed

    Heinson, Ashley I; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen C Denman; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher H

    2017-02-01

    Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.

  20. 3D-QSAR studies of some reversible Acetyl cholinesterase inhibitors based on CoMFA and ligand protein interaction fingerprints using PC-LS-SVM and PLS-LS-SVM.

    PubMed

    Ghafouri, Hamidreza; Ranjbar, Mohsen; Sakhteman, Amirhossein

    2017-08-01

    A great challenge in medicinal chemistry is to develop different methods for structural design based on the pattern of the previously synthesized compounds. In this study two different QSAR methods were established and compared for a series of piperidine acetylcholinesterase inhibitors. In one novel approach, PC-LS-SVM and PLS-LS-SVM was used for modeling 3D interaction descriptors, and in the other method the same nonlinear techniques were used to build QSAR equations based on field descriptors. Different validation methods were used to evaluate the models and the results revealed the more applicability and predictive ability of the model generated by field descriptors (Q 2 LOO-CV =1, R 2 ext =0.97). External validation criteria revealed that both methods can be used in generating reasonable QSAR models. It was concluded that due to ability of interaction descriptors in prediction of binding mode, using this approach can be implemented in future 3D-QSAR softwares. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Bands selection and classification of hyperspectral images based on hybrid kernels SVM by evolutionary algorithm

    NASA Astrophysics Data System (ADS)

    Hu, Yan-Yan; Li, Dong-Sheng

    2016-01-01

    The hyperspectral images(HSI) consist of many closely spaced bands carrying the most object information. While due to its high dimensionality and high volume nature, it is hard to get satisfactory classification performance. In order to reduce HSI data dimensionality preparation for high classification accuracy, it is proposed to combine a band selection method of artificial immune systems (AIS) with a hybrid kernels support vector machine (SVM-HK) algorithm. In fact, after comparing different kernels for hyperspectral analysis, the approach mixed radial basis function kernel (RBF-K) with sigmoid kernel (Sig-K) and applied the optimized hybrid kernels in SVM classifiers. Then the SVM-HK algorithm used to induce the bands selection of an improved version of AIS. The AIS was composed of clonal selection and elite antibody mutation, including evaluation process with optional index factor (OIF). Experimental classification performance was on a San Diego Naval Base acquired by AVIRIS, the HRS dataset shows that the method is able to efficiently achieve bands redundancy removal while outperforming the traditional SVM classifier.

  2. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    NASA Astrophysics Data System (ADS)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.

  3. A support vector machine approach for classification of welding defects from ultrasonic signals

    NASA Astrophysics Data System (ADS)

    Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming

    2014-07-01

    Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.

  4. Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach

    NASA Astrophysics Data System (ADS)

    Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios

    A software package Unresolved Galaxy Classifier (UGC) is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide an automated taxonomic classification and specific parameters estimation analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM-models training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of galaxy models synthetic spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.

  5. Support Vector Machines to improve physiologic hot flash measures: application to the ambulatory setting.

    PubMed

    Thurston, Rebecca C; Hernandez, Javier; Del Rio, Jose M; De La Torre, Fernando

    2011-07-01

    Most midlife women have hot flashes. The conventional criterion (≥2 μmho rise/30 s) for classifying hot flashes physiologically has shown poor performance. We improved this performance in the laboratory with Support Vector Machines (SVMs), a pattern classification method. We aimed to compare conventional to SVM methods to classify hot flashes in the ambulatory setting. Thirty-one women with hot flashes underwent 24 h of ambulatory sternal skin conductance monitoring. Hot flashes were quantified with conventional (≥2 μmho/30 s) and SVM methods. Conventional methods had low sensitivity (sensitivity=.57, specificity=.98, positive predictive value (PPV)=.91, negative predictive value (NPV)=.90, F1=.60), with performance lower with higher body mass index (BMI). SVMs improved this performance (sensitivity=.87, specificity=.97, PPV=.90, NPV=.96, F1=.88) and reduced BMI variation. SVMs can improve ambulatory physiologic hot flash measures. Copyright © 2010 Society for Psychophysiological Research.

  6. A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug

    2013-05-15

    Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 Multiplication-Sign 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of local patterns of lungs-normal lung and five of regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessedmore » using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and a SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI data obtained from both scanners, the classification accuracies with the SVM and Bayesian classifiers were 92% and 77%, respectively. The selected features resulting from the classification process differed by scanner, with more features included for the classification of the integrated HRCT data than for the classification of the HRCT data from each scanner. For the integrated data, consisting of HRCT images of both scanners, the classification accuracy based on the SVM was statistically similar to the accuracy of the data obtained from each scanner. However, the classification accuracy of the integrated data using the Bayesian classifier was significantly lower than the classification accuracy of the ROI data of each scanner. Conclusions: The use of an integrated dataset along with a SVM classifier rather than a Bayesian classifier has benefits in terms of the classification accuracy of HRCT images acquired with more than one scanner. This finding is of relevance in studies involving large number of images, as is the case in a multicenter trial with different scanners.« less

  7. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

    PubMed

    Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

    2016-10-24

    To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.

  8. Comparison of two Classification methods (MLC and SVM) to extract land use and land cover in Johor Malaysia

    NASA Astrophysics Data System (ADS)

    Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.

    2014-06-01

    Mapping is essential for the analysis of the land use and land cover, which influence many environmental processes and properties. For the purpose of the creation of land cover maps, it is important to minimize error. These errors will propagate into later analyses based on these land cover maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we have analyzed multispectral data using two different classifiers including Maximum Likelihood Classifier (MLC) and Support Vector Machine (SVM). To pursue this aim, Landsat Thematic Mapper data and identical field-based training sample datasets in Johor Malaysia used for each classification method, which results indicate in five land cover classes forest, oil palm, urban area, water, rubber. Classification results indicate that SVM was more accurate than MLC. With demonstrated capability to produce reliable cover results, the SVM methods should be especially useful for land cover classification.

  9. Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoo S.; Yang, Y.; Carbonell, J.

    2011-10-24

    Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. Inmore » contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.« less

  10. Recognition algorithm for assisting ovarian cancer diagnosis from coregistered ultrasound and photoacoustic images: ex vivo study

    NASA Astrophysics Data System (ADS)

    Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing

    2012-12-01

    Unique features and the underlining hypotheses of how these features may relate to the tumor physiology in coregistered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced. The images were first compressed with wavelet transform. The mean Radon transform of photoacoustic images was then computed and fitted with a Gaussian function to find the centroid of a suspicious area for shift-invariant recognition process. Twenty-four features were extracted from a training set by several methods, including Fourier transform, image statistics, and different composite filters. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train three classifiers, including generalized linear model, neural network, and support vector machine (SVM). The SVM achieved the best training performance and was able to exclusively separate cancerous from non-cancerous cases with 100% sensitivity and specificity. At the end, the classifiers were used to test 95 new images obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancer is sufficient to consider an ovary as malignant, the SVM classifier achieves 100% sensitivity and 87.88% specificity.

  11. Optimizing classification performance in an object-based very-high-resolution land use-land cover urban application

    NASA Astrophysics Data System (ADS)

    Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore

    2017-10-01

    This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.

  12. Balanced VS Imbalanced Training Data: Classifying Rapideye Data with Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Ustuner, M.; Sanli, F. B.; Abdikan, S.

    2016-06-01

    The accuracy of supervised image classification is highly dependent upon several factors such as the design of training set (sample selection, composition, purity and size), resolution of input imagery and landscape heterogeneity. The design of training set is still a challenging issue since the sensitivity of classifier algorithm at learning stage is different for the same dataset. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping the crop types was addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented here to classify the data. For evaluating the influence of the balanced and imbalanced training data on image classification algorithms, three different training datasets were created. Two different balanced datasets which have 70 and 100 pixels for each class of interest and one imbalanced dataset in which each class has different number of pixels were used in classification stage. Results demonstrate that ML and NN classifications are affected by imbalanced training data in resulting a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for NN) while SVM is not affected significantly (from 94.38% to 94.69%) and slightly improved. Our results highlighted that SVM is proven to be a very robust, consistent and effective classifier as it can perform very well under balanced and imbalanced training data situations. Furthermore, the training stage should be precisely and carefully designed for the need of adopted classifier.

  13. Comparison of water extraction methods in Tibet based on GF-1 data

    NASA Astrophysics Data System (ADS)

    Jia, Lingjun; Shang, Kun; Liu, Jing; Sun, Zhongqing

    2018-03-01

    In this study, we compared four different water extraction methods with GF-1 data according to different water types in Tibet, including Support Vector Machine (SVM), Principal Component Analysis (PCA), Decision Tree Classifier based on False Normalized Difference Water Index (FNDWI-DTC), and PCA-SVM. The results show that all of the four methods can extract large area water body, but only SVM and PCA-SVM can obtain satisfying extraction results for small size water body. The methods were evaluated by both overall accuracy (OAA) and Kappa coefficient (KC). The OAA of PCA-SVM, SVM, FNDWI-DTC, PCA are 96.68%, 94.23%, 93.99%, 93.01%, and the KCs are 0.9308, 0.8995, 0.8962, 0.8842, respectively, in consistent with visual inspection. In summary, SVM is better for narrow rivers extraction and PCA-SVM is suitable for water extraction of various types. As for dark blue lakes, the methods using PCA can extract more quickly and accurately.

  14. A two-dimensional matrix image based feature extraction method for classification of sEMG: A comparative analysis based on SVM, KNN and RBF-NN.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen

    2017-01-01

    The computer mouse is an important human-computer interaction device. But patients with physical finger disability are unable to operate this device. Surface EMG (sEMG) can be monitored by electrodes on the skin surface and is a reflection of the neuromuscular activities. Therefore, we can control limbs auxiliary equipment by utilizing sEMG classification in order to help the physically disabled patients to operate the mouse. To develop a new a method to extract sEMG generated by finger motion and apply novel features to classify sEMG. A window-based data acquisition method was presented to extract signal samples from sEMG electordes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from the classical methods based on time domain or frequency domain, was employed to transform signal samples to feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were separately acquired. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, the machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively. In particular, the accuracy of the SVM classifier reached up to 100%. The signal separation method is a convenient, efficient and quick method, which can effectively extract the sEMG samples produced by fingers. In addition, unlike the classical methods, the new method enables to extract features by enlarging sample signals' energy appropriately. The classical machine learning classifiers all performed well by using these features.

  15. Understanding user intents in online health forums.

    PubMed

    Zhang, Thomas; Cho, Jason H D; Zhai, Chengxiang

    2015-07-01

    Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that a SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that a SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.

  16. Discrimination and Measurements of Three Flavonols with Similar Structure Using Terahertz Spectroscopy and Chemometrics

    NASA Astrophysics Data System (ADS)

    Yan, Ling; Liu, Changhong; Qu, Hao; Liu, Wei; Zhang, Yan; Yang, Jianbo; Zheng, Lei

    2018-03-01

    Terahertz (THz) technique, a recently developed spectral method, has been researched and used for the rapid discrimination and measurements of food compositions due to its low-energy and non-ionizing characteristics. In this study, THz spectroscopy combined with chemometrics has been utilized for qualitative and quantitative analysis of myricetin, quercetin, and kaempferol with concentrations of 0.025, 0.05, and 0.1 mg/mL. The qualitative discrimination was achieved by KNN, ELM, and RF models with the spectra pre-treatments. An excellent discrimination (100% CCR in the prediction set) could be achieved using the RF model. Furthermore, the quantitative analyses were performed by partial least square regression (PLSR) and least squares support vector machine (LS-SVM). Comparing to the PLSR models, the LS-SVM yielded better results with low RMSEP (0.0044, 0.0039, and 0.0048), higher Rp (0.9601, 0.9688, and 0.9359), and higher RPD (8.6272, 9.6333, and 7.9083) for myricetin, quercetin, and kaempferol, respectively. Our results demonstrate that THz spectroscopy technique is a powerful tool for identification of three flavonols with similar chemical structures and quantitative determination of their concentrations.

  17. PLS-LS-SVM based modeling of ATR-IR as a robust method in detection and qualification of alprazolam

    NASA Astrophysics Data System (ADS)

    Parhizkar, Elahehnaz; Ghazali, Mohammad; Ahmadi, Fatemeh; Sakhteman, Amirhossein

    2017-02-01

    According to the United States pharmacopeia (USP), Gold standard technique for Alprazolam determination in dosage forms is HPLC, an expensive and time-consuming method that is not easy to approach. In this study chemometrics assisted ATR-IR was introduced as an alternative method that produce similar results in fewer time and energy consumed manner. Fifty-eight samples containing different concentrations of commercial alprazolam were evaluated by HPLC and ATR-IR method. A preprocessing approach was applied to convert raw data obtained from ATR-IR spectra to normal matrix. Finally, a relationship between alprazolam concentrations achieved by HPLC and ATR-IR data was established using PLS-LS-SVM (partial least squares least squares support vector machines). Consequently, validity of the method was verified to yield a model with low error values (root mean square error of cross validation equal to 0.98). The model was able to predict about 99% of the samples according to R2 of prediction set. Response permutation test was also applied to affirm that the model was not assessed by chance correlations. At conclusion, ATR-IR can be a reliable method in manufacturing process in detection and qualification of alprazolam content.

  18. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifier. We referenced a database of disease outbreak reports to confirm that this evaluation data set resulted from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784

  19. Enhancement of gesture recognition for contactless interface using a personalized classifier in the operating room.

    PubMed

    Cho, Yongwon; Lee, Areum; Park, Jongha; Ko, Bemseok; Kim, Namkug

    2018-07-01

    Contactless operating room (OR) interfaces are important for computer-aided surgery, and have been developed to decrease the risk of contamination during surgical procedures. In this study, we used Leap Motion™, with a personalized automated classifier, to enhance the accuracy of gesture recognition for contactless interfaces. This software was trained and tested on a personal basis that means the training of gesture per a user. We used 30 features including finger and hand data, which were computed, selected, and fed into a multiclass support vector machine (SVM), and Naïve Bayes classifiers and to predict and train five types of gestures including hover, grab, click, one peak, and two peaks. Overall accuracy of the five gestures was 99.58% ± 0.06, and 98.74% ± 3.64 on a personal basis using SVM and Naïve Bayes classifiers, respectively. We compared gesture accuracy across the entire dataset and used SVM and Naïve Bayes classifiers to examine the strength of personal basis training. We developed and enhanced non-contact interfaces with gesture recognition to enhance OR control systems. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Comparative evaluation of support vector machine classification for computer aided detection of breast masses in mammography

    NASA Astrophysics Data System (ADS)

    Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.

    2012-08-01

    False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the usage of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate for their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity in 0.05-1 FPs on normals on the free-response receiver operating characteristic curve (FROC), incorporated into a tenfold cross validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible outperforming other state-of-the-art approaches for breast mass CADe.

  1. Evaluation of Chilling Injury in Mangoes Using Multispectral Imaging.

    PubMed

    Hashim, Norhashila; Onwude, Daniel I; Osman, Muhamad Syafiq

    2018-05-01

    Commodities originating from tropical and subtropical climes are prone to chilling injury (CI). This injury could affect the quality and marketing potential of mango after harvest. This will later affect the quality of the produce and subsequent consumer acceptance. In this study, the appearance of CI symptoms in mango was evaluated non-destructively using multispectral imaging. The fruit were stored at 4 °C to induce CI and 12 °C to preserve the quality of the control samples for 4 days before they were taken out and stored at ambient temperature for 24 hr. Measurements using multispectral imaging and standard reference methods were conducted before and after storage. The performance of multispectral imaging was compared using standard reference properties including moisture content (MC), total soluble solids (TSS) content, firmness, pH, and color. Least square support vector machine (LS-SVM) combined with principal component analysis (PCA) were used to discriminate CI samples with those of control and before storage, respectively. The statistical results demonstrated significant changes in the reference quality properties of samples before and after storage. The results also revealed that multispectral parameters have a strong correlation with the reference parameters of L * , a * , TSS, and MC. The MC and L * were found to be the best reference parameters in identifying the severity of CI in mangoes. PCA and LS-SVM analysis indicated that the fruit were successfully classified into their categories, that is, before storage, control, and CI. This indicated that the multispectral imaging technique is feasible for detecting CI in mangoes during postharvest storage and processing. This paper demonstrates a fast, easy, and accurate method of identifying the effect of cold storage on mango, nondestructively. The method presented in this paper can be used industrially to efficiently differentiate different fruits from each other after low temperature storage. © 2018 Institute of Food Technologists®.

  2. Prediction of cell penetrating peptides by support vector machines.

    PubMed

    Sanders, William S; Johnston, C Ian; Bridges, Susan M; Burgess, Shane C; Willeford, Kenneth O

    2011-07-01

    Cell penetrating peptides (CPPs) are those peptides that can transverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. Of the synthesized peptides predicted to be CPPs, 100% of these peptides were shown to be penetrating.

  3. Pattern Recognition Approaches for Breast Cancer DCE-MRI Classification: A Systematic Review.

    PubMed

    Fusco, Roberta; Sansone, Mario; Filice, Salvatore; Carone, Guglielmo; Amato, Daniela Maria; Sansone, Carlo; Petrillo, Antonella

    2016-01-01

    We performed a systematic review of several pattern analysis approaches for classifying breast lesions using dynamic, morphological, and textural features in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Several machine learning approaches, namely artificial neural networks (ANN), support vector machines (SVM), linear discriminant analysis (LDA), tree-based classifiers (TC), and Bayesian classifiers (BC), and features used for classification are described. The findings of a systematic review of 26 studies are presented. The sensitivity and specificity are respectively 91 and 83 % for ANN, 85 and 82 % for SVM, 96 and 85 % for LDA, 92 and 87 % for TC, and 82 and 85 % for BC. The sensitivity and specificity are respectively 82 and 74 % for dynamic features, 93 and 60 % for morphological features, 88 and 81 % for textural features, 95 and 86 % for a combination of dynamic and morphological features, and 88 and 84 % for a combination of dynamic, morphological, and other features. LDA and TC have the best performance. A combination of dynamic and morphological features gives the best performance.

  4. Classification of hadith into positive suggestion, negative suggestion, and information

    NASA Astrophysics Data System (ADS)

    Faraby, Said Al; Riviera Rachmawati Jasin, Eliza; Kusumaningrum, Andina; Adiwijaya

    2018-03-01

    As one of the Muslim life guidelines, based on the meaning of its sentence(s), a hadith can be viewed as a suggestion for doing something, or a suggestion for not doing something, or just information without any suggestion. In this paper, we tried to classify the Bahasa translation of hadith into the three categories using machine learning approach. We tried stemming and stopword removal in preprocessing, and TF-IDF of unigram, bigram, and trigram as the extracted features. As the classifier, we compared between SVM and Neural Network. Since the categories are new, so in order to compare the results of the previous pipelines, we created a baseline classifier using simple rule-based string matching technique. The rule-based algorithm conditions on the occurrence of words such as “janganlah, sholatlah, and so on” to determine the category. The baseline method achieved F1-Score of 0.69, while the best F1-Score from the machine learning approach was 0.88, and it was produced by SVM model with the linear kernel.

  5. Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification.

    PubMed

    Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu

    2017-09-01

    The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area the best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training images dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-word representations of local dense scale invariant feature transformation features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for a smartphone-based image analysis.

  6. Arbitrary norm support vector machines.

    PubMed

    Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R

    2009-02-01

    Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.

  7. Realistic Subsurface Anomaly Discrimination Using Electromagnetic Induction and an SVM Classifier

    NASA Astrophysics Data System (ADS)

    Pablo Fernández, Juan; Shubitidze, Fridon; Shamatava, Irma; Barrowes, Benjamin E.; O'Neill, Kevin

    2010-12-01

    The environmental research program of the United States military has set up blind tests for detection and discrimination of unexploded ordnance. One such test consists of measurements taken with the EM-63 sensor at Camp Sibert, AL. We review the performance on the test of a procedure that combines a field-potential (HAP) method to locate targets, the normalized surface magnetic source (NSMS) model to characterize them, and a support vector machine (SVM) to classify them. The HAP method infers location from the scattered magnetic field and its associated scalar potential, the latter reconstructed using equivalent sources. NSMS replaces the target with an enclosing spheroid of equivalent radial magnetization whose integral it uses as a discriminator. SVM generalizes from empirical evidence and can be adapted for multiclass discrimination using a voting system. Our method identifies all potentially dangerous targets correctly and has a false-alarm rate of about 5%.

  8. Coregistered photoacoustic and ultrasound imaging and classification of ovarian cancer: ex vivo and in vivo studies

    NASA Astrophysics Data System (ADS)

    Salehi, Hassan S.; Li, Hai; Merkulov, Alex; Kumavor, Patrick D.; Vavadi, Hamed; Sanders, Melinda; Kueck, Angela; Brewer, Molly A.; Zhu, Quing

    2016-04-01

    Most ovarian cancers are diagnosed at advanced stages due to the lack of efficacious screening techniques. Photoacoustic tomography (PAT) has a potential to image tumor angiogenesis and detect early neovascular changes of the ovary. We have developed a coregistered PAT and ultrasound (US) prototype system for real-time assessment of ovarian masses. Features extracted from PAT and US angular beams, envelopes, and images were input to a logistic classifier and a support vector machine (SVM) classifier to diagnose ovaries as benign or malignant. A total of 25 excised ovaries of 15 patients were studied and the logistic and SVM classifiers achieved sensitivities of 70.4 and 87.7%, and specificities of 95.6 and 97.9%, respectively. Furthermore, the ovaries of two patients were noninvasively imaged using the PAT/US system before surgical excision. By using five significant features and the logistic classifier, 12 out of 14 images (86% sensitivity) from a malignant ovarian mass and all 17 images (100% specificity) from a benign mass were accurately classified; the SVM correctly classified 10 out of 14 malignant images (71% sensitivity) and all 17 benign images (100% specificity). These initial results demonstrate the clinical potential of the PAT/US technique for ovarian cancer diagnosis.

  9. Successful classification of cocaine dependence using brain imaging: a generalizable machine learning approach.

    PubMed

    Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon

    2016-10-06

    Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first conducted to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, Support Vectors Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories. Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding level of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging and SVM-based classifier in the diagnosis of substance use disorders and furthering an understanding of their underlying pathology.

  10. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    PubMed

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results make separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptidemore » identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptides is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample« less

  12. Support vector machine for the diagnosis of malignant mesothelioma

    NASA Astrophysics Data System (ADS)

    Ushasukhanya, S.; Nithyakalyani, A.; Sivakumar, V.

    2018-04-01

    Harmful mesothelioma is an illness in which threatening (malignancy) cells shape in the covering of the trunk or stomach area. Being presented to asbestos can influence the danger of threatening mesothelioma. Signs and side effects of threatening mesothelioma incorporate shortness of breath and agony under the rib confine. Tests that inspect within the trunk and belly are utilized to recognize (find) and analyse harmful mesothelioma. Certain elements influence forecast (shot of recuperation) and treatment choices. In this review, Support vector machine (SVM) classifiers were utilized for Mesothelioma sickness conclusion. SVM output is contrasted by concentrating on Mesothelioma’s sickness and findings by utilizing similar information set. The support vector machine algorithm gives 92.5% precision acquired by means of 3-overlap cross-approval. The Mesothelioma illness dataset were taken from an organization reports from Turkey.

  13. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.

  14. Classification of EEG Signals Based on Pattern Recognition Approach.

    PubMed

    Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed

    2017-01-01

    Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.

  15. Classification of EEG Signals Based on Pattern Recognition Approach

    PubMed Central

    Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed

    2017-01-01

    Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190

  16. Individualized prediction of illness course at the first psychotic episode: a support vector machine MRI study.

    PubMed

    Mourao-Miranda, J; Reinders, A A T S; Rocha-Rego, V; Lappin, J; Rondina, J; Morgan, C; Morgan, K D; Fearon, P; Jones, P B; Doody, G A; Murray, R M; Kapur, S; Dazzan, P

    2012-05-01

    To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data.

  17. Individualized prediction of illness course at the first psychotic episode: a support vector machine MRI study

    PubMed Central

    Mourao-Miranda, J.; Reinders, A. A. T. S.; Rocha-Rego, V.; Lappin, J.; Rondina, J.; Morgan, C.; Morgan, K. D.; Fearon, P.; Jones, P. B.; Doody, G. A.; Murray, R. M.; Kapur, S.; Dazzan, P.

    2012-01-01

    Background To date, magnetic resonance imaging (MRI) has made little impact on the diagnosis and monitoring of psychoses in individual patients. In this study, we used a support vector machine (SVM) whole-brain classification approach to predict future illness course at the individual level from MRI data obtained at the first psychotic episode. Method One hundred patients at their first psychotic episode and 91 healthy controls had an MRI scan. Patients were re-evaluated 6.2 years (s.d.=2.3) later, and were classified as having a continuous, episodic or intermediate illness course. Twenty-eight subjects with a continuous course were compared with 28 patients with an episodic course and with 28 healthy controls. We trained each SVM classifier independently for the following contrasts: continuous versus episodic, continuous versus healthy controls, and episodic versus healthy controls. Results At baseline, patients with a continuous course were already distinguishable, with significance above chance level, from both patients with an episodic course (p=0.004, sensitivity=71, specificity=68) and healthy individuals (p=0.01, sensitivity=71, specificity=61). Patients with an episodic course could not be distinguished from healthy individuals. When patients with an intermediate outcome were classified according to the discriminating pattern episodic versus continuous, 74% of those who did not develop other episodes were classified as episodic, and 65% of those who did develop further episodes were classified as continuous (p=0.035). Conclusions We provide preliminary evidence of MRI application in the individualized prediction of future illness course, using a simple and automated SVM pipeline. When replicated and validated in larger groups, this could enable targeted clinical decisions based on imaging data. PMID:22059690

  18. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.

    PubMed

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition.

  19. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition

    PubMed Central

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed and the caving dataset with 10 feature variables and three classes is got. And the perfect combination of feature variables can be automatically decided by using the multi-class F-score (MF-Score) feature selection. In terms of nonlinear mapping in real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus Support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct MEB-SVM classifier in coal-rock recognition which exhibit inherently complex distribution data. The proposed method is examined on UCI data sets and the caving dataset, and compared with some new excellent SVM classifiers. We conduct experiments with accuracy and Friedman test for comparison of more classifiers over multiple on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show the better performance which leads to a promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987

  20. Recognizing ovarian cancer from co-registered ultrasound and photoacoustic images

    NASA Astrophysics Data System (ADS)

    Alqasemi, Umar; Kumavor, Patrick; Aguirre, Andres; Zhu, Quing

    2013-03-01

    Unique features in co-registered ultrasound and photoacoustic images of ex vivo ovarian tissue are introduced, along with the hypotheses of how these features may relate to the physiology of tumors. The images are compressed with wavelet transform, after which the mean Radon transform of the photoacoustic image is computed and fitted with a Gaussian function to find the centroid of the suspicious area for shift-invariant recognition process. In the next step, 24 features are extracted from a training set of images by several methods; including features from the Fourier domain, image statistics, and the outputs of different composite filters constructed from the joint frequency response of different cancerous images. The features were chosen from more than 400 training images obtained from 33 ex vivo ovaries of 24 patients, and used to train a support vector machine (SVM) structure. The SVM classifier was able to exclusively separate the cancerous from the non-cancerous cases with 100% sensitivity and specificity. At the end, the classifier was used to test 95 new images, obtained from 37 ovaries of 20 additional patients. The SVM classifier achieved 76.92% sensitivity and 95.12% specificity. Furthermore, if we assume that recognizing one image as a cancerous case is sufficient to consider the ovary as malignant, then the SVM classifier achieves 100% sensitivity and 87.88% specificity.

  1. Predicting conversion from MCI to AD using resting-state fMRI, graph theoretical approach and SVM.

    PubMed

    Hojjati, Seyed Hani; Ebrahimzadeh, Ata; Khazaee, Ali; Babajani-Feremi, Abbas

    2017-04-15

    We investigated identifying patients with mild cognitive impairment (MCI) who progress to Alzheimer's disease (AD), MCI converter (MCI-C), from those with MCI who do not progress to AD, MCI non-converter (MCI-NC), based on resting-state fMRI (rs-fMRI). Graph theory and machine learning approach were utilized to predict progress of patients with MCI to AD using rs-fMRI. Eighteen MCI converts (average age 73.6 years; 11 male) and 62 age-matched MCI non-converters (average age 73.0 years, 28 male) were included in this study. We trained and tested a support vector machine (SVM) to classify MCI-C from MCI-NC using features constructed based on the local and global graph measures. A novel feature selection algorithm was developed and utilized to select an optimal subset of features. Using subset of optimal features in SVM, we classified MCI-C from MCI-NC with an accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve of 91.4%, 83.24%, 90.1%, and 0.95, respectively. Furthermore, results of our statistical analyses were used to identify the affected brain regions in AD. To the best of our knowledge, this is the first study that combines the graph measures (constructed based on rs-fMRI) with machine learning approach and accurately classify MCI-C from MCI-NC. Results of this study demonstrate potential of the proposed approach for early AD diagnosis and demonstrate capability of rs-fMRI to predict conversion from MCI to AD by identifying affected brain regions underlying this conversion. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method.

    PubMed

    Klöppel, Stefan; Stonnington, Cynthia M; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D; Mader, Irina; Mitchell, L Anne; Patel, Ameet C; Roberts, Catherine C; Fox, Nick C; Jack, Clifford R; Ashburner, John; Frackowiak, Richard S J

    2008-11-01

    There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65-95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice.

  3. Accuracy of dementia diagnosis—a direct comparison between radiologists and a computerized method

    PubMed Central

    Stonnington, Cynthia M.; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D.; Mader, Irina; Mitchell, L. Anne; Patel, Ameet C.; Roberts, Catherine C.; Fox, Nick C.; Jack, Clifford R.; Ashburner, John; Frackowiak, Richard S. J.

    2008-01-01

    There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65–95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparable to SVMs. However, SVMs require no expert knowledge and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice. PMID:18835868

  4. CCD-Based Skinning Injury Recognition on Potato Tubers (Solanum tuberosum L.): A Comparison between Visible and Biospeckle Imaging

    PubMed Central

    Gao, Yingwang; Geng, Jinfeng; Rao, Xiuqin; Ying, Yibin

    2016-01-01

    Skinning injury on potato tubers is a kind of superficial wound that is generally inflicted by mechanical forces during harvest and postharvest handling operations. Though skinning injury is pervasive and obstructive, its detection is very limited. This study attempted to identify injured skin using two CCD (Charge Coupled Device) sensor-based machine vision technologies, i.e., visible imaging and biospeckle imaging. The identification of skinning injury was realized via exploiting features extracted from varied ROIs (Region of Interests). The features extracted from visible images were pixel-wise color and texture features, while region-wise BA (Biospeckle Activity) was calculated from biospeckle imaging. In addition, the calculation of BA using varied numbers of speckle patterns were compared. Finally, extracted features were implemented into classifiers of LS-SVM (Least Square Support Vector Machine) and BLR (Binary Logistic Regression), respectively. Results showed that color features performed better than texture features in classifying sound skin and injured skin, especially for injured skin stored no less than 1 day, with the average classification accuracy of 90%. Image capturing and processing efficiency can be speeded up in biospeckle imaging, with captured 512 frames reduced to 125 frames. Classification results obtained based on the feature of BA were acceptable for early skinning injury stored within 1 day, with the accuracy of 88.10%. It is concluded that skinning injury can be recognized by visible and biospeckle imaging during different stages. Visible imaging has the aptitude in recognizing stale skinning injury, while fresh injury can be discriminated by biospeckle imaging. PMID:27763555

  5. CCD-Based Skinning Injury Recognition on Potato Tubers (Solanum tuberosum L.): A Comparison between Visible and Biospeckle Imaging.

    PubMed

    Gao, Yingwang; Geng, Jinfeng; Rao, Xiuqin; Ying, Yibin

    2016-10-18

    Skinning injury on potato tubers is a kind of superficial wound that is generally inflicted by mechanical forces during harvest and postharvest handling operations. Though skinning injury is pervasive and obstructive, its detection is very limited. This study attempted to identify injured skin using two CCD (Charge Coupled Device) sensor-based machine vision technologies, i.e., visible imaging and biospeckle imaging. The identification of skinning injury was realized via exploiting features extracted from varied ROIs (Region of Interests). The features extracted from visible images were pixel-wise color and texture features, while region-wise BA (Biospeckle Activity) was calculated from biospeckle imaging. In addition, the calculation of BA using varied numbers of speckle patterns were compared. Finally, extracted features were implemented into classifiers of LS-SVM (Least Square Support Vector Machine) and BLR (Binary Logistic Regression), respectively. Results showed that color features performed better than texture features in classifying sound skin and injured skin, especially for injured skin stored no less than 1 day, with the average classification accuracy of 90%. Image capturing and processing efficiency can be speeded up in biospeckle imaging, with captured 512 frames reduced to 125 frames. Classification results obtained based on the feature of BA were acceptable for early skinning injury stored within 1 day, with the accuracy of 88.10%. It is concluded that skinning injury can be recognized by visible and biospeckle imaging during different stages. Visible imaging has the aptitude in recognizing stale skinning injury, while fresh injury can be discriminated by biospeckle imaging.

  6. [Application of characteristic NIR variables selection in portable detection of soluble solids content of apple by near infrared spectroscopy].

    PubMed

    Fan, Shu-Xiang; Huang, Wen-Qian; Li, Jiang-Bo; Guo, Zhi-Ming; Zhaq, Chun-Jiang

    2014-10-01

    In order to detect the soluble solids content(SSC)of apple conveniently and rapidly, a ring fiber probe and a portable spectrometer were applied to obtain the spectroscopy of apple. Different wavelength variable selection methods, including unin- formative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm (GA) were pro- posed to select effective wavelength variables of the NIR spectroscopy of the SSC in apple based on PLS. The back interval LS- SVM (BiLS-SVM) and GA were used to select effective wavelength variables based on LS-SVM. Selected wavelength variables and full wavelength range were set as input variables of PLS model and LS-SVM model, respectively. The results indicated that PLS model built using GA-CARS on 50 characteristic variables selected from full-spectrum which had 1512 wavelengths achieved the optimal performance. The correlation coefficient (Rp) and root mean square error of prediction (RMSEP) for prediction sets were 0.962, 0.403°Brix respectively for SSC. The proposed method of GA-CARS could effectively simplify the portable detection model of SSC in apple based on near infrared spectroscopy and enhance the predictive precision. The study can provide a reference for the development of portable apple soluble solids content spectrometer.

  7. Combined data mining/NIR spectroscopy for purity assessment of lime juice

    NASA Astrophysics Data System (ADS)

    Shafiee, Sahameh; Minaei, Saeid

    2018-06-01

    This paper reports the data mining study on the NIR spectrum of lime juice samples to determine their purity (natural or synthetic). NIR spectra for 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA analysis. Different data mining techniques for feature selection (Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree) were employed. Based on the results, SVM proved to be the most accurate classifier as it achieved the highest accuracy (97%) using the raw spectrum information. The classifier accuracy dropped to 93% when selected feature vector by GA search method was applied as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Also, reduced spectra using PCA do not show acceptable performance (total accuracy of 66% by RBFNN), which indicates that dimensional reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of data mining combination with near-infrared spectroscopy for monitoring lime juice quality in terms of natural or synthetic nature.

  8. Identification and classification of similar looking food grains

    NASA Astrophysics Data System (ADS)

    Anami, B. S.; Biradar, Sunanda D.; Savakar, D. G.; Kulkarni, P. V.

    2013-01-01

    This paper describes the comparative study of Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers by taking a case study of identification and classification of four pairs of similar looking food grains namely, Finger Millet, Mustard, Soyabean, Pigeon Pea, Aniseed, Cumin-seeds, Split Greengram and Split Blackgram. Algorithms are developed to acquire and process color images of these grains samples. The developed algorithms are used to extract 18 colors-Hue Saturation Value (HSV), and 42 wavelet based texture features. Back Propagation Neural Network (BPNN)-based classifier is designed using three feature sets namely color - HSV, wavelet-texture and their combined model. SVM model for color- HSV model is designed for the same set of samples. The classification accuracies ranging from 93% to 96% for color-HSV, ranging from 78% to 94% for wavelet texture model and from 92% to 97% for combined model are obtained for ANN based models. The classification accuracy ranging from 80% to 90% is obtained for color-HSV based SVM model. Training time required for the SVM based model is substantially lesser than ANN for the same set of images.

  9. Machine learning-based assessment tool for imbalance and vestibular dysfunction with virtual reality rehabilitation system.

    PubMed

    Yeh, Shih-Ching; Huang, Ming-Chun; Wang, Pa-Chun; Fang, Te-Yung; Su, Mu-Chun; Tsai, Po-Yi; Rizzo, Albert

    2014-10-01

    Dizziness is a major consequence of imbalance and vestibular dysfunction. Compared to surgery and drug treatments, balance training is non-invasive and more desired. However, training exercises are usually tedious and the assessment tool is insufficient to diagnose patient's severity rapidly. An interactive virtual reality (VR) game-based rehabilitation program that adopted Cawthorne-Cooksey exercises, and a sensor-based measuring system were introduced. To verify the therapeutic effect, a clinical experiment with 48 patients and 36 normal subjects was conducted. Quantified balance indices were measured and analyzed by statistical tools and a Support Vector Machine (SVM) classifier. In terms of balance indices, patients who completed the training process are progressed and the difference between normal subjects and patients is obvious. Further analysis by SVM classifier show that the accuracy of recognizing the differences between patients and normal subject is feasible, and these results can be used to evaluate patients' severity and make rapid assessment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. Electroencephalogram Signal Classification for Automated Epileptic Seizure Detection Using Genetic Algorithm

    PubMed Central

    Nanthini, B. Suguna; Santhi, B.

    2017-01-01

    Background: Epilepsy causes when the repeated seizure occurs in the brain. Electroencephalogram (EEG) test provides valuable information about the brain functions and can be useful to detect brain disorder, especially for epilepsy. In this study, application for an automated seizure detection model has been introduced successfully. Materials and Methods: The EEG signals are decomposed into sub-bands by discrete wavelet transform using db2 (daubechies) wavelet. The eight statistical features, the four gray level co-occurrence matrix and Renyi entropy estimation with four different degrees of order, are extracted from the raw EEG and its sub-bands. Genetic algorithm (GA) is used to select eight relevant features from the 16 dimension features. The model has been trained and tested using support vector machine (SVM) classifier successfully for EEG signals. The performance of the SVM classifier is evaluated for two different databases. Results: The study has been experimented through two different analyses and achieved satisfactory performance for automated seizure detection using relevant features as the input to the SVM classifier. Conclusion: Relevant features using GA give better accuracy performance for seizure detection. PMID:28781480

  11. On the use of feature selection to improve the detection of sea oil spills in SAR images

    NASA Astrophysics Data System (ADS)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to this category.

  12. N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit.

    PubMed

    Marafino, Ben J; Davies, Jason M; Bardach, Naomi S; Dean, Mitzi L; Dudley, R Adams

    2014-01-01

    Existing risk adjustment models for intensive care unit (ICU) outcomes rely on manual abstraction of patient-level predictors from medical charts. Developing an automated method for abstracting these data from free text might reduce cost and data collection times. To develop a support vector machine (SVM) classifier capable of identifying a range of procedures and diagnoses in ICU clinical notes for use in risk adjustment. We selected notes from 2001-2008 for 4191 neonatal ICU (NICU) and 2198 adult ICU patients from the MIMIC-II database from the Beth Israel Deaconess Medical Center. Using these notes, we developed an implementation of the SVM classifier to identify procedures (mechanical ventilation and phototherapy in NICU notes) and diagnoses (jaundice in NICU and intracranial hemorrhage (ICH) in adult ICU). On the jaundice classification task, we also compared classifier performance using n-gram features to unigrams with application of a negation algorithm (NegEx). Our classifier accurately identified mechanical ventilation (accuracy=0.982, F1=0.954) and phototherapy use (accuracy=0.940, F1=0.912), as well as jaundice (accuracy=0.898, F1=0.884) and ICH diagnoses (accuracy=0.938, F1=0.943). Including bigram features improved performance on the jaundice (accuracy=0.898 vs 0.865) and ICH (0.938 vs 0.927) tasks, and outperformed NegEx-derived unigram features (accuracy=0.898 vs 0.863) on the jaundice task. Overall, a classifier using n-gram support vectors displayed excellent performance characteristics. The classifier generalizes to diverse patient populations, diagnoses, and procedures. SVM-based classifiers can accurately identify procedure status and diagnoses among ICU patients, and including n-gram features improves performance, compared to existing methods. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  13. Identification of Alzheimer's disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine.

    PubMed

    Kim, Jongin; Lee, Boreom

    2018-05-07

    Different modalities such as structural MRI, FDG-PET, and CSF have complementary information, which is likely to be very useful for diagnosis of AD and MCI. Therefore, it is possible to develop a more effective and accurate AD/MCI automatic diagnosis method by integrating complementary information of different modalities. In this paper, we propose multi-modal sparse hierarchical extreme leaning machine (MSH-ELM). We used volume and mean intensity extracted from 93 regions of interest (ROIs) as features of MRI and FDG-PET, respectively, and used p-tau, t-tau, and Aβ42 as CSF features. In detail, high-level representation was individually extracted from each of MRI, FDG-PET, and CSF using a stacked sparse extreme learning machine auto-encoder (sELM-AE). Then, another stacked sELM-AE was devised to acquire a joint hierarchical feature representation by fusing the high-level representations obtained from each modality. Finally, we classified joint hierarchical feature representation using a kernel-based extreme learning machine (KELM). The results of MSH-ELM were compared with those of conventional ELM, single kernel support vector machine (SK-SVM), multiple kernel support vector machine (MK-SVM) and stacked auto-encoder (SAE). Performance was evaluated through 10-fold cross-validation. In the classification of AD vs. HC and MCI vs. HC problem, the proposed MSH-ELM method showed mean balanced accuracies of 96.10% and 86.46%, respectively, which is much better than those of competing methods. In summary, the proposed algorithm exhibits consistently better performance than SK-SVM, ELM, MK-SVM and SAE in the two binary classification problems (AD vs. HC and MCI vs. HC). © 2018 Wiley Periodicals, Inc.

  14. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.

  15. Classifier transfer with data selection strategies for online support vector machine classification with class imbalance

    NASA Astrophysics Data System (ADS)

    Krell, Mario Michael; Wilshusen, Nils; Seeland, Anett; Kim, Su Kyoung

    2017-04-01

    Objective. Classifier transfers usually come with dataset shifts. To overcome dataset shifts in practical applications, we consider the limitations in computational resources in this paper for the adaptation of batch learning algorithms, like the support vector machine (SVM). Approach. We focus on data selection strategies which limit the size of the stored training data by different inclusion, exclusion, and further dataset manipulation criteria like handling class imbalance with two new approaches. We provide a comparison of the strategies with linear SVMs on several synthetic datasets with different data shifts as well as on different transfer settings with electroencephalographic (EEG) data. Main results. For the synthetic data, adding only misclassified samples performed astoundingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. Adding all and removing the oldest samples results in the best performance, whereas for smaller drifts, it can be sufficient to only add samples near the decision boundary of the SVM which reduces processing resources. Significance. For brain-computer interfaces based on EEG data, models trained on data from a calibration session, a previous recording session, or even from a recording session with another subject are used. We show, that by using the right combination of data selection criteria, it is possible to adapt the SVM classifier to overcome the performance drop from the transfer.

  16. Accuracy of land use change detection using support vector machine and maximum likelihood techniques for open-cast coal mining areas.

    PubMed

    Karan, Shivesh Kishore; Samadder, Sukha Ranjan

    2016-08-01

    One objective of the present study was to evaluate the performance of support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for a rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing the change in land use pattern accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. For the present study, Landsat 5 Thematic Mapper (TM) data of 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data of 2016 of a part of Jharia Coalfield, Dhanbad, India, were used. The SVM classification technique provided greater overall classification accuracy when compared to the MLC technique in classifying heterogeneous landscape with limited training dataset. SVM exceeded MLC in handling a difficult challenge of classifying features having near similar reflectance on the mean signature plot, an improvement of over 11 % was observed in classification of built-up area, and an improvement of 24 % was observed in classification of surface water using SVM; similarly, the SVM technique improved the overall land use classification accuracy by almost 6 and 3 % for Landsat 5 and Landsat 8 images, respectively. Results indicated that land degradation increased significantly from 2006 to 2016 in the study area. This study will help in quantifying the changes and can also serve as a basis for further decision support system studies aiding a variety of purposes such as planning and management of mines and environmental impact assessment.

  17. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown as a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain high level of noise and the overwhelming number of genes relative to the number of available samples. It brings out a great challenge for machine learning and statistic techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameters learning problem. And a shrinkage approach: 1-norm based linear programming is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets: leukemia dataset and colon tumor dataset are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both two datasets. Moreover, very simple rules with linguist labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

  18. Complexity and spectral analysis of the heart rate variability dynamics for distant prediction of paroxysmal atrial fibrillation with artificial intelligence methods.

    PubMed

    Chesnokov, Yuriy V

    2008-06-01

    Paroxysmal atrial fibrillation (PAF) is a serious arrhythmia associated with morbidity and mortality. We explore the possibility of distant prediction of PAF by analyzing changes in heart rate variability (HRV) dynamics of non-PAF rhythms immediately before PAF event. We use that model for distant prognosis of PAF onset with artificial intelligence methods. We analyzed 30-min non-PAF HRV records from 51 subjects immediately before PAF onset and at least 45min distant from any PAF event. We used spectral and complexity analysis with sample (SmEn) and approximate (ApEn) entropies and their multiscale versions on extracted HRV data. We used that features to train the artificial neural networks (ANNs) and support vector machine (SVM) classifiers to differentiate the subjects. The trained classifiers were further tested for distant PAF event prognosis on 16 subjects from independent database on non-PAF rhythm lasting from 60 to 320 min before PAF onset classifying the 30-min segments as distant or leading to PAF. We found statistically significant increase in 30-min non-PAF HRV recordings from 51 subjects in the VLF, LF, HF bands and total power (p<0.0001) before PAF event compared to PAF distant ones. The SmEn and ApEn analysis provided significant decrease in complexity (p<0.0001 and p<0.001) before PAF onset. For training ANN and SVM classifiers the data from 51 subjects were randomly split to training, validation and testing. ANN provided better results in terms of sensitivity (Se), specificity (Sp) and positive predictivity (Pp) compared to SVM which became biased towards positive case. The validation results of the ANN classifier we achieved: Se 76%, Sp 93%, Pp 94%. Testing ANN and SVM classifiers on 16 subjects with non-PAF HRV data preceding PAF events we obtained distant prediction of PAF onset with SVM classifier in 10 subjects (58+/-18 min in advance). ANN classifier provided distant prediction of PAF event in 13 subjects (62+/-21 min in advance). From the results of distant PAF prediction we conclude that ANN and SVM classifiers learned the changes in the HRV dynamics immediately before PAF event and successfully identified them during distant PAF prognosis on independent database. This confirms the reported in the literature results that corresponding changes in the HRV data occur about 60 min before PAF onset and proves the possibility of distant PAF prediction with ANN and SVM methods.

  19. Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.

    PubMed

    Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane

    2016-12-01

    In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used an EWMA to identify features that correspond with a particular type of fall allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. A benefit of using a subset of the original data to design classification models minimizes training time and simplifies models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the university of Rzeszow's. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results using the EWMA-based SVM classifier method with those achieved using three commonly used machine learning classifiers, neural network, K-nearest neighbor and naïve Bayes, proved our model superior.

  20. Trajectory Correction and Locomotion Analysis of a Hexapod Walking Robot with Semi-Round Rigid Feet

    PubMed Central

    Zhu, Yaguang; Jin, Bo; Wu, Yongsheng; Guo, Tong; Zhao, Xiangmo

    2016-01-01

    Aimed at solving the misplaced body trajectory problem caused by the rolling of semi-round rigid feet when a robot is walking, a legged kinematic trajectory correction methodology based on the Least Squares Support Vector Machine (LS-SVM) is proposed. The concept of ideal foothold is put forward for the three-dimensional kinematic model modification of a robot leg, and the deviation value between the ideal foothold and real foothold is analyzed. The forward/inverse kinematic solutions between the ideal foothold and joint angular vectors are formulated and the problem of direct/inverse kinematic nonlinear mapping is solved by using the LS-SVM. Compared with the previous approximation method, this correction methodology has better accuracy and faster calculation speed with regards to inverse kinematics solutions. Experiments on a leg platform and a hexapod walking robot are conducted with multi-sensors for the analysis of foot tip trajectory, base joint vibration, contact force impact, direction deviation, and power consumption, respectively. The comparative analysis shows that the trajectory correction methodology can effectively correct the joint trajectory, thus eliminating the contact force influence of semi-round rigid feet, significantly improving the locomotion of the walking robot and reducing the total power consumption of the system. PMID:27589766

  1. Application of Hyperspectral Imaging and Chemometric Calibrations for Variety Discrimination of Maize Seeds

    PubMed Central

    Zhang, Xiaolei; Liu, Fei; He, Yong; Li, Xiaoli

    2012-01-01

    Hyperspectral imaging in the visible and near infrared (VIS-NIR) region was used to develop a novel method for discriminating different varieties of commodity maize seeds. Firstly, hyperspectral images of 330 samples of six varieties of maize seeds were acquired using a hyperspectral imaging system in the 380–1,030 nm wavelength range. Secondly, principal component analysis (PCA) and kernel principal component analysis (KPCA) were used to explore the internal structure of the spectral data. Thirdly, three optimal wavelengths (523, 579 and 863 nm) were selected by implementing PCA directly on each image. Then four textural variables including contrast, homogeneity, energy and correlation were extracted from gray level co-occurrence matrix (GLCM) of each monochromatic image based on the optimal wavelengths. Finally, several models for maize seeds identification were established by least squares-support vector machine (LS-SVM) and back propagation neural network (BPNN) using four different combinations of principal components (PCs), kernel principal components (KPCs) and textural features as input variables, respectively. The recognition accuracy achieved in the PCA-GLCM-LS-SVM model (98.89%) was the most satisfactory one. We conclude that hyperspectral imaging combined with texture analysis can be implemented for fast classification of different varieties of maize seeds. PMID:23235456

  2. Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.

    PubMed

    Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H

    2017-12-19

    Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to present an evaluation and comparison of how machine learning techniques can be applied on medical records of cardiorespiratory fitness and how the various techniques differ in terms of capabilities of predicting medical outcomes (e.g. mortality). We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems Between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). In order to handle the imbalanced dataset used, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two set of experiments have been conducted with and without the SMOTE sampling technique. On average over different evaluation metrics, SVM Classifier has shown the lowest performance while other models like BN, BC and DT performed better. The RF classifier has shown the best performance (AUC = 0.97) among all models trained using the SMOTE sampling. The results show that various ML techniques can significantly vary in terms of its performance for the different evaluation metrics. It is also not necessarily that the more complex the ML model, the more prediction accuracy can be achieved. The prediction performance of all models trained with SMOTE is much better than the performance of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.

  3. Design of a hybrid model for cardiac arrhythmia classification based on Daubechies wavelet transform.

    PubMed

    Rajagopal, Rekha; Ranganathan, Vidhyapriya

    2018-06-05

    Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.

  4. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    PubMed Central

    2012-01-01

    Background Myocardial ischemia can be developed into more serious diseases. Early Detection of the ischemic syndrome in electrocardiogram (ECG) more accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values for successive five beats to reduce effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features were sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter. PMID:22703641

  5. The formation method of the feature space for the identification of fatigued bills

    NASA Astrophysics Data System (ADS)

    Kang, Dongshik; Oshiro, Ayumu; Ozawa, Kenji; Mitsui, Ikugo

    2014-10-01

    Fatigued bills make a trouble such as the paper jam in a bill handling machine. In the discrimination of fatigued bills using an acoustic signal, the variation of an observed bill sound is considered to be one of causes in misclassification. Therefore a technique has demanded in order to make the classification of fatigued bills more efficient. In this paper, we proposed the algorithm that extracted feature quantity of bill sound from acoustic signal using the frequency difference, and carried out discrimination experiment of fatigued bill money by Support Vector Machine(SVM). The feature quantity of frequency difference can represent the frequency components of an acoustic signal is varied by the fatigued degree of bill money. The generalization performance of SVM does not depend on the size of dimensions of the feature space, even in a high dimensional feature space such as bill-acoustic signals. Furthermore, SVM can induce an optimal classifier which considers the combination of features by the virtue of polynomial kernel functions.

  6. Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.

    PubMed

    Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori

    2018-01-01

    We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification focussing on (i) usefulness of the conventional CADx system (hand-crafted imaging feature + machine learning algorithm), (ii) comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of the local binary pattern was used for calculating a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. Tree Parzen Estimator (TPE) was used as Bayesian optimization for parameters of SVM and XGBoost. Random search was done for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUC of SVM and XGBoost was 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer numbers of trials when using TPE, compared with random search. Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. Based on observer study, AUC values of two board-certified radiologists were 0.898 and 0.822. The results show that diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.

  7. Structural analysis of online handwritten mathematical symbols based on support vector machines

    NASA Astrophysics Data System (ADS)

    Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George

    2013-01-01

    Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs): one based on the "one-against-one" technique and one based on the "one-against-all", in identifying the relation between a pair of symbols (i.e. subscript, numerator, etc). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique and 12.31% error rate for the ILSP-1 classifier.

  8. CARSVM: a class association rule-based classification framework and its application to gene expression data.

    PubMed

    Kianmehr, Keivan; Alhajj, Reda

    2008-09-01

    In this study, we aim at building a classification framework, namely the CARSVM model, which integrates association rule mining and support vector machine (SVM). The goal is to benefit from advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that improves the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework: instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, are presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-qualified source of discrimination knowledge that can impact substantially the prediction power of SVM and associative classification techniques. They provide users with more conveniences in terms of understandability and interpretability as well. We have used four datasets from UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as real world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model. From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.

  9. Fault diagnosis of automobile hydraulic brake system using statistical features and support vector machines

    NASA Astrophysics Data System (ADS)

    Jegadeeshwaran, R.; Sugumaran, V.

    2015-02-01

    Hydraulic brakes in automobiles are important components for the safety of passengers; therefore, the brakes are a good subject for condition monitoring. The condition of the brake components can be monitored by using the vibration characteristics. On-line condition monitoring by using machine learning approach is proposed in this paper as a possible solution to such problems. The vibration signals for both good as well as faulty conditions of brakes were acquired from a hydraulic brake test setup with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals and the feature selection was carried out using the C4.5 decision tree algorithm. There is no specific method to find the right number of features required for classification for a given problem. Hence an extensive study is needed to find the optimum number of features. The effect of the number of features was also studied, by using the decision tree as well as Support Vector Machines (SVM). The selected features were classified using the C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusion of the study is presented.

  10. A Semisupervised Support Vector Machines Algorithm for BCI Systems

    PubMed Central

    Qin, Jianzhao; Li, Yuanqing; Sun, Wei

    2007-01-01

    As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141

  11. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  12. LS Bound based gene selection for DNA microarray data.

    PubMed

    Zhou, Xin; Mao, K Z

    2005-04-15

    One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). ekzmao@ntu.edu.sg.

  13. Cancer survival classification using integrated data sets and intermediate information.

    PubMed

    Kim, Shinuk; Park, Taesung; Kon, Mark

    2014-09-01

    Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with Cox proportional hazard regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers, first, existing ML methods, support vector machine (SVM) and random forest (RF), second, a new median-based classifier using FSCOX (FSCOX_median), and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from miRNAs and mRNAs profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced 22 short-term and 22 long-term survivor data set. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM) 85.71% (FSCOX_median), 85.71% (FSCOX_SVM). These results are higher than the results of using miRNA expression and mRNA expression alone. In addition we predict 16 hsa-miR-23b and hsa-miR-27b target genes in ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles. Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. An improved PSO-SVM model for online recognition defects in eddy current testing

    NASA Astrophysics Data System (ADS)

    Liu, Baoling; Hou, Dibo; Huang, Pingjie; Liu, Banteng; Tang, Huayi; Zhang, Wubo; Chen, Peihua; Zhang, Guangxin

    2013-12-01

    Accurate and rapid recognition of defects is essential for structural integrity and health monitoring of in-service device using eddy current (EC) non-destructive testing. This paper introduces a novel model-free method that includes three main modules: a signal pre-processing module, a classifier module and an optimisation module. In the signal pre-processing module, a kind of two-stage differential structure is proposed to suppress the lift-off fluctuation that could contaminate the EC signal. In the classifier module, multi-class support vector machine (SVM) based on one-against-one strategy is utilised for its good accuracy. In the optimisation module, the optimal parameters of classifier are obtained by an improved particle swarm optimisation (IPSO) algorithm. The proposed IPSO technique can improve convergence performance of the primary PSO through the following strategies: nonlinear processing of inertia weight, introductions of the black hole and simulated annealing model with extremum disturbance. The good generalisation ability of the IPSO-SVM model has been validated through adding additional specimen into the testing set. Experiments show that the proposed algorithm can achieve higher recognition accuracy and efficiency than other well-known classifiers and the superiorities are more obvious with less training set, which contributes to online application.

  15. Automated discrimination of dicentric and monocentric chromosomes by machine learning-based image processing.

    PubMed

    Li, Yanxin; Knoll, Joan H; Wilkins, Ruth C; Flegal, Farrah N; Rogan, Peter K

    2016-05-01

    Dose from radiation exposure can be estimated from dicentric chromosome (DC) frequencies in metaphase cells of peripheral blood lymphocytes. We automated DC detection by extracting features in Giemsa-stained metaphase chromosome images and classifying objects by machine learning (ML). DC detection involves (i) intensity thresholded segmentation of metaphase objects, (ii) chromosome separation by watershed transformation and elimination of inseparable chromosome clusters, fragments and staining debris using a morphological decision tree filter, (iii) determination of chromosome width and centreline, (iv) derivation of centromere candidates, and (v) distinction of DCs from monocentric chromosomes (MC) by ML. Centromere candidates are inferred from 14 image features input to a Support Vector Machine (SVM). Sixteen features derived from these candidates are then supplied to a Boosting classifier and a second SVM which determines whether a chromosome is either a DC or MC. The SVM was trained with 292 DCs and 3135 MCs, and then tested with cells exposed to either low (1 Gy) or high (2-4 Gy) radiation dose. Results were then compared with those of 3 experts. True positive rates (TPR) and positive predictive values (PPV) were determined for the tuning parameter, σ. At larger σ, PPV decreases and TPR increases. At high dose, for σ = 1.3, TPR = 0.52 and PPV = 0.83, while at σ = 1.6, the TPR = 0.65 and PPV = 0.72. At low dose and σ = 1.3, TPR = 0.67 and PPV = 0.26. The algorithm differentiates DCs from MCs, overlapped chromosomes and other objects with acceptable accuracy over a wide range of radiation exposures. © 2016 Wiley Periodicals, Inc.

  16. SVM Pixel Classification on Colour Image Segmentation

    NASA Astrophysics Data System (ADS)

    Barui, Subhrajit; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.

    2018-04-01

    The aim of image segmentation is to simplify the representation of an image with the help of cluster pixels into something meaningful to analyze. Segmentation is typically used to locate boundaries and curves in an image, precisely to label every pixel in an image to give each pixel an independent identity. SVM pixel classification on colour image segmentation is the topic highlighted in this paper. It holds useful application in the field of concept based image retrieval, machine vision, medical imaging and object detection. The process is accomplished step by step. At first we need to recognize the type of colour and the texture used as an input to the SVM classifier. These inputs are extracted via local spatial similarity measure model and Steerable filter also known as Gabon Filter. It is then trained by using FCM (Fuzzy C-Means). Both the pixel level information of the image and the ability of the SVM Classifier undergoes some sophisticated algorithm to form the final image. The method has a well developed segmented image and efficiency with respect to increased quality and faster processing of the segmented image compared with the other segmentation methods proposed earlier. One of the latest application result is the Light L16 camera.

  17. Diagnosis of periodontal diseases using different classification algorithms: a preliminary study.

    PubMed

    Ozden, F O; Özgönenel, O; Özden, B; Aydogdu, A

    2015-01-01

    The purpose of the proposed study was to develop an identification unit for classifying periodontal diseases using support vector machine (SVM), decision tree (DT), and artificial neural networks (ANNs). A total of 150 patients was divided into two groups such as training (100) and testing (50). The codes created for risk factors, periodontal data, and radiographically bone loss were formed as a matrix structure and regarded as inputs for the classification unit. A total of six periodontal conditions was the outputs of the classification unit. The accuracy of the suggested methods was compared according to their resolution and working time. DT and SVM were best to classify the periodontal diseases with a high accuracy according to the clinical research based on 150 patients. The performances of SVM and DT were found 98% with total computational time of 19.91 and 7.00 s, respectively. ANN had the worst correlation between input and output variable, and its performance was calculated as 46%. SVM and DT appeared to be sufficiently complex to reflect all the factors associated with the periodontal status, simple enough to be understandable and practical as a decision-making aid for prediction of periodontal disease.

  18. Mediterranean Land Use and Land Cover Classification Assessment Using High Spatial Resolution Data

    NASA Astrophysics Data System (ADS)

    Elhag, Mohamed; Boteva, Silvena

    2016-10-01

    Landscape fragmentation is noticeably practiced in Mediterranean regions and imposes substantial complications in several satellite image classification methods. To some extent, high spatial resolution data were able to overcome such complications. For better classification performances in Land Use Land Cover (LULC) mapping, the current research adopts different classification methods comparison for LULC mapping using Sentinel-2 satellite as a source of high spatial resolution. Both of pixel-based and an object-based classification algorithms were assessed; the pixel-based approach employs Maximum Likelihood (ML), Artificial Neural Network (ANN) algorithms, Support Vector Machine (SVM), and, the object-based classification uses the Nearest Neighbour (NN) classifier. Stratified Masking Process (SMP) that integrates a ranking process within the classes based on spectral fluctuation of the sum of the training and testing sites was implemented. An analysis of the overall and individual accuracy of the classification results of all four methods reveals that the SVM classifier was the most efficient overall by distinguishing most of the classes with the highest accuracy. NN succeeded to deal with artificial surface classes in general while agriculture area classes, and forest and semi-natural area classes were segregated successfully with SVM. Furthermore, a comparative analysis indicates that the conventional classification method yielded better accuracy results than the SMP method overall with both classifiers used, ML and SVM.

  19. Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method.

    PubMed

    Liao, Zhijun; Wang, Xinrui; Chen, Xingyong; Zou, Quan

    2017-01-01

    The Krüppel-like factors (KLFs) are a family of containing Zn finger(ZF) motif transcription factors with 18 members in human genome, among them, KLF18 is predicted by bioinformatics. KLFs possess various physiological function involving in a number of cancers and other diseases. Here we perform a binary-class classification of KLFs and non-KLFs by machine learning methods. The protein sequences of KLFs and non-KLFs were searched from UniProt and randomly separate them into training dataset(containing positive and negative sequences) and test dataset(containing only negative sequences), after extracting the 188-dimensional(188D) feature vectors we carry out category with four classifiers(GBDT, libSVM, RF, and k-NN). On the human KLFs, we further dig into the evolutionary relationship and motif distribution, and finally we analyze the conserved amino acid residue of three zinc fingers. The classifier model from training dataset were well constructed, and the highest specificity(Sp) was 99.83% from a library for support vector machine(libSVM) and all the correctly classified rates were over 70% for 10-fold cross-validation on test dataset. The 18 human KLFs can be further divided into 7 groups and the zinc finger domains were located at the carboxyl terminus, and many conserved amino acid residues including Cysteine and Histidine, and the span and interval between them were consistent in the three ZF domains. Two classification models for KLFs prediction have been built by novel machine learning methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  20. Support vector machine in machine condition monitoring and fault diagnosis

    NASA Astrophysics Data System (ADS)

    Widodo, Achmad; Yang, Bo-Suk

    2007-08-01

    Recently, the issue of machine condition monitoring and fault diagnosis as a part of maintenance system became global due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. This paper presents a survey of machine condition monitoring and fault diagnosis using support vector machine (SVM). It attempts to summarize and review the recent research and developments of SVM in machine condition monitoring and diagnosis. Numerous methods have been developed based on intelligent systems such as artificial neural network, fuzzy expert system, condition-based reasoning, random forest, etc. However, the use of SVM for machine condition monitoring and fault diagnosis is still rare. SVM has excellent performance in generalization so it can produce high accuracy in classification for machine condition monitoring and diagnosis. Until 2006, the use of SVM in machine condition monitoring and fault diagnosis is tending to develop towards expertise orientation and problem-oriented domain. Finally, the ability to continually change and obtain a novel idea for machine condition monitoring and fault diagnosis using SVM will be future works.

  1. Using support vector machines to improve elemental ion identification in macromolecular crystal structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morshed, Nader; Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Echols, Nathaniel, E-mail: nechols@lbl.gov

    2015-05-01

    A method to automatically identify possible elemental ions in X-ray crystal structures has been extended to use support vector machine (SVM) classifiers trained on selected structures in the PDB, with significantly improved sensitivity over manually encoded heuristics. In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here,more » the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.« less

  2. Energy-Based Metrics for Arthroscopic Skills Assessment.

    PubMed

    Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa

    2017-08-05

    Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.

  3. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

    PubMed

    Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

    2011-05-09

    Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net.We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone.Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error.Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters.The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'.We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.

  4. Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    PubMed Central

    2011-01-01

    Background Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689

  5. A combined LS-SVM & MLR QSAR workflow for predicting the inhibition of CXCR3 receptor by quinazolinone analogs.

    PubMed

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Igglessi-Markopoulou, Olga; Kollias, George

    2010-05-01

    A novel QSAR workflow is constructed that combines MLR with LS-SVM classification techniques for the identification of quinazolinone analogs as "active" or "non-active" CXCR3 antagonists. The accuracy of the LS-SVM classification technique for the training set and test was 100% and 90%, respectively. For the "active" analogs a validated MLR QSAR model estimates accurately their I-IP10 IC(50) inhibition values. The accuracy of the QSAR model (R (2) = 0.80) is illustrated using various evaluation techniques, such as leave-one-out procedure (R(LOO2)) = 0.67) and validation through an external test set (R(pred2) = 0.78). The key conclusion of this study is that the selected molecular descriptors, Highest Occupied Molecular Orbital energy (HOMO), Principal Moment of Inertia along X and Y axes PMIX and PMIZ, Polar Surface Area (PSA), Presence of triple bond (PTrplBnd), and Kier shape descriptor ((1) kappa), demonstrate discriminatory and pharmacophore abilities.

  6. Fall Detection Using Smartphone Audio Features.

    PubMed

    Cheffena, Michael

    2016-07-01

    An automated fall detection system based on smartphone audio features is developed. The spectrogram, mel frequency cepstral coefficents (MFCCs), linear predictive coding (LPC), and matching pursuit (MP) features of different fall and no-fall sound events are extracted from experimental data. Based on the extracted audio features, four different machine learning classifiers: k-nearest neighbor classifier (k-NN), support vector machine (SVM), least squares method (LSM), and artificial neural network (ANN) are investigated for distinguishing between fall and no-fall events. For each audio feature, the performance of each classifier in terms of sensitivity, specificity, accuracy, and computational complexity is evaluated. The best performance is achieved using spectrogram features with ANN classifier with sensitivity, specificity, and accuracy all above 98%. The classifier also has acceptable computational requirement for training and testing. The system is applicable in home environments where the phone is placed in the vicinity of the user.

  7. Automated characterisation of ultrasound images of ovarian tumours: the diagnostic accuracy of a support vector machine and image processing with a local binary pattern operator.

    PubMed

    Khazendar, S; Sayasneh, A; Al-Assam, H; Du, H; Kaijser, J; Ferrara, L; Timmerman, D; Jassim, S; Bourne, T

    2015-01-01

    Preoperative characterisation of ovarian masses into benign or malignant is of paramount importance to optimise patient management. In this study, we developed and validated a computerised model to characterise ovarian masses as benign or malignant. Transvaginal 2D B mode static ultrasound images of 187 ovarian masses with known histological diagnosis were included. Images were first pre-processed and enhanced, and Local Binary Pattern Histograms were then extracted from 2 × 2 blocks of each image. A Support Vector Machine (SVM) was trained using stratified cross validation with randomised sampling. The process was repeated 15 times and in each round 100 images were randomly selected. The SVM classified the original non-treated static images as benign or malignant masses with an average accuracy of 0.62 (95% CI: 0.59-0.65). This performance significantly improved to an average accuracy of 0.77 (95% CI: 0.75-0.79) when images were pre-processed, enhanced and treated with a Local Binary Pattern operator (mean difference 0.15: 95% 0.11-0.19, p < 0.0001, two-tailed t test). We have shown that an SVM can classify static 2D B mode ultrasound images of ovarian masses into benign and malignant categories. The accuracy improves if texture related LBP features extracted from the images are considered.

  8. The Application of Support Vector Machine (svm) Using Cielab Color Model, Color Intensity and Color Constancy as Features for Ortho Image Classification of Benthic Habitats in Hinatuan, Surigao del Sur, Philippines

    NASA Astrophysics Data System (ADS)

    Cubillas, J. E.; Japitana, M.

    2016-06-01

    This study demonstrates the application of CIELAB, Color intensity, and One Dimensional Scalar Constancy as features for image recognition and classifying benthic habitats in an image with the coastal areas of Hinatuan, Surigao Del Sur, Philippines as the study area. The study area is composed of four datasets, namely: (a) Blk66L005, (b) Blk66L021, (c) Blk66L024, and (d) Blk66L0114. SVM optimization was performed in Matlab® software with the help of Parallel Computing Toolbox to hasten the SVM computing speed. The image used for collecting samples for SVM procedure was Blk66L0114 in which a total of 134,516 sample objects of mangrove, possible coral existence with rocks, sand, sea, fish pens and sea grasses were collected and processed. The collected samples were then used as training sets for the supervised learning algorithm and for the creation of class definitions. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as a super feature which will then be used in developing the C (classifier) rule set in eCognition® software. The classification results of the sampling site yielded an accuracy of 98.85% which confirms the reliability of remote sensing techniques and analysis employed to orthophotos like the CIELAB, Color Intensity and One dimensional scalar constancy and the use of SVM classification algorithm in classifying benthic habitats.

  9. Classification of the fragrant styles and evaluation of the aromatic quality of flue-cured tobacco leaves by machine-learning methods.

    PubMed

    Gu, Li; Xue, Lichun; Song, Qi; Wang, Fengji; He, Huaqin; Zhang, Zhongyi

    2016-12-01

    During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, there are over 3000 chemical compounds in flue-cured tobacco leaves; thus, it is impossible to evaluate the quality of flue-cured tobacco leaves using all the chemical compounds. In this paper, we used Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, where the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) were 90.95% and 0.80, respectively. SVM algorithm combined with 19 chemical compounds selected by R-PSO achieved the best assessment performance of the aromatic quality of tobacco leaves, where the PCC and MSE were 0.594 and 0.263, respectively. Finally, we constructed two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco .

  10. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease.

    PubMed

    Ozcift, Akin

    2012-08-01

    Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.

  11. Protein-protein interaction site prediction in Homo sapiens and E. coli using an interaction-affinity based membership function in fuzzy SVM.

    PubMed

    Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal

    2015-10-01

    Protein-protein interaction (PPI) site prediction aids to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performances of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms are evaluated and estimated the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are measured as 79.94% and 80.48% for the Homo sapiens and E. coli organisms respectively. On the independent test datasets, AUC scores are obtained as 76.59% and 80.17% respectively for the two organisms. In almost all cases, the developed F-SVM method improves the performances obtained by the corresponding classical SVM and the other classifiers, available in the literature.

  12. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

    PubMed Central

    Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar

    2016-01-01

    Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten fold and leave-one-out cross validations in discriminating the cancer samples from the normal samples. With this 30 genes set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311

  13. Classification of skin cancer images using local binary pattern and SVM classifier

    NASA Astrophysics Data System (ADS)

    Adjed, Faouzi; Faye, Ibrahima; Ababsa, Fakhreddine; Gardezi, Syed Jamal; Dass, Sarat Chandra

    2016-11-01

    In this paper, a classification method for melanoma and non-melanoma skin cancer images has been presented using the local binary patterns (LBP). The LBP computes the local texture information from the skin cancer images, which is later used to compute some statistical features that have capability to discriminate the melanoma and non-melanoma skin tissues. Support vector machine (SVM) is applied on the feature matrix for classification into two skin image classes (malignant and benign). The method achieves good classification accuracy of 76.1% with sensitivity of 75.6% and specificity of 76.7%.

  14. Epileptic seizure detection in EEG signal using machine learning techniques.

    PubMed

    Jaiswal, Abeg Kumar; Banka, Haider

    2018-03-01

    Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods for analyzing an EEG signal for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning technique have been proposed to replace these traditional methods. The two basic steps involved in machine learning are feature extraction and classification. Feature extraction reduces the input pattern space by keeping informative features and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches involving subpattern based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA. Both techniques explore the subpattern correlation of EEG signals, which helps in decision-making process. SVM is used for classification of seizure and non-seizure EEG signals. The SVM was trained with radial basis kernel. All the experiments have been carried out on the benchmark epilepsy EEG dataset. The entire dataset consists of 500 EEG signals recorded under different scenarios. Seven different experimental cases for classification have been conducted. The classification accuracy was evaluated using tenfold cross validation. The classification results of the proposed approaches have been compared with the results of some of existing techniques proposed in the literature to establish the claim.

  15. Early detection of germinated wheat grains using terahertz image and chemometrics

    NASA Astrophysics Data System (ADS)

    Jiang, Yuying; Ge, Hongyi; Lian, Feiyu; Zhang, Yuan; Xia, Shanhong

    2016-02-01

    In this paper, we propose a feasible tool that uses a terahertz (THz) imaging system for identifying wheat grains at different stages of germination. The THz spectra of the main changed components of wheat grains, maltose and starch, which were obtained by THz time spectroscopy, were distinctly different. Used for original data compression and feature extraction, principal component analysis (PCA) revealed the changes that occurred in the inner chemical structure during germination. Two thresholds, one indicating the start of the release of α-amylase and the second when it reaches the steady state, were obtained through the first five score images. Thus, the first five PCs were input for the partial least-squares regression (PLSR), least-squares support vector machine (LS-SVM), and back-propagation neural network (BPNN) models, which were used to classify seven different germination times between 0 and 48 h, with a prediction accuracy of 92.85%, 93.57%, and 90.71%, respectively. The experimental results indicated that the combination of THz imaging technology and chemometrics could be a new effective way to discriminate wheat grains at the early germination stage of approximately 6 h.

  16. Effectively identifying compound-protein interactions by learning from positive and unlabeled examples.

    PubMed

    Cheng, Zhanzhan; Zhou, Shuigeng; Wang, Yang; Liu, Hui; Guan, Jihong; Chen, Yi-Ping Phoebe

    2016-05-18

    Prediction of compound-protein interactions (CPIs) is to find new compound-protein pairs where a protein is targeted by at least a compound, which is a crucial step in new drug design. Currently, a number of machine learning based methods have been developed to predict new CPIs in the literature. However, as there is not yet any publicly available set of validated negative CPIs, most existing machine learning based approaches use the unknown interactions (not validated CPIs) selected randomly as the negative examples to train classifiers for predicting new CPIs. Obviously, this is not quite reasonable and unavoidably impacts the CPI prediction performance. In this paper, we simply take the unknown CPIs as unlabeled examples, and propose a new method called PUCPI (the abbreviation of PU learning for Compound-Protein Interaction identification) that employs biased-SVM (Support Vector Machine) to predict CPIs using only positive and unlabeled examples. PU learning is a class of learning methods that leans from positive and unlabeled (PU) samples. To the best of our knowledge, this is the first work that identifies CPIs using only positive and unlabeled examples. We first collect known CPIs as positive examples and then randomly select compound-protein pairs not in the positive set as unlabeled examples. For each CPI/compound-protein pair, we extract protein domains as protein features and compound substructures as chemical features, then take the tensor product of the corresponding compound features and protein features as the feature vector of the CPI/compound-protein pair. After that, biased-SVM is employed to train classifiers on different datasets of CPIs and compound-protein pairs. Experiments over various datasets show that our method outperforms six typical classifiers, including random forest, L1- and L2-regularized logistic regression, naive Bayes, SVM and k-nearest neighbor (kNN), and three types of existing CPI prediction models. Source code, datasets and related documents of PUCPI are available at: http://admis.fudan.edu.cn/projects/pucpi.html.

  17. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.

  18. A region-based segmentation of tumour from brain CT images using nonlinear support vector machine classifier.

    PubMed

    Nanthagopal, A Padma; Rajamony, R Sukanesh

    2012-07-01

    The proposed system provides new textural information for segmenting tumours, efficiently and accurately and with less computational time, from benign and malignant tumour images, especially in smaller dimensions of tumour regions of computed tomography (CT) images. Region-based segmentation of tumour from brain CT image data is an important but time-consuming task performed manually by medical experts. The objective of this work is to segment brain tumour from CT images using combined grey and texture features with new edge features and nonlinear support vector machine (SVM) classifier. The selected optimal features are used to model and train the nonlinear SVM classifier to segment the tumour from computed tomography images and the segmentation accuracies are evaluated for each slice of the tumour image. The method is applied on real data of 80 benign, malignant tumour images. The results are compared with the radiologist labelled ground truth. Quantitative analysis between ground truth and the segmented tumour is presented in terms of segmentation accuracy and the overlap similarity measure dice metric. From the analysis and performance measures such as segmentation accuracy and dice metric, it is inferred that better segmentation accuracy and higher dice metric are achieved with the normalized cut segmentation method than with the fuzzy c-means clustering method.

  19. Support Vector Machine Model for Automatic Detection and Classification of Seismic Events

    NASA Astrophysics Data System (ADS)

    Barros, Vesna; Barros, Lucas

    2016-04-01

    The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazards monitoring and nuclear treaty verification. However, false detections and missed detections caused by station noise and incorrect classification of arrivals are still an issue and the events are often unclassified or poorly classified. Thus, machine learning techniques can be used in automatic processing for classifying the huge database of seismic recordings and provide more confidence in the final output. Applied in the context of the International Monitoring System (IMS) - a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT) - we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique called the Support Vector Machine (SVM). According to Kortström et al., 2015, the advantages of using SVM are handleability of large number of features and effectiveness in high dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and classify them as earthquakes or quarry blasts. It is expected to create a flexible and easily adjustable SVM method that can be applied in different regions and datasets. Taken a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions such as infrasound and hydroacoustic waveforms. As an authorized user, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquake, quarry blasts) and noise is being analysed to train the model and learn the typical pattern of the signal from these events. Moreover, comparing the performance of the support-vector network to various classical learning algorithms used before in seismic detection and classification is an essential final step to analyze the advantages and disadvantages of the model.

  20. A comprehensive quality evaluation method by FT-NIR spectroscopy and chemometric: Fine classification and untargeted authentication against multiple frauds for Chinese Ganoderma lucidum

    NASA Astrophysics Data System (ADS)

    Fu, Haiyan; Yin, Qiaobo; Xu, Lu; Wang, Weizheng; Chen, Feng; Yang, Tianming

    2017-07-01

    The origins and authenticity against frauds are two essential aspects of food quality. In this work, a comprehensive quality evaluation method by FT-NIR spectroscopy and chemometrics were suggested to address the geographical origins and authentication of Chinese Ganoderma lucidum (GL). Classification for 25 groups of GL samples (7 common species from 15 producing areas) was performed using near-infrared spectroscopy and interval-combination One-Versus-One least squares support vector machine (IC-OVO-LS-SVM). Untargeted analysis of 4 adulterants of cheaper mushrooms was performed by one-class partial least squares (OCPLS) modeling for each of the 7 GL species. After outlier diagnosis and comparing the influences of different preprocessing methods and spectral intervals on classification, IC-OVO-LS-SVM with standard normal variate (SNV) spectra obtained a total classification accuracy of 0.9317, an average sensitivity and specificity of 0.9306 and 0.9971, respectively. With SNV or second-order derivative (D2) spectra, OCPLS could detect at least 2% or more doping levels of adulterants for 5 of the 7 GL species and 5% or more doping levels for the other 2 GL species. This study demonstrates the feasibility of using new chemometrics and NIR spectroscopy for fine classification of GL geographical origins and species as well as for untargeted analysis of multiple adulterants.

  1. Fresh Biomass Estimation in Heterogeneous Grassland Using Hyperspectral Measurements and Multivariate Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.

    2014-12-01

    Accurate estimation of grassland biomass at their peak productivity can provide crucial information regarding the functioning and productivity of the rangelands. Hyperspectral remote sensing has proved to be valuable for estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to large amount of correlated hyper-spectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squared Support Vector Machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods were assessed using cross validated R2 and RMSE. The best model performance was obtained using LS_SVM and then PLSR both calibrated with first derivative reflectance dataset with R2cv = 0.88 & 0.86 and RMSEcv= 1.15 & 1.07 respectively. The weakest prediction accuracy was appeared when PCR were used (R2cv = 0.31 and RMSEcv= 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.

  2. Prediction of B-cell linear epitopes with a combination of support vector machine classification and amino acid propensity identification.

    PubMed

    Wang, Hsin-Wei; Lin, Ya-Chi; Pai, Tun-Wen; Chang, Hao-Teng

    2011-01-01

    Epitopes are antigenic determinants that are useful because they induce B-cell antibody production and stimulate T-cell activation. Bioinformatics can enable rapid, efficient prediction of potential epitopes. Here, we designed a novel B-cell linear epitope prediction system called LEPS, Linear Epitope Prediction by Propensities and Support Vector Machine, that combined physico-chemical propensity identification and support vector machine (SVM) classification. We tested the LEPS on four datasets: AntiJen, HIV, a newly generated PC, and AHP, a combination of these three datasets. Peptides with globally or locally high physicochemical propensities were first identified as primitive linear epitope (LE) candidates. Then, candidates were classified with the SVM based on the unique features of amino acid segments. This reduced the number of predicted epitopes and enhanced the positive prediction value (PPV). Compared to four other well-known LE prediction systems, the LEPS achieved the highest accuracy (72.52%), specificity (84.22%), PPV (32.07%), and Matthews' correlation coefficient (10.36%).

  3. TU-C-12A-12: Differentiating Bone Lesions and Degenerative Joint Disease in NaF PET/CT Scans Using Machine Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perk, T; Bradshaw, T; Muzahir, S

    2014-06-15

    Purpose: [F-18]NaF PET can be used to image bone metastases; however, tracer uptake in degenerative joint disease (DJD) often appears similar to metastases. This study aims to develop and compare different machine learning algorithms to automatically identify regions of [F-18]NaF scans that correspond to DJD. Methods: 10 metastatic prostate cancer patients received whole body [F-18]NaF PET/CT scans prior to treatment. Image segmentation resulted in 852 ROIs, 69 of which were identified by a nuclear medicine physician as DJD. For all ROIs, various PET and CT textural features were computed. ROIs were divided into training and testing sets used to trainmore » eight different machine learning classifiers. Classifiers were evaluated based on receiver operating characteristics area under the curve (AUC), sensitivity, specificity, and positive predictive value (PPV). We also assessed the added value of including CT features in addition to PET features for training classifiers. Results: The training set consisted of 37 DJD ROIs with 475 non-DJD ROIs, and the testing set consisted of 32 DJD ROIs with 308 non-DJD ROIs. Of all classifiers, generalized linear models (GLM), decision forests (DF), and support vector machines (SVM) had the best performance. AUCs of GLM (0.929), DF (0.921), and SVM (0.889) were significantly higher than the other models (p<0.001). GLM and DF, overall, had the best sensitivity, specificity, and PPV, and gave a significantly better performance (p<0.01) than all other models. PET/CT GLM classifiers had higher AUC than just PET or just CT. GLMs built using PET/CT information had superior or comparable sensitivities, specificities and PPVs to just PET or just CT. Conclusion: Machine learning algorithms trained with PET/CT features were able to identify some cases of DJD. GLM outperformed the other classification algorithms. Using PET and CT information together was shown to be superior to using PET or CT features alone. Research supported by the Prostate Cancer Foundation.« less

  4. Classifying machinery condition using oil samples and binary logistic regression

    NASA Astrophysics Data System (ADS)

    Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.

    2015-08-01

    The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to provide ease of interpretability. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.

  5. GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique.

    PubMed

    Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta

    2008-04-22

    Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.

  6. Document page structure learning for fixed-layout e-books using conditional random fields

    NASA Astrophysics Data System (ADS)

    Tao, Xin; Tang, Zhi; Xu, Canhui

    2013-12-01

    In this paper, a model is proposed to learn logical structure of fixed-layout document pages by combining support vector machine (SVM) and conditional random fields (CRF). Features related to each logical label and their dependencies are extracted from various original Portable Document Format (PDF) attributes. Both local evidence and contextual dependencies are integrated in the proposed model so as to achieve better logical labeling performance. With the merits of SVM as local discriminative classifier and CRF modeling contextual correlations of adjacent fragments, it is capable of resolving the ambiguities of semantic labels. The experimental results show that CRF based models with both tree and chain graph structures outperform the SVM model with an increase of macro-averaged F1 by about 10%.

  7. Weighted K-means support vector machine for cancer prediction.

    PubMed

    Kim, SungHwan

    2016-01-01

    To date, the support vector machine (SVM) has been widely applied to diverse bio-medical fields to address disease subtype identification and pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), for which I allow the SVM to impose weights to the loss term. Besides, I demonstrate the numerical relations between the objective function of the SVM and weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adopt the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrate that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).

  8. Machine Learning Classification of Cirrhotic Patients with and without Minimal Hepatic Encephalopathy Based on Regional Homogeneity of Intrinsic Brain Activity.

    PubMed

    Chen, Qiu-Feng; Chen, Hua-Jun; Liu, Jun; Sun, Tao; Shen, Qun-Tai

    2016-01-01

    Machine learning-based approaches play an important role in examining functional magnetic resonance imaging (fMRI) data in a multivariate manner and extracting features predictive of group membership. This study was performed to assess the potential for measuring brain intrinsic activity to identify minimal hepatic encephalopathy (MHE) in cirrhotic patients, using the support vector machine (SVM) method. Resting-state fMRI data were acquired in 16 cirrhotic patients with MHE and 19 cirrhotic patients without MHE. The regional homogeneity (ReHo) method was used to investigate the local synchrony of intrinsic brain activity. Psychometric Hepatic Encephalopathy Score (PHES) was used to define MHE condition. SVM-classifier was then applied using leave-one-out cross-validation, to determine the discriminative ReHo-map for MHE. The discrimination map highlights a set of regions, including the prefrontal cortex, anterior cingulate cortex, anterior insular cortex, inferior parietal lobule, precentral and postcentral gyri, superior and medial temporal cortices, and middle and inferior occipital gyri. The optimized discriminative model showed total accuracy of 82.9% and sensitivity of 81.3%. Our results suggested that a combination of the SVM approach and brain intrinsic activity measurement could be helpful for detection of MHE in cirrhotic patients.

  9. Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples.

    PubMed

    Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan; Liu, Jingjing

    2018-01-18

    Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33-100%, and ELM, with an accuracy rate of 98.01-100%. For level assessment, the R² related to the training set was above 0.97 and the R² related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016-0.3494, lower than the error of 0.5-1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level.

  10. Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine.

    PubMed

    Xu, Xiaoyi; Li, Ao; Wang, Minghui

    2015-08-01

    Phosphorylation is a crucial post-translational modification, which regulates almost all cellular processes in life. It has long been recognised that protein phosphorylation has close relationship with diseases, and therefore many researches are undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success achieved by these approaches, no method focuses on disease-associated phosphorylation sites prediction. Herein, for the first time the authors propose a novel approach that is specially designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, a combined feature selection method-based support vector machine (CFS-SVM) that incorporates minimum-redundancy-maximum-relevance filtering process and forward feature selection process is developed. Performance evaluation shows that CFS-SVM is significantly better than the widely used classifiers including Bayesian decision theory, k nearest neighbour and random forest. With the extremely high specificity of 99%, CFS-SVM can still achieve a high sensitivity. Besides, tests on extra data confirm the effectiveness and general applicability of CFS-SVM approach on a variety of diseases. Finally, the analysis of selected features and corresponding kinases also help the understanding of the potential mechanism of disease-phosphorylation relationships and guide further experimental validations.

  11. Classification of fMRI resting-state maps using machine learning techniques: A comparative study

    NASA Astrophysics Data System (ADS)

    Gallos, Ioannis; Siettos, Constantinos

    2017-11-01

    We compare the efficiency of Principal Component Analysis (PCA) and nonlinear learning manifold algorithms (ISOMAP and Diffusion maps) for classifying brain maps between groups of schizophrenia patients and healthy from fMRI scans during a resting-state experiment. After a standard pre-processing pipeline, we applied spatial Independent component analysis (ICA) to reduce (a) noise and (b) spatial-temporal dimensionality of fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and Diffusion Maps to find an embedded low-dimensional space. Finally, support-vector-machines (SVM) and k-NN algorithms were used to evaluate the performance of the algorithms in classifying between the two groups.

  12. Regional autonomy changes in resting-state functional MRI in patients with HIV associated neurocognitive disorder

    NASA Astrophysics Data System (ADS)

    DSouza, Adora M.; Abidin, Anas Z.; Chockanathan, Udaysankar; Wismüller, Axel

    2018-03-01

    In this study, we investigate whether there are discernable changes in influence that brain regions have on themselves once patients show symptoms of HIV Associated Neurocognitive Disorder (HAND) using functional MRI (fMRI). Simple functional connectivity measures, such as correlation cannot reveal such information. To this end, we use mutual connectivity analysis (MCA) with Local Models (LM), which reveals a measure of influence in terms of predictability. Once such measures of interaction are obtained, we train two classifiers to characterize difference in patterns of regional self-influence between healthy subjects and subjects presenting with HAND symptoms. The two classifiers we use are Support Vector Machines (SVM) and Localized Generalized Matrix Learning Vector Quantization (LGMLVQ). Performing machine learning on fMRI connectivity measures is popularly known as multi-voxel pattern analysis (MVPA). By performing such an analysis, we are interested in studying the impact HIV infection has on an individual's brain. The high area under receiver operating curve (AUC) and accuracy values for 100 different train/test separations using MCA-LM self-influence measures (SVM: mean AUC=0.86, LGMLVQ: mean AUC=0.88, SVM and LGMLVQ: mean accuracy=0.78) compared with standard MVPA analysis using cross-correlation between fMRI time-series (SVM: mean AUC=0.58, LGMLVQ: mean AUC=0.57), demonstrates that self-influence features can be more discriminative than measures of interaction between time-series pairs. Furthermore, our results suggest that incorporating measures of self-influence in MVPA analysis used commonly in fMRI analysis has the potential to provide a performance boost and indicate important changes in dynamics of regions in the brain as a consequence of HIV infection.

  13. Classification of toxicity effects of biotransformed hepatic drugs using whale optimized support vector machines.

    PubMed

    Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella

    2017-04-01

    Measuring toxicity is an important step in drug development. Nevertheless, the current experimental methods used to estimate the drug toxicity are expensive and time-consuming, indicating that they are not suitable for large-scale evaluation of drug toxicity in the early stage of drug development. Hence, there is a high demand to develop computational models that can predict the drug toxicity risks. In this study, we used a dataset that consists of 553 drugs that biotransformed in liver. The toxic effects were calculated for the current data, namely, mutagenic, tumorigenic, irritant and reproductive effect. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce the classification time while improving the classification performance. In the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and Synthetic Minority Oversampling Technique (SMOTE), BorderLine SMOTE and Safe Level SMOTE are used to solve the problem of imbalanced dataset. In the third phase, the Support Vector Machines (SVM) classifier is used to classify an unknown drug into toxic or non-toxic. SVM parameters such as the penalty parameter and kernel parameter have a great impact on the classification accuracy of the model. In this paper, Whale Optimization Algorithm (WOA) has been proposed to optimize the parameters of SVM, so that the classification error can be reduced. The experimental results proved that the proposed model achieved high sensitivity to all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used for the prediction of drug toxicity in the early stage of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Detection of tripping gait patterns in the elderly using autoregressive features and support vector machines.

    PubMed

    Lai, Daniel T H; Begg, Rezaul K; Taylor, Simon; Palaniswami, Marimuthu

    2008-01-01

    Elderly tripping falls cost billions annually in medical funds and result in high mortality rates often perpetrated by pulmonary embolism (internal bleeding) and infected fractures that do not heal well. In this paper, we propose an intelligent gait detection system (AR-SVM) for screening elderly individuals at risk of suffering tripping falls. The motivation of this system is to provide early detection of elderly gait reminiscent of tripping characteristics so that preventive measures could be administered. Our system is composed of two stages, a predictor model estimated by an autoregressive (AR) process and a support vector machine (SVM) classifier. The system input is a digital signal constructed from consecutive measurements of minimum toe clearance (MTC) representative of steady-state walking. The AR-SVM system was tested on 23 individuals (13 healthy and 10 having suffered at least one tripping fall in the past year) who each completed a minimum of 10 min of walking on a treadmill at a self-selected pace. In the first stage, a fourth order AR model required at least 64 MTC values to correctly detect all fallers and non-fallers. Detection was further improved to less than 1 min of walking when the model coefficients were used as input features to the SVM classifier. The system achieved a detection accuracy of 95.65% with the leave one out method using only 16 MTC samples, but was reduced to 69.57% when eight MTC samples were used. These results demonstrate a fast and efficient system requiring a small number of strides and only MTC measurements for accurate detection of tripping gait characteristics.

  15. Multicategory Composite Least Squares Classifiers

    PubMed Central

    Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul

    2010-01-01

    Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128

  16. Utilizing spatial and spectral features of photoacoustic imaging for ovarian cancer detection and diagnosis

    NASA Astrophysics Data System (ADS)

    Li, Hai; Kumavor, Patrick; Salman Alqasemi, Umar; Zhu, Quing

    2015-01-01

    A composite set of ovarian tissue features extracted from photoacoustic spectral data, beam envelope, and co-registered ultrasound and photoacoustic images are used to characterize malignant and normal ovaries using logistic and support vector machine (SVM) classifiers. Normalized power spectra were calculated from the Fourier transform of the photoacoustic beamformed data, from which the spectral slopes and 0-MHz intercepts were extracted. Five features were extracted from the beam envelope and another 10 features were extracted from the photoacoustic images. These 17 features were ranked by their p-values from t-tests on which a filter type of feature selection method was used to determine the optimal feature number for final classification. A total of 169 samples from 19 ex vivo ovaries were randomly distributed into training and testing groups. Both classifiers achieved a minimum value of the mean misclassification error when the seven features with lowest p-values were selected. Using these seven features, the logistic and SVM classifiers obtained sensitivities of 96.39±3.35% and 97.82±2.26%, and specificities of 98.92±1.39% and 100%, respectively, for the training group. For the testing group, logistic and SVM classifiers achieved sensitivities of 92.71±3.55% and 92.64±3.27%, and specificities of 87.52±8.78% and 98.49±2.05%, respectively.

  17. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A multi-class classification SVM(Support Vector Machine) fast learning method based on binary tree is presented to solve its low learning efficiency when SVM processing large scale multi-class samples. This paper adopts bottom-up method to set up binary tree hierarchy structure, according to achieved hierarchy structure, sub-classifier learns from corresponding samples of each node. During the learning, several class clusters are generated after the first clustering of the training samples. Firstly, central points are extracted from those class clusters which just have one type of samples. For those which have two types of samples, cluster numbers of their positive and negative samples are set respectively according to their mixture degree, secondary clustering undertaken afterwards, after which, central points are extracted from achieved sub-class clusters. By learning from the reduced samples formed by the integration of extracted central points above, sub-classifiers are obtained. Simulation experiment shows that, this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers and effectively improve learning efficiency.

  18. Combined texture feature analysis of segmentation and classification of benign and malignant tumour CT slices.

    PubMed

    Padma, A; Sukanesh, R

    2013-01-01

    A computer software system is designed for the segmentation and classification of benign from malignant tumour slices in brain computed tomography (CT) images. This paper presents a method to find and select both the dominant run length and co-occurrence texture features of region of interest (ROI) of the tumour region of each slice to be segmented by Fuzzy c means clustering (FCM) and evaluate the performance of support vector machine (SVM)-based classifiers in classifying benign and malignant tumour slices. Two hundred and six tumour confirmed CT slices are considered in this study. A total of 17 texture features are extracted by a feature extraction procedure, and six features are selected using Principal Component Analysis (PCA). This study constructed the SVM-based classifier with the selected features and by comparing the segmentation results with the experienced radiologist labelled ground truth (target). Quantitative analysis between ground truth and segmented tumour is presented in terms of segmentation accuracy, segmentation error and overlap similarity measures such as the Jaccard index. The classification performance of the SVM-based classifier with the same selected features is also evaluated using a 10-fold cross-validation method. The proposed system provides some newly found texture features have an important contribution in classifying benign and malignant tumour slices efficiently and accurately with less computational time. The experimental results showed that the proposed system is able to achieve the highest segmentation and classification accuracy effectiveness as measured by jaccard index and sensitivity and specificity.

  19. New models to predict depth of infiltration in endometrial carcinoma based on transvaginal sonography.

    PubMed

    De Smet, F; De Brabanter, J; Van den Bosch, T; Pochet, N; Amant, F; Van Holsbeke, C; Moerman, P; De Moor, B; Vergote, I; Timmerman, D

    2006-06-01

    Preoperative knowledge of the depth of myometrial infiltration is important in patients with endometrial carcinoma. This study aimed at assessing the value of histopathological parameters obtained from an endometrial biopsy (Pipelle de Cornier; results available preoperatively) and ultrasound measurements obtained after transvaginal sonography with color Doppler imaging in the preoperative prediction of the depth of myometrial invasion, as determined by the final histopathological examination of the hysterectomy specimen (the gold standard). We first collected ultrasound and histopathological data from 97 consecutive women with endometrial carcinoma and divided them into two groups according to surgical stage (Stages Ia and Ib vs. Stages Ic and higher). The areas (AUC) under the receiver-operating characteristics curves of the subjective assessment of depth of invasion by an experienced gynecologist and of the individual ultrasound parameters were calculated. Subsequently, we used these variables to train a logistic regression model and least squares support vector machines (LS-SVM) with linear and RBF (radial basis function) kernels. Finally, these models were validated prospectively on data from 76 new patients in order to make a preoperative prediction of the depth of invasion. Of all ultrasound parameters, the ratio of the endometrial and uterine volumes had the largest AUC (78%), while that of the subjective assessment was 79%. The AUCs of the blood flow indices were low (range, 51-64%). Stepwise logistic regression selected the degree of differentiation, the number of fibroids, the endometrial thickness and the volume of the tumor. Compared with the AUC of the subjective assessment (72%), prospective evaluation of the mathematical models resulted in a higher AUC for the LS-SVM model with an RBF kernel (77%), but this difference was not significant. Single morphological parameters do not improve the predictive power when compared with the subjective assessment of depth of myometrial invasion of endometrial cancer, and blood flow indices do not contribute to the prediction of stage. In this study an LS-SVM model with an RBF kernel gave the best prediction; while this might be more reliable than subjective assessment, confirmation by larger prospective studies is required. Copyright 2006 ISUOG. Published by John Wiley & Sons, Ltd.

  20. Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco

    2017-04-01

    Since recent years surveying with unmanned aerial vehicles (UAV) is getting a great amount of attention due to decreasing costs, higher precision and flexibility of usage. UAVs have been applied for geomorphological investigations, forestry, precision agriculture, cultural heritage assessment and for archaeological purposes. It can be used for land use and land cover classification (LULC). In literature, there are two main types of approaches for classification of remote sensing imagery: pixel-based and object-based. On one hand, pixel-based approach mostly uses training areas to define classes and respective spectral signatures. On the other hand, object-based classification considers pixels, scale, spatial information and texture information for creating homogeneous objects. Machine learning methods have been applied successfully for classification, and their use is increasing due to the availability of faster computing capabilities. The methods learn and train the model from previous computation. Two machine learning methods which have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare RF and SVM methods for classifying LULC using images collected with a fixed wing UAV. The processing chain regarding classification uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras providing information over the red, blue, green and near infrared wavelength reflectivity over a testing area in the campus of Agripolis, in Italy. Images were elaborated and ortho-rectified through Agisoft Photoscan. The ortho-rectified image is the full data set, and the test set is derived from partial sub-setting of the full data set. Different tests have been carried out, using a percentage from 2 % to 20 % of the total. Ten training sets and ten validation sets are obtained from each test set. The control dataset consist of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable. The RF and SVM are applied to the test set. The performances of the methods are evaluated using the three following accuracy metrics: Kappa index, Classification accuracy and Classification Error. All three are calculated in three different ways: with K-fold cross validation, using the validation test set and using the full test set. The analysis indicates that SVM gets better results in terms of good scores using K-fold cross or validation test set. Using the full test set, RF achieves a better result in comparison to SVM. It also seems that SVM performs better with smaller training sets, whereas RF performs better as training sets get larger.

  1. Accuracy of automated classification of major depressive disorder as a function of symptom severity.

    PubMed

    Ramasubbu, Rajamannar; Brown, Matthew R G; Cortese, Filmeno; Gaxiola, Ismael; Goodyear, Bradley; Greenshaw, Andrew J; Dursun, Serdar M; Greiner, Russell

    2016-01-01

    Growing evidence documents the potential of machine learning for developing brain based diagnostic methods for major depressive disorder (MDD). As symptom severity may influence brain activity, we investigated whether the severity of MDD affected the accuracies of machine learned MDD-vs-Control diagnostic classifiers. Forty-five medication-free patients with DSM-IV defined MDD and 19 healthy controls participated in the study. Based on depression severity as determined by the Hamilton Rating Scale for Depression (HRSD), MDD patients were sorted into three groups: mild to moderate depression (HRSD 14-19), severe depression (HRSD 20-23), and very severe depression (HRSD ≥ 24). We collected functional magnetic resonance imaging (fMRI) data during both resting-state and an emotional-face matching task. Patients in each of the three severity groups were compared against controls in separate analyses, using either the resting-state or task-based fMRI data. We use each of these six datasets with linear support vector machine (SVM) binary classifiers for identifying individuals as patients or controls. The resting-state fMRI data showed statistically significant classification accuracy only for the very severe depression group (accuracy 66%, p = 0.012 corrected), while mild to moderate (accuracy 58%, p = 1.0 corrected) and severe depression (accuracy 52%, p = 1.0 corrected) were only at chance. With task-based fMRI data, the automated classifier performed at chance in all three severity groups. Binary linear SVM classifiers achieved significant classification of very severe depression with resting-state fMRI, but the contribution of brain measurements may have limited potential in differentiating patients with less severe depression from healthy controls.

  2. Approach to explosive hazard detection using sensor fusion and multiple kernel learning with downward-looking GPR and EMI sensor data

    NASA Astrophysics Data System (ADS)

    Pinar, Anthony; Masarik, Matthew; Havens, Timothy C.; Burns, Joseph; Thelen, Brian; Becker, John

    2015-05-01

    This paper explores the effectiveness of an anomaly detection algorithm for downward-looking ground penetrating radar (GPR) and electromagnetic inductance (EMI) data. Threat detection with GPR is challenged by high responses to non-target/clutter objects, leading to a large number of false alarms (FAs), and since the responses of target and clutter signatures are so similar, classifier design is not trivial. We suggest a method based on a Run Packing (RP) algorithm to fuse GPR and EMI data into a composite confidence map to improve detection as measured by the area-under-ROC (NAUC) metric. We examine the value of a multiple kernel learning (MKL) support vector machine (SVM) classifier using image features such as histogram of oriented gradients (HOG), local binary patterns (LBP), and local statistics. Experimental results on government furnished data show that use of our proposed fusion and classification methods improves the NAUC when compared with the results from individual sensors and a single kernel SVM classifier.

  3. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.

    PubMed

    Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei

    2016-09-02

    Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.

  4. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons

    PubMed Central

    Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei

    2016-01-01

    Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance. PMID:27598160

  5. The Optimization of Trained and Untrained Image Classification Algorithms for Use on Large Spatial Datasets

    NASA Technical Reports Server (NTRS)

    Kocurek, Michael J.

    2005-01-01

    The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.

  6. Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.

    PubMed

    Caixinha, Miguel; Santos, Mário; Santos, Jaime

    2016-04-01

    To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the higher performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.

  7. Machine learning search for variable stars

    NASA Astrophysics Data System (ADS)

    Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis

    2018-04-01

    Photometric variability detection is often considered as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (select more variables and fewer false candidates) compared to traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF show a higher efficiency compared to LR and kNN.

  8. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    PubMed

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify, which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces Cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment are measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.

  9. Classification algorithm of ovarian tissue based on co-registered ultrasound and photoacoustic tomography

    NASA Astrophysics Data System (ADS)

    Li, Hai; Kumavor, Patrick D.; Alqasemi, Umar; Zhu, Quing

    2014-03-01

    Human ovarian tissue features extracted from photoacoustic spectra data, beam envelopes and co-registered ultrasound and photoacoustic images are used to characterize cancerous vs. normal processes using a support vector machine (SVM) classifier. The centers of suspicious tumor areas are estimated from the Gaussian fitting of the mean Radon transforms of the photoacoustic image along 0 and 90 degrees. Normalized power spectra are calculated using the Fourier transform of the photoacoustic beamformed data across these suspicious areas, where the spectral slope and 0-MHz intercepts are extracted. Image statistics, envelope histogram fitting and maximum output of 6 composite filters of cancerous or normal patterns along with other previously used features are calculated to compose a total of 17 features. These features are extracted from 169 datasets of 19 ex vivo ovaries. Half of the cancerous and normal datasets are randomly chosen to train a SVM classifier with polynomial kernel and the remainder is used for testing. With 50 times data resampling, the SVM classifier, for the training group, gives 100% sensitivity and 100% specificity. For the testing group, it gives 89.68+/- 6.37% sensitivity and 93.16+/- 3.70% specificity. These results are superior to those obtained earlier by our group using features extracted from photoacoustic raw data or image statistics only.

  10. A tri-fold hybrid classification approach for diagnostics with unexampled faulty states

    NASA Astrophysics Data System (ADS)

    Tamilselvan, Prasanna; Wang, Pingfeng

    2015-01-01

    System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires the knowledge of system failures. However, with an increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at product testing stage. Thus, real time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which comprises of preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes the advantages of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.

  11. Research on gesture recognition of augmented reality maintenance guiding system based on improved SVM

    NASA Astrophysics Data System (ADS)

    Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi

    2014-09-01

    Interaction is one of the key techniques of augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages which are gesture segmentation, gesture characteristic feature modeling and trick recognition. In segmentation stage, for solving the misrecognition of skin-like region, a segmentation algorithm combing background mode and skin color to preclude some skin-like regions is adopted. In gesture characteristic feature modeling of image attributes stage, plenty of characteristic features are analyzed and acquired, such as structure characteristics, Hu invariant moments features and Fourier descriptor. In trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a novel learning method based on statistical learning theory, processing academic foundation and excellent learning ability, having a lot of issues in machine learning area and special advantages in dealing with small samples, non-linear pattern recognition at high dimension. The gesture recognition of augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of AR maintenance guiding system can be greatly enhanced by improved SVM.

  12. The identification of high potential archers based on relative psychological coping skills variables: A Support Vector Machine approach

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad

    2018-03-01

    Support Vector Machine (SVM) has been revealed to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is at its inception. The present study classified and predicted high and low potential archers from a collection of psychological coping skills variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. Psychological coping skills inventory which evaluates the archers level of related coping skills were filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models, i.e. linear and fine radial basis function (RBF) kernel functions, were trained on the psychological variables. The k-means clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy and precision throughout the exercise with an accuracy of 92% and considerably fewer error rate for the prediction of the HPPA and the LPPA as compared to the fine RBF SVM. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined which would consequently save time and energy during talent identification and development programme.

  13. Automated characterisation of ultrasound images of ovarian tumours: the diagnostic accuracy of a support vector machine and image processing with a local binary pattern operator

    PubMed Central

    Khazendar, S.; Sayasneh, A.; Al-Assam, H.; Du, H.; Kaijser, J.; Ferrara, L.; Timmerman, D.; Jassim, S.; Bourne, T.

    2015-01-01

    Introduction: Preoperative characterisation of ovarian masses into benign or malignant is of paramount importance to optimise patient management. Objectives: In this study, we developed and validated a computerised model to characterise ovarian masses as benign or malignant. Materials and methods: Transvaginal 2D B mode static ultrasound images of 187 ovarian masses with known histological diagnosis were included. Images were first pre-processed and enhanced, and Local Binary Pattern Histograms were then extracted from 2 × 2 blocks of each image. A Support Vector Machine (SVM) was trained using stratified cross validation with randomised sampling. The process was repeated 15 times and in each round 100 images were randomly selected. Results: The SVM classified the original non-treated static images as benign or malignant masses with an average accuracy of 0.62 (95% CI: 0.59-0.65). This performance significantly improved to an average accuracy of 0.77 (95% CI: 0.75-0.79) when images were pre-processed, enhanced and treated with a Local Binary Pattern operator (mean difference 0.15: 95% 0.11-0.19, p < 0.0001, two-tailed t test). Conclusion: We have shown that an SVM can classify static 2D B mode ultrasound images of ovarian masses into benign and malignant categories. The accuracy improves if texture related LBP features extracted from the images are considered. PMID:25897367

  14. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features

    PubMed Central

    Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-01-01

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375

  15. Fast mental states decoding in mixed reality.

    PubMed

    De Massari, Daniele; Pacheco, Daniel; Malekshahi, Rahim; Betella, Alberto; Verschure, Paul F M J; Birbaumer, Niels; Caria, Andrea

    2014-01-01

    The combination of Brain-Computer Interface (BCI) technology, allowing online monitoring and decoding of brain activity, with virtual and mixed reality (MR) systems may help to shape and guide implicit and explicit learning using ecological scenarios. Real-time information of ongoing brain states acquired through BCI might be exploited for controlling data presentation in virtual environments. Brain states discrimination during mixed reality experience is thus critical for adapting specific data features to contingent brain activity. In this study we recorded electroencephalographic (EEG) data while participants experienced MR scenarios implemented through the eXperience Induction Machine (XIM). The XIM is a novel framework modeling the integration of a sensing system that evaluates and measures physiological and psychological states with a number of actuators and effectors that coherently reacts to the user's actions. We then assessed continuous EEG-based discrimination of spatial navigation, reading and calculation performed in MR, using linear discriminant analysis (LDA) and support vector machine (SVM) classifiers. Dynamic single trial classification showed high accuracy of LDA and SVM classifiers in detecting multiple brain states as well as in differentiating between high and low mental workload, using a 5 s time-window shifting every 200 ms. Our results indicate overall better performance of LDA with respect to SVM and suggest applicability of our approach in a BCI-controlled MR scenario. Ultimately, successful prediction of brain states might be used to drive adaptation of data representation in order to boost information processing in MR.

  16. Fast mental states decoding in mixed reality

    PubMed Central

    De Massari, Daniele; Pacheco, Daniel; Malekshahi, Rahim; Betella, Alberto; Verschure, Paul F. M. J.; Birbaumer, Niels; Caria, Andrea

    2014-01-01

    The combination of Brain-Computer Interface (BCI) technology, allowing online monitoring and decoding of brain activity, with virtual and mixed reality (MR) systems may help to shape and guide implicit and explicit learning using ecological scenarios. Real-time information of ongoing brain states acquired through BCI might be exploited for controlling data presentation in virtual environments. Brain states discrimination during mixed reality experience is thus critical for adapting specific data features to contingent brain activity. In this study we recorded electroencephalographic (EEG) data while participants experienced MR scenarios implemented through the eXperience Induction Machine (XIM). The XIM is a novel framework modeling the integration of a sensing system that evaluates and measures physiological and psychological states with a number of actuators and effectors that coherently reacts to the user's actions. We then assessed continuous EEG-based discrimination of spatial navigation, reading and calculation performed in MR, using linear discriminant analysis (LDA) and support vector machine (SVM) classifiers. Dynamic single trial classification showed high accuracy of LDA and SVM classifiers in detecting multiple brain states as well as in differentiating between high and low mental workload, using a 5 s time-window shifting every 200 ms. Our results indicate overall better performance of LDA with respect to SVM and suggest applicability of our approach in a BCI-controlled MR scenario. Ultimately, successful prediction of brain states might be used to drive adaptation of data representation in order to boost information processing in MR. PMID:25505878

  17. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features.

    PubMed

    Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-03-29

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.

  18. Explosive hazard detection using MIMO forward-looking ground penetrating radar

    NASA Astrophysics Data System (ADS)

    Shaw, Darren; Ho, K. C.; Stone, Kevin; Keller, James M.; Popescu, Mihail; Anderson, Derek T.; Luke, Robert H.; Burns, Brian

    2015-05-01

    This paper proposes a machine learning algorithm for subsurface object detection on multiple-input-multiple-output (MIMO) forward-looking ground-penetrating radar (FLGPR). By detecting hazards using FLGPR, standoff distances of up to tens of meters can be acquired, but this is at the degradation of performance due to high false alarm rates. The proposed system utilizes an anomaly detection prescreener to identify potential object locations. Alarm locations have multiple one-dimensional (ML) spectral features, two-dimensional (2D) spectral features, and log-Gabor statistic features extracted. The ability of these features to reduce the number of false alarms and increase the probability of detection is evaluated for both co-polarizations present in the Akela MIMO array. Classification is performed by a Support Vector Machine (SVM) with lane-based cross-validation for training and testing. Class imbalance and optimized SVM kernel parameters are considered during classifier training.

  19. Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images.

    PubMed

    Lu, Xiaobing; Yang, Yongzhe; Wu, Fengchun; Gao, Minjian; Xu, Yong; Zhang, Yue; Yao, Yongcheng; Du, Xin; Li, Chengwei; Wu, Lei; Zhong, Xiaomei; Zhou, Yanling; Fan, Ni; Zheng, Yingjun; Xiong, Dongsheng; Peng, Hongjun; Escudero, Javier; Huang, Biao; Li, Xiaobo; Ning, Yuping; Wu, Kai

    2016-07-01

    Structural abnormalities in schizophrenia (SZ) patients have been well documented with structural magnetic resonance imaging (MRI) data using voxel-based morphometry (VBM) and region of interest (ROI) analyses. However, these analyses can only detect group-wise differences and thus, have a poor predictive value for individuals. In the present study, we applied a machine learning method that combined support vector machine (SVM) with recursive feature elimination (RFE) to discriminate SZ patients from normal controls (NCs) using their structural MRI data. We first employed both VBM and ROI analyses to compare gray matter volume (GMV) and white matter volume (WMV) between 41 SZ patients and 42 age- and sex-matched NCs. The method of SVM combined with RFE was used to discriminate SZ patients from NCs using significant between-group differences in both GMV and WMV as input features. We found that SZ patients showed GM and WM abnormalities in several brain structures primarily involved in the emotion, memory, and visual systems. An SVM with a RFE classifier using the significant structural abnormalities identified by the VBM analysis as input features achieved the best performance (an accuracy of 88.4%, a sensitivity of 91.9%, and a specificity of 84.4%) in the discriminative analyses of SZ patients. These results suggested that distinct neuroanatomical profiles associated with SZ patients might provide a potential biomarker for disease diagnosis, and machine-learning methods can reveal neurobiological mechanisms in psychiatric diseases.

  20. Comparison of ANN and SVM for classification of eye movements in EOG signals

    NASA Astrophysics Data System (ADS)

    Qi, Lim Jia; Alias, Norma

    2018-03-01

    Nowadays, electrooculogram is regarded as one of the most important biomedical signal in measuring and analyzing eye movement patterns. Thus, it is helpful in designing EOG-based Human Computer Interface (HCI). In this research, electrooculography (EOG) data was obtained from five volunteers. The (EOG) data was then preprocessed before feature extraction methods were employed to further reduce the dimensionality of data. Three feature extraction approaches were put forward, namely statistical parameters, autoregressive (AR) coefficients using Burg method, and power spectral density (PSD) using Yule-Walker method. These features would then become input to both artificial neural network (ANN) and support vector machine (SVM). The performance of the combination of different feature extraction methods and classifiers was presented and analyzed. It was found that statistical parameters + SVM achieved the highest classification accuracy of 69.75%.

  1. Classification of burst and suppression in the neonatal electroencephalogram

    NASA Astrophysics Data System (ADS)

    Löfhede, J.; Löfgren, N.; Thordstein, M.; Flisberg, A.; Kjellmer, I.; Lindecrantz, K.

    2008-12-01

    Fisher's linear discriminant (FLD), a feed-forward artificial neural network (ANN) and a support vector machine (SVM) were compared with respect to their ability to distinguish bursts from suppressions in electroencephalograms (EEG) displaying a burst-suppression pattern. Five features extracted from the EEG were used as inputs. The study was based on EEG signals from six full-term infants who had suffered from perinatal asphyxia, and the methods have been trained with reference data classified by an experienced electroencephalographer. The results are summarized as the area under the curve (AUC), derived from receiver operating characteristic (ROC) curves for the three methods. Based on this, the SVM performs slightly better than the others. Testing the three methods with combinations of increasing numbers of the five features shows that the SVM handles the increasing amount of information better than the other methods.

  2. Automatic Recognition of Acute Myelogenous Leukemia in Blood Microscopic Images Using K-means Clustering and Support Vector Machine.

    PubMed

    Kazemi, Fatemeh; Najafabadi, Tooraj Abbasian; Araabi, Babak Nadjar

    2016-01-01

    Acute myelogenous leukemia (AML) is a subtype of acute leukemia, which is characterized by the accumulation of myeloid blasts in the bone marrow. Careful microscopic examination of stained blood smear or bone marrow aspirate is still the most significant diagnostic methodology for initial AML screening and considered as the first step toward diagnosis. It is time-consuming and due to the elusive nature of the signs and symptoms of AML; wrong diagnosis may occur by pathologists. Therefore, the need for automation of leukemia detection has arisen. In this paper, an automatic technique for identification and detection of AML and its prevalent subtypes, i.e., M2-M5 is presented. At first, microscopic images are acquired from blood smears of patients with AML and normal cases. After applying image preprocessing, color segmentation strategy is applied for segmenting white blood cells from other blood components and then discriminative features, i.e., irregularity, nucleus-cytoplasm ratio, Hausdorff dimension, shape, color, and texture features are extracted from the entire nucleus in the whole images containing multiple nuclei. Images are classified to cancerous and noncancerous images by binary support vector machine (SVM) classifier with 10-fold cross validation technique. Classifier performance is evaluated by three parameters, i.e., sensitivity, specificity, and accuracy. Cancerous images are also classified into their prevalent subtypes by multi-SVM classifier. The results show that the proposed algorithm has achieved an acceptable performance for diagnosis of AML and its common subtypes. Therefore, it can be used as an assistant diagnostic tool for pathologists.

  3. Support vector machine-based facial-expression recognition method combining shape and appearance

    NASA Astrophysics Data System (ADS)

    Han, Eun Jung; Kang, Byung Jun; Park, Kang Ryoung; Lee, Sangyoun

    2010-11-01

    Facial expression recognition can be widely used for various applications, such as emotion-based human-machine interaction, intelligent robot interfaces, face recognition robust to expression variation, etc. Previous studies have been classified as either shape- or appearance-based recognition. The shape-based method has the disadvantage that the individual variance of facial feature points exists irrespective of similar expressions, which can cause a reduction of the recognition accuracy. The appearance-based method has a limitation in that the textural information of the face is very sensitive to variations in illumination. To overcome these problems, a new facial-expression recognition method is proposed, which combines both shape and appearance information, based on the support vector machine (SVM). This research is novel in the following three ways as compared to previous works. First, the facial feature points are automatically detected by using an active appearance model. From these, the shape-based recognition is performed by using the ratios between the facial feature points based on the facial-action coding system. Second, the SVM, which is trained to recognize the same and different expression classes, is proposed to combine two matching scores obtained from the shape- and appearance-based recognitions. Finally, a single SVM is trained to discriminate four different expressions, such as neutral, a smile, anger, and a scream. By determining the expression of the input facial image whose SVM output is at a minimum, the accuracy of the expression recognition is much enhanced. The experimental results showed that the recognition accuracy of the proposed method was better than previous researches and other fusion methods.

  4. An evaluation of open set recognition for FLIR images

    NASA Astrophysics Data System (ADS)

    Scherreik, Matthew; Rigling, Brian

    2015-05-01

    Typical supervised classification algorithms label inputs according to what was learned in a training phase. Thus, test inputs that were not seen in training are always given incorrect labels. Open set recognition algorithms address this issue by accounting for inputs that are not present in training and providing the classifier with an option to reject" unknown samples. A number of such techniques have been developed in the literature, many of which are based on support vector machines (SVMs). One approach, the 1-vs-set machine, constructs a slab" in feature space using the SVM hyperplane. Inputs falling on one side of the slab or within the slab belong to a training class, while inputs falling on the far side of the slab are rejected. We note that rejection of unknown inputs can be achieved by thresholding class posterior probabilities. Another recently developed approach, the Probabilistic Open Set SVM (POS-SVM), empirically determines good probability thresholds. We apply the 1-vs-set machine, POS-SVM, and closed set SVMs to FLIR images taken from the Comanche SIG dataset. Vehicles in the dataset are divided into three general classes: wheeled, armored personnel carrier (APC), and tank. For each class, a coarse pose estimate (front, rear, left, right) is taken. In a closed set sense, we analyze these algorithms for prediction of vehicle class and pose. To test open set performance, one or more vehicle classes are held out from training. By considering closed and open set performance separately, we may closely analyze both inter-class discrimination and threshold effectiveness.

  5. Classifying Physical Morphology of Cocoa Beans Digital Images using Multiclass Ensemble Least-Squares Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Lawi, Armin; Adhitya, Yudhi

    2018-03-01

    The objective of this research is to determine the quality of cocoa beans through morphology of their digital images. Samples of cocoa beans were scattered on a bright white paper under a controlled lighting condition. A compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. Classification process begins with an analysis of cocoa beans image based on morphological feature extraction. Parameters for extraction of morphological or physical feature parameters, i.e., Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, Ferret Diameter. The cocoa beans are classified into 4 groups, i.e.: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The model of classification used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement model of SVM using ensemble method in which the separate hyperplanes are obtained by least square approach and the multiclass procedure uses One-Against- All method. The result of our proposed model showed that the classification with morphological feature input parameters were accurately as 99.705% for the four classes, respectively.

  6. A mechatronics platform to study prosthetic hand control using EMG signals.

    PubMed

    Geethanjali, P

    2016-09-01

    In this paper, a low-cost mechatronics platform for the design and development of robotic hands as well as a surface electromyogram (EMG) pattern recognition system is proposed. This paper also explores various EMG classification techniques using a low-cost electronics system in prosthetic hand applications. The proposed platform involves the development of a four channel EMG signal acquisition system; pattern recognition of acquired EMG signals; and development of a digital controller for a robotic hand. Four-channel surface EMG signals, acquired from ten healthy subjects for six different movements of the hand, were used to analyse pattern recognition in prosthetic hand control. Various time domain features were extracted and grouped into five ensembles to compare the influence of features in feature-selective classifiers (SLR) with widely considered non-feature-selective classifiers, such as neural networks (NN), linear discriminant analysis (LDA) and support vector machines (SVM) applied with different kernels. The results divulged that the average classification accuracy of the SVM, with a linear kernel function, outperforms other classifiers with feature ensembles, Hudgin's feature set and auto regression (AR) coefficients. However, the slight improvement in classification accuracy of SVM incurs more processing time and memory space in the low-level controller. The Kruskal-Wallis (KW) test also shows that there is no significant difference in the classification performance of SLR with Hudgin's feature set to that of SVM with Hudgin's features along with AR coefficients. In addition, the KW test shows that SLR was found to be better in respect to computation time and memory space, which is vital in a low-level controller. Similar to SVM, with a linear kernel function, other non-feature selective LDA and NN classifiers also show a slight improvement in performance using twice the features but with the drawback of increased memory space requirement and time. This prototype facilitated the study of various issues of pattern recognition and identified an efficient classifier, along with a feature ensemble, in the implementation of EMG controlled prosthetic hands in a laboratory setting at low-cost. This platform may help to motivate and facilitate prosthetic hand research in developing countries.

  7. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919

  8. Accuracy comparison among different machine learning techniques for detecting malicious codes

    NASA Astrophysics Data System (ADS)

    Narang, Komal

    2016-03-01

    In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware i.e. zero day attack by analyzing operation codes on Android operating system. The accuracy of Naïve Bayes, Support Vector Machine (SVM) and Neural Network for detecting malicious code has been compared for the proposed model. In the experiment 400 benign files, 100 system files and 500 malicious files have been used to construct the model. The model yields the best accuracy 88.9% when neural network is used as classifier and achieved 95% and 82.8% accuracy for sensitivity and specificity respectively.

  9. Extended robust support vector machine based on financial risk minimization.

    PubMed

    Takeda, Akiko; Fujiwara, Shuhei; Kanamori, Takafumi

    2014-11-01

    Financial risk measures have been used recently in machine learning. For example, ν-support vector machine ν-SVM) minimizes the conditional value at risk (CVaR) of margin distribution. The measure is popular in finance because of the subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and value at risk (VaR) by expecting that the resulting model becomes less sensitive than ν-SVM to outliers. We can regard ER-SVM as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments imply the ER-SVM's possibility of achieving a better prediction performance with proper parameter setting.

  10. Classification of Regional Ionospheric Disturbances Based on Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil

    2016-07-01

    Ionosphere is an anisotropic, inhomogeneous, time varying and spatio-temporally dispersive medium whose parameters can be estimated almost always by using indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to be identified due to extensive scales in period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, identification and classification of regional disturbances poses a challenge. The main aim of this study is to classify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of ionosphere, will be used to model the regional and local variability that differs from global activity along with solar and geomagnetic indices. In this work, for the automated classification of the regional disturbances, a classification technique based on a robust machine learning technique that have found wide spread use, Support Vector Machine (SVM) is proposed. SVM is a supervised learning model used for classification with associated learning algorithm that analyze the data and recognize patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher dimensional feature spaces. Performance of the developed classification technique is demonstrated for midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by Turkish National Permanent GPS Network (TNPGN-Active) for solar maximum year of 2011. As a result of implementing the developed classification technique to the Global Ionospheric Map (GIM) TEC data which is provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method to detect the anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK 114E541 project as a part of the Scientific and Technological Research Projects Funding Program (1001).

  11. Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.

    PubMed

    Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria

    2018-04-18

    Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Comparison of SVM, RF and ELM on an Electronic Nose for the Intelligent Evaluation of Paraffin Samples

    PubMed Central

    Men, Hong; Fu, Songlin; Yang, Jialin; Cheng, Meiqi; Shi, Yan

    2018-01-01

    Paraffin odor intensity is an important quality indicator when a paraffin inspection is performed. Currently, paraffin odor level assessment is mainly dependent on an artificial sensory evaluation. In this paper, we developed a paraffin odor analysis system to classify and grade four kinds of paraffin samples. The original feature set was optimized using Principal Component Analysis (PCA) and Partial Least Squares (PLS). Support Vector Machine (SVM), Random Forest (RF), and Extreme Learning Machine (ELM) were applied to three different feature data sets for classification and level assessment of paraffin. For classification, the model based on SVM, with an accuracy rate of 100%, was superior to that based on RF, with an accuracy rate of 98.33–100%, and ELM, with an accuracy rate of 98.01–100%. For level assessment, the R2 related to the training set was above 0.97 and the R2 related to the test set was above 0.87. Through comprehensive comparison, the generalization of the model based on ELM was superior to those based on SVM and RF. The scoring errors for the three models were 0.0016–0.3494, lower than the error of 0.5–1.0 measured by industry standard experts, meaning these methods have a higher prediction accuracy for scoring paraffin level. PMID:29346328

  13. Integrating multiple fitting regression and Bayes decision for cancer diagnosis with transcriptomic data from tumor-educated blood platelets.

    PubMed

    Huang, Guangzao; Yuan, Mingshun; Chen, Moliang; Li, Lei; You, Wenjie; Li, Hanjie; Cai, James J; Ji, Guoli

    2017-10-07

    The application of machine learning in cancer diagnostics has shown great promise and is of importance in clinic settings. Here we consider applying machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes to increase the potential for facilitating personalized treatments. To this end, we present a novel classification method called MFRB (for Multiple Fitting Regression and Bayes decision), which integrates the process of multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map multidimensional features of the transcriptomic data into a one-dimensional feature. The probability density function of each class in the mapped space is then adjusted using the Gaussian probability density function. Finally, the Bayes decision theory is used to build a probabilistic classifier with the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The classical support vector machine (SVM) and probabilistic SVM (PSVM) are used to evaluate the performance of the proposed method with simulated and real TEP datasets. Our results indicate that the proposed MFRB method achieves the best performance compared to SVM and PSVM, mainly due to its strong generalization ability for limited, imbalanced, and noisy data.

  14. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    PubMed

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.

  15. Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.

    PubMed

    Sriraam, N; Raghu, S

    2017-09-02

    Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from Bern Barcelona database. From the dataset, five different classification tasks were formed. Total 26 features were extracted from focal and non focal EEG. Significant features were selected using Wilcoxon rank sum test by setting p-value (p < 0.05) and z-score (-1.96 > z > 1.96) at 95% significance interval. Hypothesis was made that the effect of removing outliers improves the classification accuracy. Turkey's range test was adopted for pruning outliers from feature set. Finally, 21 features were classified using optimized support vector machine (SVM) classifier with 10-fold cross validation. Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15% achieved respectively and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensures the suitability of the proposed multi-features with optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.

  16. Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals.

    PubMed

    Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John

    2014-01-01

    The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  17. Detection of Splice Sites Using Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Varadwaj, Pritish; Purohit, Neetesh; Arora, Bhumika

    Automatic identification and annotation of exon and intron region of gene, from DNA sequences has been an important research area in field of computational biology. Several approaches viz. Hidden Markov Model (HMM), Artificial Intelligence (AI) based machine learning and Digital Signal Processing (DSP) techniques have extensively and independently been used by various researchers to cater this challenging task. In this work, we propose a Support Vector Machine based kernel learning approach for detection of splice sites (the exon-intron boundary) in a gene. Electron-Ion Interaction Potential (EIIP) values of nucleotides have been used for mapping character sequences to corresponding numeric sequences. Radial Basis Function (RBF) SVM kernel is trained using EIIP numeric sequences. Furthermore this was tested on test gene dataset for detection of splice site by window (of 12 residues) shifting. Optimum values of window size, various important parameters of SVM kernel have been optimized for a better accuracy. Receiver Operating Characteristic (ROC) curves have been utilized for displaying the sensitivity rate of the classifier and results showed 94.82% accuracy for splice site detection on test dataset.

  18. Recognition of medication information from discharge summaries using ensembles of classifiers.

    PubMed

    Doan, Son; Collier, Nigel; Xu, Hua; Pham, Hoang Duy; Tu, Minh Phuong

    2012-05-07

    Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. It suggests that simple strategies that can be easily implemented such as majority voting could have the potential to significantly improve clinical entity recognition.

  19. Alzheimer Disease and Behavioral Variant Frontotemporal Dementia: Automatic Classification Based on Cortical Atrophy for Single-Subject Diagnosis.

    PubMed

    Möller, Christiane; Pijnenburg, Yolande A L; van der Flier, Wiesje M; Versteeg, Adriaan; Tijms, Betty; de Munck, Jan C; Hafkemeijer, Anne; Rombouts, Serge A R B; van der Grond, Jeroen; van Swieten, John; Dopper, Elise; Scheltens, Philip; Barkhof, Frederik; Vrenken, Hugo; Wink, Alle Meije

    2016-06-01

    Purpose To investigate the diagnostic accuracy of an image-based classifier to distinguish between Alzheimer disease (AD) and behavioral variant frontotemporal dementia (bvFTD) in individual patients by using gray matter (GM) density maps computed from standard T1-weighted structural images obtained with multiple imagers and with independent training and prediction data. Materials and Methods The local institutional review board approved the study. Eighty-four patients with AD, 51 patients with bvFTD, and 94 control subjects were divided into independent training (n = 115) and prediction (n = 114) sets with identical diagnosis and imager type distributions. Training of a support vector machine (SVM) classifier used diagnostic status and GM density maps and produced voxelwise discrimination maps. Discriminant function analysis was used to estimate suitability of the extracted weights for single-subject classification in the prediction set. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) were calculated for image-based classifiers and neuropsychological z scores. Results Training accuracy of the SVM was 85% for patients with AD versus control subjects, 72% for patients with bvFTD versus control subjects, and 79% for patients with AD versus patients with bvFTD (P ≤ .029). Single-subject diagnosis in the prediction set when using the discrimination maps yielded accuracies of 88% for patients with AD versus control subjects, 85% for patients with bvFTD versus control subjects, and 82% for patients with AD versus patients with bvFTD, with a good to excellent AUC (range, 0.81-0.95; P ≤ .001). Machine learning-based categorization of AD versus bvFTD based on GM density maps outperforms classification based on neuropsychological test results. Conclusion The SVM can be used in single-subject discrimination and can help the clinician arrive at a diagnosis. The SVM can be used to distinguish disease-specific GM patterns in patients with AD and those with bvFTD as compared with normal aging by using common T1-weighted structural MR imaging. (©) RSNA, 2015.

  20. Effect of finite sample size on feature selection and classification: a simulation study.

    PubMed

    Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping

    2010-02-01

    The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.

  1. Deep learning of support vector machines with class probability output networks.

    PubMed

    Kim, Sangwook; Yu, Zhibin; Kil, Rhee Man; Lee, Minho

    2015-04-01

    Deep learning methods endeavor to learn features automatically at multiple levels and allow systems to learn complex functions mapping from the input space to the output space for the given data. The ability to learn powerful features automatically is increasingly important as the volume of data and range of applications of machine learning methods continues to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of the SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Using a simulation of classification problems, the effectiveness of the proposed method is demonstrated. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.

    PubMed

    Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng

    2017-01-01

    Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, the process of analyzing EEG to detect neurological diseases is often difficult because the brain electrical signals are random, non-stationary and nonlinear. In order to overcome such difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). New scheme first extracts features from EEG by MF-DFA during the first stage. Then, the scheme applies a genetic algorithm (GA) to calculate parameters used in SVM and classify the training data according to the selected features using SVM. Finally, the trained SVM classifier is exploited to detect neurological diseases. The algorithm utilizes MLlib from library of SPARK and runs on cloud platform. Applying to a public dataset for experiment, the study results show that the new feature extraction method and scheme can detect signals with less features and the accuracy of the classification reached up to 99%. MF-DFA is a promising approach to extract features for analyzing EEG, because of its simple algorithm procedure and less parameters. The features obtained by MF-DFA can represent samples as well as traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM with enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.

  3. Performance comparison of classifiers for differentiation among obstructive lung diseases based on features of texture analysis at HRCT

    NASA Astrophysics Data System (ADS)

    Lee, Youngjoo; Seo, Joon Beom; Kang, Bokyoung; Kim, Dongil; Lee, June Goo; Kim, Song Soo; Kim, Namkug; Kang, Suk Ho

    2007-03-01

    The performance of classification algorithms for differentiating among obstructive lung diseases based on features from texture analysis using HRCT (High Resolution Computerized Tomography) images was compared. HRCT can provide accurate information for the detection of various obstructive lung diseases, including centrilobular emphysema, panlobular emphysema and bronchiolitis obliterans. Features on HRCT images can be subtle, however, particularly in the early stages of disease, and image-based diagnosis is subject to inter-observer variation. To automate the diagnosis and improve the accuracy, we compared three types of automated classification systems, naÃve Bayesian classifier, ANN (Artificial Neural Net) and SVM (Support Vector Machine), based on their ability to differentiate among normal lung and three types of obstructive lung diseases. To assess the performance and cross-validation of these three classifiers, 5 folding methods with 5 randomly chosen groups were used. For a more robust result, each validation was repeated 100 times. SVM showed the best performance, with 86.5% overall sensitivity, significantly different from the other classifiers (one way ANOVA, p<0.01). We address the characteristics of each classifier affecting performance and the issue of which classifier is the most suitable for clinical applications, and propose an appropriate method to choose the best classifier and determine its optimal parameters for optimal disease discrimination. These results can be applied to classifiers for differentiation of other diseases.

  4. MISR Level 2 TOA/Cloud Classifier parameters (MIL2TCCL_V3)

    NASA Technical Reports Server (NTRS)

    Diner, David J. (Principal Investigator)

    The TOA/Cloud Classifiers contain the Angular Signature Cloud Mask (ASCM), a scene classifier calculated using support vector machine technology (SVM) both of which are on a 1.1 km grid, and cloud fractions at 17.6 km resolution that are available in different height bins (low, middle, high) and are also calculated on an angle-by-angle basis. [Temporal_Coverage: Start_Date=2000-02-24; Stop_Date=] [Spatial_Coverage: Southernmost_Latitude=-90; Northernmost_Latitude=90; Westernmost_Longitude=-180; Easternmost_Longitude=180] [Data_Resolution: Latitude_Resolution=1.1 km; Longitude_Resolution=1.1 km; Temporal_Resolution=about 15 orbits/day].

  5. Evaluating the High Risk Groups for Suicide: A Comparison of Logistic Regression, Support Vector Machine, Decision Tree and Artificial Neural Network

    PubMed Central

    AMINI, Payam; AHMADINIA, Hasan; POOROLAJAL, Jalal; MOQADDASI AMIRI, Mohammad

    2016-01-01

    Background: We aimed to assess the high-risk group for suicide using different classification methods includinglogistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM). Methods: We used the dataset of a study conducted to predict risk factors of completed suicide in Hamadan Province, the west of Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT and ANN were performed. The applied methods were compared using sensitivity, specificity, positive predicted value, negative predicted value, accuracy and the area under curve. Cochran-Q test was implied to check differences in proportion among methods. To assess the association between the observed and predicted values, Ø coefficient, contingency coefficient, and Kendall tau-b were calculated. Results: Gender, age, and job were the most important risk factors for fatal suicide attempts in common for four methods. SVM method showed the highest accuracy 0.68 and 0.67 for training and testing sample, respectively. However, this method resulted in the highest specificity (0.67 for training and 0.68 for testing sample) and the highest sensitivity for training sample (0.85), but the lowest sensitivity for the testing sample (0.53). Cochran-Q test resulted in differences between proportions in different methods (P<0.001). The association of SVM predictions and observed values, Ø coefficient, contingency coefficient, and Kendall tau-b were 0.239, 0.232 and 0.239, respectively. Conclusion: SVM had the best performance to classify fatal suicide attempts comparing to DT, LR and ANN. PMID:27957463

  6. The combination of a histogram-based clustering algorithm and support vector machine for the diagnosis of osteoporosis.

    PubMed

    Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira; Heo, Min-Suk

    2013-09-01

    To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.

  7. A machine learning approach to galaxy-LSS classification - I. Imprints on halo merger trees

    NASA Astrophysics Data System (ADS)

    Hui, Jianan; Aragon, Miguel; Cui, Xinping; Flegal, James M.

    2018-04-01

    The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here, we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of 93 per cent. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine (SVM) to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time, and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for SVM.

  8. SU-C-BRA-05: Delineating High-Dose Clinical Target Volumes for Head and Neck Tumors Using Machine Learning Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cardenas, C; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Wong, A

    Purpose: To develop and test population-based machine learning algorithms for delineating high-dose clinical target volumes (CTVs) in H&N tumors. Automating and standardizing the contouring of CTVs can reduce both physician contouring time and inter-physician variability, which is one of the largest sources of uncertainty in H&N radiotherapy. Methods: Twenty-five node-negative patients treated with definitive radiotherapy were selected (6 right base of tongue, 11 left and 9 right tonsil). All patients had GTV and CTVs manually contoured by an experienced radiation oncologist prior to treatment. This contouring process, which is driven by anatomical, pathological, and patient specific information, typically results inmore » non-uniform margin expansions about the GTV. Therefore, we tested two methods to delineate high-dose CTV given a manually-contoured GTV: (1) regression-support vector machines(SVM) and (2) classification-SVM. These models were trained and tested on each patient group using leave-one-out cross-validation. The volume difference(VD) and Dice similarity coefficient(DSC) between the manual and auto-contoured CTV were calculated to evaluate the results. Distances from GTV-to-CTV were computed about each patient’s GTV and these distances, in addition to distances from GTV to surrounding anatomy in the expansion direction, were utilized in the regression-SVM method. The classification-SVM method used categorical voxel-information (GTV, selected anatomical structures, else) from a 3×3×3cm3 ROI centered about the voxel to classify voxels as CTV. Results: Volumes for the auto-contoured CTVs ranged from 17.1 to 149.1cc and 17.4 to 151.9cc; the average(range) VD between manual and auto-contoured CTV were 0.93 (0.48–1.59) and 1.16(0.48–1.97); while average(range) DSC values were 0.75(0.59–0.88) and 0.74(0.59–0.81) for the regression-SVM and classification-SVM methods, respectively. Conclusion: We developed two novel machine learning methods to delineate high-dose CTV for H&N patients. Both methods showed promising results that hint to a solution to the standardization of the contouring process of clinical target volumes. Varian Medical Systems grant.« less

  9. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

    PubMed

    You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

    2013-01-01

    Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time.

  10. Raman spectroscopy and PCA-SVM as a non-invasive diagnostic tool to identify and classify qualitatively glycated hemoglobin levels in vivo.

    PubMed

    Villa-Manríquez, J F; Castro-Ramos, J; Gutiérrez-Delgado, F; Lopéz-Pacheco, M A; Villanueva-Luna, A E

    2017-08-01

    In this study we identify and classify high and low levels of glycated hemoglobin (HbA1c) in healthy volunteers (HV) and diabetic patients (DP). Overall, 86 subjects were evaluated. The Raman spectrum was measured in three anatomical regions of the body: index fingertip, right ear lobe, and forehead. The measurements were performed to compare the difference between the HV and DP (22 well controlled diabetic patients (WCDP) (HbA1c <6.5%), and 49 not controlled diabetic patients (NCDP) (HbA1c ≥6.5%)). Multivariable methods such as principal components analysis (PCA) combined with support vector machine (SVM) were used to develop effective diagnostic algorithms for classification among these groups. The forehead of HV versus WCDP showed the highest sensitivity (100%) and specificity (100%). Sensitivity (100%) and specificity (60%), were highest in the forehead of WCDP, versus NCDP. In HV versus NCDP, the fingertip had the highest sensitivity (100%) and specificity (80%). The efficacy of the diagnostic algorithm by receiver operating characteristic (ROC) curve was confirmed. Overall, our study demonstrated that the combination of Raman spectroscopy and PCA-SVM are feasible non-invasive diagnostic tool in diabetes to classify qualitatively high and low levels of HbA1c in vivo. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Derivation and validation of different machine-learning models in mortality prediction of trauma in motorcycle riders: a cross-sectional retrospective study in southern Taiwan

    PubMed Central

    Kuo, Pao-Jen; Wu, Shao-Chun; Chien, Peng-Chen; Rau, Cheng-Shyuan; Chen, Yi-Chun; Hsieh, Hsiao-Yun; Hsieh, Ching-Hua

    2018-01-01

    Objectives This study aimed to build and test the models of machine learning (ML) to predict the mortality of hospitalised motorcycle riders. Setting The study was conducted in a level-1 trauma centre in southern Taiwan. Participants Motorcycle riders who were hospitalised between January 2009 and December 2015 were classified into a training set (n=6306) and test set (n=946). Using the demographic information, injury characteristics and laboratory data of patients, logistic regression (LR), support vector machine (SVM) and decision tree (DT) analyses were performed to determine the mortality of individual motorcycle riders, under different conditions, using all samples or reduced samples, as well as all variables or selected features in the algorithm. Primary and secondary outcome measures The predictive performance of the model was evaluated based on accuracy, sensitivity, specificity and geometric mean, and an analysis of the area under the receiver operating characteristic curves of the two different models was carried out. Results In the training set, both LR and SVM had a significantly higher area under the receiver operating characteristic curve (AUC) than DT. No significant difference was observed in the AUC of LR and SVM, regardless of whether all samples or reduced samples and whether all variables or selected features were used. In the test set, the performance of the SVM model for all samples with selected features was better than that of all other models, with an accuracy of 98.73%, sensitivity of 86.96%, specificity of 99.02%, geometric mean of 92.79% and AUC of 0.9517, in mortality prediction. Conclusion ML can provide a feasible level of accuracy in predicting the mortality of motorcycle riders. Integration of the ML model, particularly the SVM algorithm in the trauma system, may help identify high-risk patients and, therefore, guide appropriate interventions by the clinical staff. PMID:29306885

  12. GAPscreener: An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique

    PubMed Central

    Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta

    2008-01-01

    Background Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. Results The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. Conclusion GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge. PMID:18430222

  13. Reconstruction of hyperspectral image using matting model for classification

    NASA Astrophysics Data System (ADS)

    Xie, Weiying; Li, Yunsong; Ge, Chiru

    2016-05-01

    Although hyperspectral images (HSIs) captured by satellites provide much information in spectral regions, some bands are redundant or have large amounts of noise, which are not suitable for image analysis. To address this problem, we introduce a method for reconstructing the HSI with noise reduction and contrast enhancement using a matting model for the first time. The matting model refers to each spectral band of an HSI that can be decomposed into three components, i.e., alpha channel, spectral foreground, and spectral background. First, one spectral band of an HSI with more refined information than most other bands is selected, and is referred to as an alpha channel of the HSI to estimate the hyperspectral foreground and hyperspectral background. Finally, a combination operation is applied to reconstruct the HSI. In addition, the support vector machine (SVM) classifier and three sparsity-based classifiers, i.e., orthogonal matching pursuit (OMP), simultaneous OMP, and OMP based on first-order neighborhood system weighted classifiers, are utilized on the reconstructed HSI and the original HSI to verify the effectiveness of the proposed method. Specifically, using the reconstructed HSI, the average accuracy of the SVM classifier can be improved by as much as 19%.

  14. Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

    PubMed

    Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

    2018-03-27

    P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.

  15. Incremental classification learning for anomaly detection in medical images

    NASA Astrophysics Data System (ADS)

    Giritharan, Balathasan; Yuan, Xiaohui; Liu, Jianguo

    2009-02-01

    Computer-aided diagnosis usually screens thousands of instances to find only a few positive cases that indicate probable presence of disease.The amount of patient data increases consistently all the time. In diagnosis of new instances, disagreement occurs between a CAD system and physicians, which suggests inaccurate classifiers. Intuitively, misclassified instances and the previously acquired data should be used to retrain the classifier. This, however, is very time consuming and, in some cases where dataset is too large, becomes infeasible. In addition, among the patient data, only a small percentile shows positive sign, which is known as imbalanced data.We present an incremental Support Vector Machines(SVM) as a solution for the class imbalance problem in classification of anomaly in medical images. The support vectors provide a concise representation of the distribution of the training data. Here we use bootstrapping to identify potential candidate support vectors for future iterations. Experiments were conducted using images from endoscopy videos, and the sensitivity and specificity were close to that of SVM trained using all samples available at a given incremental step with significantly improved efficiency in training the classifier.

  16. Automated diagnosis of epilepsy using CWT, HOS and texture parameters.

    PubMed

    Acharya, U Rajendra; Yanti, Ratna; Zheng, Jia Wei; Krishnan, M Muthu Rama; Tan, Jen Hong; Martis, Roshan Joy; Lim, Choo Min

    2013-06-01

    Epilepsy is a chronic brain disorder which manifests as recurrent seizures. Electroencephalogram (EEG) signals are generally analyzed to study the characteristics of epileptic seizures. In this work, we propose a method for the automated classification of EEG signals into normal, interictal and ictal classes using Continuous Wavelet Transform (CWT), Higher Order Spectra (HOS) and textures. First the CWT plot was obtained for the EEG signals and then the HOS and texture features were extracted from these plots. Then the statistically significant features were fed to four classifiers namely Decision Tree (DT), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN) and Support Vector Machine (SVM) to select the best classifier. We observed that the SVM classifier with Radial Basis Function (RBF) kernel function yielded the best results with an average accuracy of 96%, average sensitivity of 96.9% and average specificity of 97% for 23.6 s duration of EEG data. Our proposed technique can be used as an automatic seizure monitoring software. It can also assist the doctors to cross check the efficacy of their prescribed drugs.

  17. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

    PubMed

    Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat

    2016-12-22

    The ability to sequence the transcriptomes of single cells using single-cell RNA-seq sequencing technologies presents a shift in the scientific paradigm where scientists, now, are able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, till date, there has not been a suitable computational methodology for the analysis of such intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profiles difference between the different cellular subtypes. In this paper, we describe the novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in web-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach reported for is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles like that of the study of the cancer progression and treatment within a highly heterogeneous tumour.

  18. LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.

    PubMed

    Zhang, Tao; Chen, Wanzhong

    2017-08-01

    Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.

  19. A Tool for Creating Regionally Calibrated High-Resolution Land Cover Data Sets for the West African Sahel: Using Machine Learning to Scale Up Hand-Classified Maps in a Data-Sparse Environment

    NASA Astrophysics Data System (ADS)

    Van Gordon, M.; Van Gordon, S.; Min, A.; Sullivan, J.; Weiner, Z.; Tappan, G. G.

    2017-12-01

    Using support vector machine (SVM) learning and high-accuracy hand-classified maps, we have developed a publicly available land cover classification tool for the West African Sahel. Our classifier produces high-resolution and regionally calibrated land cover maps for the Sahel, representing a significant contribution to the data available for this region. Global land cover products are unreliable for the Sahel, and accurate land cover data for the region are sparse. To address this gap, the U.S. Geological Survey and the Regional Center for Agriculture, Hydrology and Meteorology (AGRHYMET) in Niger produced high-quality land cover maps for the region via hand-classification of Landsat images. This method produces highly accurate maps, but the time and labor required constrain the spatial and temporal resolution of the data products. By using these hand-classified maps alongside SVM techniques, we successfully increase the resolution of the land cover maps by 1-2 orders of magnitude, from 2km-decadal resolution to 30m-annual resolution. These high-resolution regionally calibrated land cover datasets, along with the classifier we developed to produce them, lay the foundation for major advances in studies of land surface processes in the region. These datasets will provide more accurate inputs for food security modeling, hydrologic modeling, analyses of land cover change and climate change adaptation efforts. The land cover classification tool we have developed will be publicly available for use in creating additional West Africa land cover datasets with future remote sensing data and can be adapted for use in other parts of the world.

  20. Automated classification of neurological disorders of gait using spatio-temporal gait parameters.

    PubMed

    Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman

    2015-04-01

    Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of the treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. 8 measurements with 136 variables using a GAITRite(®) sensor carpet were obtained from each subject. Subjects were randomly divided into two groups (training cases and validation cases). Sensitivity and specificity of k-nearest neighbor (KNN), naive-bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%); while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be a reliable diagnostic and management tool for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Multi-feature classifiers for burst detection in single EEG channels from preterm infants

    NASA Astrophysics Data System (ADS)

    Navarro, X.; Porée, F.; Kuchenbuch, M.; Chavez, M.; Beuchée, Alain; Carrault, G.

    2017-08-01

    Objective. The study of electroencephalographic (EEG) bursts in preterm infants provides valuable information about maturation or prognostication after perinatal asphyxia. Over the last two decades, a number of works proposed algorithms to automatically detect EEG bursts in preterm infants, but they were designed for populations under 35 weeks of post menstrual age (PMA). However, as the brain activity evolves rapidly during postnatal life, these solutions might be under-performing with increasing PMA. In this work we focused on preterm infants reaching term ages (PMA  ⩾36 weeks) using multi-feature classification on a single EEG channel. Approach. Five EEG burst detectors relying on different machine learning approaches were compared: logistic regression (LR), linear discriminant analysis (LDA), k-nearest neighbors (kNN), support vector machines (SVM) and thresholding (Th). Classifiers were trained by visually labeled EEG recordings from 14 very preterm infants (born after 28 weeks of gestation) with 36-41 weeks PMA. Main results. The most performing classifiers reached about 95% accuracy (kNN, SVM and LR) whereas Th obtained 84%. Compared to human-automatic agreements, LR provided the highest scores (Cohen’s kappa  =  0.71) using only three EEG features. Applying this classifier in an unlabeled database of 21 infants  ⩾36 weeks PMA, we found that long EEG bursts and short inter-burst periods are characteristic of infants with the highest PMA and weights. Significance. In view of these results, LR-based burst detection could be a suitable tool to study maturation in monitoring or portable devices using a single EEG channel.

  2. Research on bearing fault diagnosis of large machinery based on mathematical morphology

    NASA Astrophysics Data System (ADS)

    Wang, Yu

    2018-04-01

    To study the automatic diagnosis of large machinery fault based on support vector machine, combining the four common faults of the large machinery, the support vector machine is used to classify and identify the fault. The extracted feature vectors are entered. The feature vector is trained and identified by multi - classification method. The optimal parameters of the support vector machine are searched by trial and error method and cross validation method. Then, the support vector machine is compared with BP neural network. The results show that the support vector machines are short in time and high in classification accuracy. It is more suitable for the research of fault diagnosis in large machinery. Therefore, it can be concluded that the training speed of support vector machines (SVM) is fast and the performance is good.

  3. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    NASA Astrophysics Data System (ADS)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.

  4. Bias in error estimation when using cross-validation for model selection.

    PubMed

    Varma, Sudhir; Simon, Richard

    2006-02-23

    Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.

  5. Application of texture analysis method for mammogram density classification

    NASA Astrophysics Data System (ADS)

    Nithya, R.; Santhi, B.

    2017-07-01

    Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.

  6. Automated diagnosis of dry eye using infrared thermography images

    NASA Astrophysics Data System (ADS)

    Acharya, U. Rajendra; Tan, Jen Hong; Koh, Joel E. W.; Sudarshan, Vidya K.; Yeo, Sharon; Too, Cheah Loon; Chua, Chua Kuang; Ng, E. Y. K.; Tong, Louis

    2015-07-01

    Dry Eye (DE) is a condition of either decreased tear production or increased tear film evaporation. Prolonged DE damages the cornea causing the corneal scarring, thinning and perforation. There is no single uniform diagnosis test available to date; combinations of diagnostic tests are to be performed to diagnose DE. The current diagnostic methods available are subjective, uncomfortable and invasive. Hence in this paper, we have developed an efficient, fast and non-invasive technique for the automated identification of normal and DE classes using infrared thermography images. The features are extracted from nonlinear method called Higher Order Spectra (HOS). Features are ranked using t-test ranking strategy. These ranked features are fed to various classifiers namely, K-Nearest Neighbor (KNN), Nave Bayesian Classifier (NBC), Decision Tree (DT), Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) to select the best classifier using minimum number of features. Our proposed system is able to identify the DE and normal classes automatically with classification accuracy of 99.8%, sensitivity of 99.8%, and specificity if 99.8% for left eye using PNN and KNN classifiers. And we have reported classification accuracy of 99.8%, sensitivity of 99.9%, and specificity if 99.4% for right eye using SVM classifier with polynomial order 2 kernel.

  7. Development of a Support Vector Machine - Based Image Analysis System for Focal Liver Lesions Classification in Magnetic Resonance Images

    NASA Astrophysics Data System (ADS)

    Gatos, I.; Tsantis, S.; Karamesini, M.; Skouroliakou, A.; Kagadis, G.

    2015-09-01

    Purpose: The design and implementation of a computer-based image analysis system employing the support vector machine (SVM) classifier system for the classification of Focal Liver Lesions (FLLs) on routine non-enhanced, T2-weighted Magnetic Resonance (MR) images. Materials and Methods: The study comprised 92 patients; each one of them has undergone MRI performed on a Magnetom Concerto (Siemens). Typical signs on dynamic contrast-enhanced MRI and biopsies were employed towards a three class categorization of the 92 cases: 40-benign FLLs, 25-Hepatocellular Carcinomas (HCC) within Cirrhotic liver parenchyma and 27-liver metastases from Non-Cirrhotic liver. Prior to FLLs classification an automated lesion segmentation algorithm based on Marcov Random Fields was employed in order to acquire each FLL Region of Interest. 42 texture features derived from the gray-level histogram, co-occurrence and run-length matrices and 12 morphological features were obtained from each lesion. Stepwise multi-linear regression analysis was utilized to avoid feature redundancy leading to a feature subset that fed the multiclass SVM classifier designed for lesion classification. SVM System evaluation was performed by means of leave-one-out method and ROC analysis. Results: Maximum accuracy for all three classes (90.0%) was obtained by means of the Radial Basis Kernel Function and three textural features (Inverse- Different-Moment, Sum-Variance and Long-Run-Emphasis) that describe lesion's contrast, variability and shape complexity. Sensitivity values for the three classes were 92.5%, 81.5% and 96.2% respectively, whereas specificity values were 94.2%, 95.3% and 95.5%. The AUC value achieved for the selected subset was 0.89 with 0.81 - 0.94 confidence interval. Conclusion: The proposed SVM system exhibit promising results that could be utilized as a second opinion tool to the radiologist in order to decrease the time/cost of diagnosis and the need for patients to undergo invasive examination.

  8. An investigation of the usability of sound recognition for source separation of packaging wastes in reverse vending machines.

    PubMed

    Korucu, M Kemal; Kaplan, Özgür; Büyük, Osman; Güllü, M Kemal

    2016-10-01

    In this study, we investigate the usability of sound recognition for source separation of packaging wastes in reverse vending machines (RVMs). For this purpose, an experimental setup equipped with a sound recording mechanism was prepared. Packaging waste sounds generated by three physical impacts such as free falling, pneumatic hitting and hydraulic crushing were separately recorded using two different microphones. To classify the waste types and sizes based on sound features of the wastes, a support vector machine (SVM) and a hidden Markov model (HMM) based sound classification systems were developed. In the basic experimental setup in which only free falling impact type was considered, SVM and HMM systems provided 100% classification accuracy for both microphones. In the expanded experimental setup which includes all three impact types, material type classification accuracies were 96.5% for dynamic microphone and 97.7% for condenser microphone. When both the material type and the size of the wastes were classified, the accuracy was 88.6% for the microphones. The modeling studies indicated that hydraulic crushing impact type recordings were very noisy for an effective sound recognition application. In the detailed analysis of the recognition errors, it was observed that most of the errors occurred in the hitting impact type. According to the experimental results, it can be said that the proposed novel approach for the separation of packaging wastes could provide a high classification performance for RVMs. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Automatic Feature Selection and Improved Classification in SICADA Counterfeit Electronics Detection

    DTIC Science & Technology

    2017-03-20

    The SICADA methodology was developed to detect such counterfeit microelectronics by collecting power side channel data and applying machine learning...to identify counterfeits. This methodology has been extended to include a two-step automated feature selection process and now uses a one-class SVM...classifier. We describe this methodology and show results for empirical data collected from several types of Microchip dsPIC33F microcontrollers

  10. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260

  11. Automated analysis of individual sperm cells using stain-free interferometric phase microscopy and machine learning.

    PubMed

    Mirsky, Simcha K; Barnea, Itay; Levi, Mattan; Greenspan, Hayit; Shaked, Natan T

    2017-09-01

    Currently, the delicate process of selecting sperm cells to be used for in vitro fertilization (IVF) is still based on the subjective, qualitative analysis of experienced clinicians using non-quantitative optical microscopy techniques. In this work, a method was developed for the automated analysis of sperm cells based on the quantitative phase maps acquired through use of interferometric phase microscopy (IPM). Over 1,400 human sperm cells from 8 donors were imaged using IPM, and an algorithm was designed to digitally isolate sperm cell heads from the quantitative phase maps while taking into consideration both the cell 3D morphology and contents, as well as acquire features describing sperm head morphology. A subset of these features was used to train a support vector machine (SVM) classifier to automatically classify sperm of good and bad morphology. The SVM achieves an area under the receiver operating characteristic curve of 88.59% and an area under the precision-recall curve of 88.67%, as well as precisions of 90% or higher. We believe that our automatic analysis can become the basis for objective and automatic sperm cell selection in IVF. © 2017 International Society for Advancement of Cytometry. © 2017 International Society for Advancement of Cytometry.

  12. Plant microRNA-Target Interaction Identification Model Based on the Integration of Prediction Tools and Support Vector Machine

    PubMed Central

    Meng, Jun; Shi, Lin; Luan, Yushi

    2014-01-01

    Background Confident identification of microRNA-target interactions is significant for studying the function of microRNA (miRNA). Although some computational miRNA target prediction methods have been proposed for plants, results of various methods tend to be inconsistent and usually lead to more false positive. To address these issues, we developed an integrated model for identifying plant miRNA–target interactions. Results Three online miRNA target prediction toolkits and machine learning algorithms were integrated to identify and analyze Arabidopsis thaliana miRNA-target interactions. Principle component analysis (PCA) feature extraction and self-training technology were introduced to improve the performance. Results showed that the proposed model outperformed the previously existing methods. The results were validated by using degradome sequencing supported Arabidopsis thaliana miRNA-target interactions. The proposed model constructed on Arabidopsis thaliana was run over Oryza sativa and Vitis vinifera to demonstrate that our model is effective for other plant species. Conclusions The integrated model of online predictors and local PCA-SVM classifier gained credible and high quality miRNA-target interactions. The supervised learning algorithm of PCA-SVM classifier was employed in plant miRNA target identification for the first time. Its performance can be substantially improved if more experimentally proved training samples are provided. PMID:25051153

  13. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    NASA Astrophysics Data System (ADS)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps expert to diagnose precisely, accurately, and inexpensively. In this research, we use data mining technique to developed a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages from filter and wrapper methods to select the optimal feature subset from original feature. Chi square used as filter method to remove redundant features and GA as wrapper method to select the ideal feature subset with SVM used as classifier. Experiment performed with 10 fold cross validation on erythemato-squamous diseases dataset taken from University of California Irvine (UCI) machine learning database. The experimental result shows that the proposed model based multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.

  14. Pedestrian detection in crowded scenes with the histogram of gradients principle

    NASA Astrophysics Data System (ADS)

    Sidla, O.; Rosner, M.; Lypetskyy, Y.

    2006-10-01

    This paper describes a close to real-time scale invariant implementation of a pedestrian detector system which is based on the Histogram of Oriented Gradients (HOG) principle. Salient HOG features are first selected from a manually created very large database of samples with an evolutionary optimization procedure that directly trains a polynomial Support Vector Machine (SVM). Real-time operation is achieved by a cascaded 2-step classifier which uses first a very fast linear SVM (with the same features as the polynomial SVM) to reject most of the irrelevant detections and then computes the decision function with a polynomial SVM on the remaining set of candidate detections. Scale invariance is achieved by running the detector of constant size on scaled versions of the original input images and by clustering the results over all resolutions. The pedestrian detection system has been implemented in two versions: i) fully body detection, and ii) upper body only detection. The latter is especially suited for very busy and crowded scenarios. On a state-of-the-art PC it is able to run at a frequency of 8 - 20 frames/sec.

  15. Online image classification under monotonic decision boundary constraint

    NASA Astrophysics Data System (ADS)

    Lu, Cheng; Allebach, Jan; Wagner, Jerry; Pitta, Brandi; Larson, David; Guo, Yandong

    2015-01-01

    Image classification is a prerequisite for copy quality enhancement in all-in-one (AIO) device that comprises a printer and scanner, and which can be used to scan, copy and print. Different processing pipelines are provided in an AIO printer. Each of the processing pipelines is designed specifically for one type of input image to achieve the optimal output image quality. A typical approach to this problem is to apply Support Vector Machine to classify the input image and feed it to its corresponding processing pipeline. The online training SVM can help users to improve the performance of classification as input images accumulate. At the same time, we want to make quick decision on the input image to speed up the classification which means sometimes the AIO device does not need to scan the entire image to make a final decision. These two constraints, online SVM and quick decision, raise questions regarding: 1) what features are suitable for classification; 2) how we should control the decision boundary in online SVM training. This paper will discuss the compatibility of online SVM and quick decision capability.

  16. Fault diagnosis method based on FFT-RPCA-SVM for Cascaded-Multilevel Inverter.

    PubMed

    Wang, Tianzhen; Qi, Jie; Xu, Hao; Wang, Yide; Liu, Lei; Gao, Diju

    2016-01-01

    Thanks to reduced switch stress, high quality of load wave, easy packaging and good extensibility, the cascaded H-bridge multilevel inverter is widely used in wind power system. To guarantee stable operation of system, a new fault diagnosis method, based on Fast Fourier Transform (FFT), Relative Principle Component Analysis (RPCA) and Support Vector Machine (SVM), is proposed for H-bridge multilevel inverter. To avoid the influence of load variation on fault diagnosis, the output voltages of the inverter is chosen as the fault characteristic signals. To shorten the time of diagnosis and improve the diagnostic accuracy, the main features of the fault characteristic signals are extracted by FFT. To further reduce the training time of SVM, the feature vector is reduced based on RPCA that can get a lower dimensional feature space. The fault classifier is constructed via SVM. An experimental prototype of the inverter is built to test the proposed method. Compared to other fault diagnosis methods, the experimental results demonstrate the high accuracy and efficiency of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.

  17. Multispectral image analysis for object recognition and classification

    NASA Astrophysics Data System (ADS)

    Viau, C. R.; Payeur, P.; Cretu, A.-M.

    2016-05-01

    Computer and machine vision applications are used in numerous fields to analyze static and dynamic imagery in order to assist or automate decision-making processes. Advancements in sensor technologies now make it possible to capture and visualize imagery at various wavelengths (or bands) of the electromagnetic spectrum. Multispectral imaging has countless applications in various fields including (but not limited to) security, defense, space, medical, manufacturing and archeology. The development of advanced algorithms to process and extract salient information from the imagery is a critical component of the overall system performance. The fundamental objective of this research project was to investigate the benefits of combining imagery from the visual and thermal bands of the electromagnetic spectrum to improve the recognition rates and accuracy of commonly found objects in an office setting. A multispectral dataset (visual and thermal) was captured and features from the visual and thermal images were extracted and used to train support vector machine (SVM) classifiers. The SVM's class prediction ability was evaluated separately on the visual, thermal and multispectral testing datasets.

  18. Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob

    2017-02-01

    Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm such as non-negative matrix factorization. The nonnegative spectral components are retrieved and attributed to the native fluorophores such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD) in tissue. The retrieved weights of the components, e.g. NADH and FAD are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm such as support vector machine (SVM) is used for classification to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is shown based on statistical measures such as sensitivity, specificity, and accuracy, along with the area under receiver operating characteristic (ROC) curve. A cross validation method such as leave-one-out is used to evaluate the predictive performance of the SVM classifier to avoid bias due to overfitting.

  19. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    NASA Astrophysics Data System (ADS)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using a SVM classifier on data sets of different sizes for different cluster configurations demonstrates the potential of the tool, as well as aspects that affect its performance.

  20. Lidar detection of underwater objects using a neuro-SVM-based architecture.

    PubMed

    Mitra, Vikramjit; Wang, Chia-Jiu; Banerjee, Satarupa

    2006-05-01

    This paper presents a neural network architecture using a support vector machine (SVM) as an inference engine (IE) for classification of light detection and ranging (Lidar) data. Lidar data gives a sequence of laser backscatter intensities obtained from laser shots generated from an airborne object at various altitudes above the earth surface. Lidar data is pre-filtered to remove high frequency noise. As the Lidar shots are taken from above the earth surface, it has some air backscatter information, which is of no importance for detecting underwater objects. Because of these, the air backscatter information is eliminated from the data and a segment of this data is subsequently selected to extract features for classification. This is then encoded using linear predictive coding (LPC) and polynomial approximation. The coefficients thus generated are used as inputs to the two branches of a parallel neural architecture. The decisions obtained from the two branches are vector multiplied and the result is fed to an SVM-based IE that presents the final inference. Two parallel neural architectures using multilayer perception (MLP) and hybrid radial basis function (HRBF) are considered in this paper. The proposed structure fits the Lidar data classification task well due to the inherent classification efficiency of neural networks and accurate decision-making capability of SVM. A Bayesian classifier and a quadratic classifier were considered for the Lidar data classification task but they failed to offer high prediction accuracy. Furthermore, a single-layered artificial neural network (ANN) classifier was also considered and it failed to offer good accuracy. The parallel ANN architecture proposed in this paper offers high prediction accuracy (98.9%) and is found to be the most suitable architecture for the proposed task of Lidar data classification.

  1. Combination of mass spectrometry-based targeted lipidomics and supervised machine learning algorithms in detecting adulterated admixtures of white rice.

    PubMed

    Lim, Dong Kyu; Long, Nguyen Phuoc; Mo, Changyeun; Dong, Ziyuan; Cui, Lingmei; Kim, Giyoung; Kwon, Sung Won

    2017-10-01

    The mixing of extraneous ingredients with original products is a common adulteration practice in food and herbal medicines. In particular, authenticity of white rice and its corresponding blended products has become a key issue in food industry. Accordingly, our current study aimed to develop and evaluate a novel discrimination method by combining targeted lipidomics with powerful supervised learning methods, and eventually introduce a platform to verify the authenticity of white rice. A total of 30 cultivars were collected, and 330 representative samples of white rice from Korea and China as well as seven mixing ratios were examined. Random forests (RF), support vector machines (SVM) with a radial basis function kernel, C5.0, model averaged neural network, and k-nearest neighbor classifiers were used for the classification. We achieved desired results, and the classifiers effectively differentiated white rice from Korea to blended samples with high prediction accuracy for the contamination ratio as low as five percent. In addition, RF and SVM classifiers were generally superior to and more robust than the other techniques. Our approach demonstrated that the relative differences in lysoGPLs can be successfully utilized to detect the adulterated mixing of white rice originating from different countries. In conclusion, the present study introduces a novel and high-throughput platform that can be applied to authenticate adulterated admixtures from original white rice samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. A general prediction model for the detection of ADHD and Autism using structural and functional MRI.

    PubMed

    Sen, Bhaskar; Borle, Neil C; Greiner, Russell; Brown, Matthew R G

    2018-01-01

    This work presents a novel method for learning a model that can diagnose Attention Deficit Hyperactivity Disorder (ADHD), as well as Autism, using structural texture and functional connectivity features obtained from 3-dimensional structural magnetic resonance imaging (MRI) and 4-dimensional resting-state functional magnetic resonance imaging (fMRI) scans of subjects. We explore a series of three learners: (1) The LeFMS learner first extracts features from the structural MRI images using the texture-based filters produced by a sparse autoencoder. These filters are then convolved with the original MRI image using an unsupervised convolutional network. The resulting features are used as input to a linear support vector machine (SVM) classifier. (2) The LeFMF learner produces a diagnostic model by first computing spatial non-stationary independent components of the fMRI scans, which it uses to decompose each subject's fMRI scan into the time courses of these common spatial components. These features can then be used with a learner by themselves or in combination with other features to produce the model. Regardless of which approach is used, the final set of features are input to a linear support vector machine (SVM) classifier. (3) Finally, the overall LeFMSF learner uses the combined features obtained from the two feature extraction processes in (1) and (2) above as input to an SVM classifier, achieving an accuracy of 0.673 on the ADHD-200 holdout data and 0.643 on the ABIDE holdout data. Both of these results, obtained with the same LeFMSF framework, are the best known, over all hold-out accuracies on these datasets when only using imaging data-exceeding previously-published results by 0.012 for ADHD and 0.042 for Autism. Our results show that combining multi-modal features can yield good classification accuracy for diagnosis of ADHD and Autism, which is an important step towards computer-aided diagnosis of these psychiatric diseases and perhaps others as well.

  3. Using support vector machines with tract-based spatial statistics for automated classification of Tourette syndrome children

    NASA Astrophysics Data System (ADS)

    Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang

    2016-03-01

    Tourette syndrome (TS) is a developmental neuropsychiatric disorder with the cardinal symptoms of motor and vocal tics which emerges in early childhood and fluctuates in severity in later years. To date, the neural basis of TS is not fully understood yet and TS has a long-term prognosis that is difficult to accurately estimate. Few studies have looked at the potential of using diffusion tensor imaging (DTI) in conjunction with machine learning algorithms in order to automate the classification of healthy children and TS children. Here we apply Tract-Based Spatial Statistics (TBSS) method to 44 TS children and 48 age and gender matched healthy children in order to extract the diffusion values from each voxel in the white matter (WM) skeleton, and a feature selection algorithm (ReliefF) was used to select the most salient voxels for subsequent classification with support vector machine (SVM). We use a nested cross validation to yield an unbiased assessment of the classification method and prevent overestimation. The accuracy (88.04%), sensitivity (88.64%) and specificity (87.50%) were achieved in our method as peak performance of the SVM classifier was achieved using the axial diffusion (AD) metric, demonstrating the potential of a joint TBSS and SVM pipeline for fast, objective classification of healthy and TS children. These results support that our methods may be useful for the early identification of subjects with TS, and hold promise for predicting prognosis and treatment outcome for individuals with TS.

  4. Supervised machine learning-based classification scheme to segment the brainstem on MRI in multicenter brain tumor treatment context.

    PubMed

    Dolz, Jose; Laprie, Anne; Ken, Soléakhéna; Leroy, Henri-Arthur; Reyns, Nicolas; Massoptier, Laurent; Vermandel, Maximilien

    2016-01-01

    To constrain the risk of severe toxicity in radiotherapy and radiosurgery, precise volume delineation of organs at risk is required. This task is still manually performed, which is time-consuming and prone to observer variability. To address these issues, and as alternative to atlas-based segmentation methods, machine learning techniques, such as support vector machines (SVM), have been recently presented to segment subcortical structures on magnetic resonance images (MRI). SVM is proposed to segment the brainstem on MRI in multicenter brain cancer context. A dataset composed by 14 adult brain MRI scans is used to evaluate its performance. In addition to spatial and probabilistic information, five different image intensity values (IIVs) configurations are evaluated as features to train the SVM classifier. Segmentation accuracy is evaluated by computing the Dice similarity coefficient (DSC), absolute volumes difference (AVD) and percentage volume difference between automatic and manual contours. Mean DSC for all proposed IIVs configurations ranged from 0.89 to 0.90. Mean AVD values were below 1.5 cm(3), where the value for best performing IIVs configuration was 0.85 cm(3), representing an absolute mean difference of 3.99% with respect to the manual segmented volumes. Results suggest consistent volume estimation and high spatial similarity with respect to expert delineations. The proposed approach outperformed presented methods to segment the brainstem, not only in volume similarity metrics, but also in segmentation time. Preliminary results showed that the approach might be promising for adoption in clinical use.

  5. Target attribute-based false alarm rejection in small infrared target detection

    NASA Astrophysics Data System (ADS)

    Kim, Sungho

    2012-11-01

    Infrared search and track is an important research area in military applications. Although there are a lot of works on small infrared target detection methods, we cannot apply them in real field due to high false alarm rate caused by clutters. This paper presents a novel target attribute extraction and machine learning-based target discrimination method. Eight kinds of target features are extracted and analyzed statistically. Learning-based classifiers such as SVM and Adaboost are developed and compared with conventional classifiers for real infrared images. In addition, the generalization capability is also inspected for various infrared clutters.

  6. The epidural needle guidance with an intelligent and automatic identification system for epidural anesthesia

    NASA Astrophysics Data System (ADS)

    Kao, Meng-Chun; Ting, Chien-Kun; Kuo, Wen-Chuan

    2018-02-01

    Incorrect placement of the needle causes medical complications in the epidural block, such as dural puncture or spinal cord injury. This study proposes a system which combines an optical coherence tomography (OCT) imaging probe with an automatic identification (AI) system to objectively identify the position of the epidural needle tip. The automatic identification system uses three features as image parameters to distinguish the different tissue by three classifiers. Finally, we found that the support vector machine (SVM) classifier has highest accuracy, specificity, and sensitivity, which reached to 95%, 98%, and 92%, respectively.

  7. Women with previous fragility fractures can be classified based on bone microarchitecture and finite element analysis measured with HR-pQCT.

    PubMed

    Nishiyama, K K; Macdonald, H M; Hanley, D A; Boyd, S K

    2013-05-01

    High-resolution peripheral quantitative computed tomography (HR-pQCT) measurements of distal radius and tibia bone microarchitecture and finite element (FE) estimates of bone strength performed well at classifying postmenopausal women with and without previous fracture. The HR-pQCT measurements outperformed dual energy x-ray absorptiometry (DXA) at classifying forearm fractures and fractures at other skeletal sites. Areal bone mineral density (aBMD) is the primary measurement used to assess osteoporosis and fracture risk; however, it does not take into account bone microarchitecture, which also contributes to bone strength. Thus, our objective was to determine if bone microarchitecture measured with HR-pQCT and FE estimates of bone strength could classify women with and without low-trauma fractures. We used HR-pQCT to assess bone microarchitecture at the distal radius and tibia in 44 postmenopausal women with a history of low-trauma fracture and 88 age-matched controls from the Calgary cohort of the Canadian Multicentre Osteoporosis Study (CaMos) study. We estimated bone strength using FE analysis and simulated distal radius aBMD from the HR-pQCT scans. Femoral neck (FN) and lumbar spine (LS) aBMD were measured with DXA. We used support vector machines (SVM) and a tenfold cross-validation to classify the fracture cases and controls and to determine accuracy. The combination of HR-pQCT measures of microarchitecture and FE estimates of bone strength had the highest area under the receiver operating characteristic (ROC) curve of 0.82 when classifying forearm fractures compared to an area under the curve (AUC) of 0.71 from DXA-derived aBMD of the forearm and 0.63 from FN and spine DXA. For all fracture types, FE estimates of bone strength at the forearm alone resulted in an AUC of 0.69. Models based on HR-pQCT measurements of bone microarchitecture and estimates of bone strength performed better than DXA-derived aBMD at classifying women with and without prior fracture. In future, these models may improve prediction of individuals at risk of low-trauma fracture.

  8. MapReduce SVM Game

    DOE PAGES

    Vineyard, Craig M.; Verzi, Stephen J.; James, Conrad D.; ...

    2015-08-10

    Despite technological advances making computing devices faster, smaller, and more prevalent in today's age, data generation and collection has outpaced data processing capabilities. Simply having more compute platforms does not provide a means of addressing challenging problems in the big data era. Rather, alternative processing approaches are needed and the application of machine learning to big data is hugely important. The MapReduce programming paradigm is an alternative to conventional supercomputing approaches, and requires less stringent data passing constrained problem decompositions. Rather, MapReduce relies upon defining a means of partitioning the desired problem so that subsets may be computed independently andmore » recom- bined to yield the net desired result. However, not all machine learning algorithms are amenable to such an approach. Game-theoretic algorithms are often innately distributed, consisting of local interactions between players without requiring a central authority and are iterative by nature rather than requiring extensive retraining. Effectively, a game-theoretic approach to machine learning is well suited for the MapReduce paradigm and provides a novel, alternative new perspective to addressing the big data problem. In this paper we present a variant of our Support Vector Machine (SVM) Game classifier which may be used in a distributed manner, and show an illustrative example of applying this algorithm.« less

  9. MapReduce SVM Game

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vineyard, Craig M.; Verzi, Stephen J.; James, Conrad D.

    Despite technological advances making computing devices faster, smaller, and more prevalent in today's age, data generation and collection has outpaced data processing capabilities. Simply having more compute platforms does not provide a means of addressing challenging problems in the big data era. Rather, alternative processing approaches are needed and the application of machine learning to big data is hugely important. The MapReduce programming paradigm is an alternative to conventional supercomputing approaches, and requires less stringent data passing constrained problem decompositions. Rather, MapReduce relies upon defining a means of partitioning the desired problem so that subsets may be computed independently andmore » recom- bined to yield the net desired result. However, not all machine learning algorithms are amenable to such an approach. Game-theoretic algorithms are often innately distributed, consisting of local interactions between players without requiring a central authority and are iterative by nature rather than requiring extensive retraining. Effectively, a game-theoretic approach to machine learning is well suited for the MapReduce paradigm and provides a novel, alternative new perspective to addressing the big data problem. In this paper we present a variant of our Support Vector Machine (SVM) Game classifier which may be used in a distributed manner, and show an illustrative example of applying this algorithm.« less

  10. Tissue multifractality and hidden Markov model based integrated framework for optimum precancer detection

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Sabyasachi; Das, Nandan K.; Kurmi, Indrajit; Pradhan, Asima; Ghosh, Nirmalya; Panigrahi, Prasanta K.

    2017-10-01

    We report the application of a hidden Markov model (HMM) on multifractal tissue optical properties derived via the Born approximation-based inverse light scattering method for effective discrimination of precancerous human cervical tissue sites from the normal ones. Two global fractal parameters, generalized Hurst exponent and the corresponding singularity spectrum width, computed by multifractal detrended fluctuation analysis (MFDFA), are used here as potential biomarkers. We develop a methodology that makes use of these multifractal parameters by integrating with different statistical classifiers like the HMM and support vector machine (SVM). It is shown that the MFDFA-HMM integrated model achieves significantly better discrimination between normal and different grades of cancer as compared to the MFDFA-SVM integrated model.

  11. Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.

    PubMed

    Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang

    2015-01-01

    Most epileptic EEG classification algorithms are supervised and require large training datasets, that hinder their use in real time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) MSK-means algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm on efficiency. In addition, three classifiers: the K-means, MSK-means MSK-means and support vector machine (SVM), are used to identify seizure and localize epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizure with the MSK-means algorithm and delay permutation entropy achieves 4. 7 % higher accuracy than that of K-means, and 0. 7 % higher accuracy than that of the SVM.

  12. Electrophysiological resting-state biomarker for diagnosing mesial temporal lobe epilepsy with hippocampal sclerosis.

    PubMed

    Jin, Seung-Hyun; Chung, Chun Kee

    2017-01-01

    The main aim of the present study was to evaluate whether resting-state functional connectivity of magnetoencephalography (MEG) signals can differentiate patients with mesial temporal lobe epilepsy (MTLE) from healthy controls (HC) and can differentiate between right and left MTLE as a diagnostic biomarker. To this end, a support vector machine (SVM) method among various machine learning algorithms was employed. We compared resting-state functional networks between 46 MTLE (right MTLE=23; left MTLE=23) patients with histologically proven HS who were free of seizure after surgery, and 46 HC. The optimal SVM group classifier distinguished MTLE patients with a mean accuracy of 95.1% (sensitivity=95.8%; specificity=94.3%). Increased connectivity including the right posterior cingulate gyrus and decreased connectivity including at least one sensory-related resting-state network were key features reflecting the differences between MTLE patients and HC. The optimal SVM model distinguished between right and left MTLE patients with a mean accuracy of 76.2% (sensitivity=76.0%; specificity=76.5%). We showed the potential of electrophysiological resting-state functional connectivity, which reflects brain network reorganization in MTLE patients, as a possible diagnostic biomarker to differentiate MTLE patients from HC and differentiate between right and left MTLE patients. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

    PubMed Central

    Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin

    2013-01-01

    One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110

  14. Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.

    PubMed

    Balfer, Jenny; Hu, Ye; Bajorath, Jürgen

    2014-08-01

    Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier.

    PubMed

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-11-10

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. The limitation of training samples and types of machine faults in HVCBs causes the existing mechanical fault diagnostic methods to recognize new types of machine faults easily without training samples as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, a MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVM are adopted to distinguish normal or fault conditions with known or unknown fault types, respectively. On this basis, SVM recognizes the specific fault type. Real diagnostic experiments are conducted with a real SF₆ HVCB with normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods.

  16. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier

    PubMed Central

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-01-01

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. The limitation of training samples and types of machine faults in HVCBs causes the existing mechanical fault diagnostic methods to recognize new types of machine faults easily without training samples as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, a MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVM are adopted to distinguish normal or fault conditions with known or unknown fault types, respectively. On this basis, SVM recognizes the specific fault type. Real diagnostic experiments are conducted with a real SF6 HVCB with normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods. PMID:27834902

  17. A machine learning tool for re-planning and adaptive RT: A multicenter cohort investigation.

    PubMed

    Guidi, G; Maffei, N; Meduri, B; D'Angelo, E; Mistretta, G M; Ceroni, P; Ciarmatori, A; Bernabei, A; Maggi, S; Cardinali, M; Morabito, V E; Rosica, F; Malara, S; Savini, A; Orlandi, G; D'Ugo, C; Bunkheila, F; Bono, M; Lappi, S; Blasi, C; Lohr, F; Costi, T

    2016-12-01

    To predict patients who would benefit from adaptive radiotherapy (ART) and re-planning intervention based on machine learning from anatomical and dosimetric variations in a retrospective dataset. 90 patients (pts) treated for head-neck cancer (H&N) formed a multicenter data-set. 41 H&N pts (45.6%) were considered for learning; 49 pts (54.4%) were used to test the tool. A homemade machine-learning classifier was developed to analyze volume and dose variations of parotid glands (PG). Using deformable image registration (DIR) and GPU, patients' conditions were analyzed automatically. Support Vector Machines (SVM) was used for time-series evaluation. "Inadequate" class identified patients that might benefit from replanning. Double-blind evaluation by two radiation oncologists (ROs) was carried out to validate day/week selected for re-planning by the classifier. The cohort was affected by PG mean reduction of 23.7±8.8%. During the first 3weeks, 86.7% cases show PG deformation aligned with predefined tolerance, thus not requiring re-planning. From 4th week, an increased number of pts would potentially benefit from re-planning: a mean of 58% of cases, with an inter-center variability of 8.3%, showed "inadequate" conditions. 11% of cases showed "bias" due to DIR and script failure; 6% showed "warning" output due to potential positioning issues. Comparing re-planning suggested by tool with recommended by ROs, the 4th week seems the most favorable time in 70% cases. SVM and decision-making tool was applied to overcome ART challenges. Pts would benefit from ART and ideal time for re-planning intervention was identified in this retrospective analysis. Copyright © 2016 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.

  18. Classification of cardiovascular tissues using LBP based descriptors and a cascade SVM.

    PubMed

    Mazo, Claudia; Alegre, Enrique; Trujillo, Maria

    2017-08-01

    Histological images have characteristics, such as texture, shape, colour and spatial structure, that permit the differentiation of each fundamental tissue and organ. Texture is one of the most discriminative features. The automatic classification of tissues and organs based on histology images is an open problem, due to the lack of automatic solutions when treating tissues without pathologies. In this paper, we demonstrate that it is possible to automatically classify cardiovascular tissues using texture information and Support Vector Machines (SVM). Additionally, we realised that it is feasible to recognise several cardiovascular organs following the same process. The texture of histological images was described using Local Binary Patterns (LBP), LBP Rotation Invariant (LBPri), Haralick features and different concatenations between them, representing in this way its content. Using a SVM with linear kernel, we selected the more appropriate descriptor that, for this problem, was a concatenation of LBP and LBPri. Due to the small number of the images available, we could not follow an approach based on deep learning, but we selected the classifier who yielded the higher performance by comparing SVM with Random Forest and Linear Discriminant Analysis. Once SVM was selected as the classifier with a higher area under the curve that represents both higher recall and precision, we tuned it evaluating different kernels, finding that a linear SVM allowed us to accurately separate four classes of tissues: (i) cardiac muscle of the heart, (ii) smooth muscle of the muscular artery, (iii) loose connective tissue, and (iv) smooth muscle of the large vein and the elastic artery. The experimental validation was conducted using 3000 blocks of 100 × 100 sized pixels, with 600 blocks per class and the classification was assessed using a 10-fold cross-validation. using LBP as the descriptor, concatenated with LBPri and a SVM with linear kernel, the main four classes of tissues were recognised with an AUC higher than 0.98. A polynomial kernel was then used to separate the elastic artery and vein, yielding an AUC in both cases superior to 0.98. Following the proposed approach, it is possible to separate with very high precision (AUC greater than 0.98) the fundamental tissues of the cardiovascular system along with some organs, such as the heart, arteries and veins. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Component Pin Recognition Using Algorithms Based on Machine Learning

    NASA Astrophysics Data System (ADS)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  20. Predicting hemispheric dominance for language production in healthy individuals using support vector machine.

    PubMed

    Zago, Laure; Hervé, Pierre-Yves; Genuer, Robin; Laurent, Alexandre; Mazoyer, Bernard; Tzourio-Mazoyer, Nathalie; Joliot, Marc

    2017-12-01

    We used a Support Vector Machine (SVM) classifier to assess hemispheric pattern of language dominance of 47 individuals categorized as non-typical for language from their hemispheric functional laterality index (HFLI) measured on a sentence minus word-list production fMRI-BOLD contrast map. The SVM classifier was trained at discriminating between Dominant and Non-Dominant hemispheric language production activation pattern on a group of 250 participants previously identified as Typicals (HFLI strongly leftward). Then, SVM was applied to each hemispheric language activation pattern of 47 non-typical individuals. The results showed that at least one hemisphere (left or right) was found to be Dominant in every, except 3 individuals, indicating that the "dominant" type of functional organization is the most frequent in non-typicals. Specifically, left hemisphere dominance was predicted in all non-typical right-handers (RH) and in 57.4% of non-typical left-handers (LH). When both hemisphere classifications were jointly considered, four types of brain patterns were observed. The most often predicted pattern (51%) was left-dominant (Dominant left-hemisphere and Non-Dominant right-hemisphere), followed by right-dominant (23%, Dominant right-hemisphere and Non-Dominant left-hemisphere) and co-dominant (19%, 2 Dominant hemispheres) patterns. Co-non-dominant was rare (6%, 2 Non-Dominant hemispheres), but was normal variants of hemispheric specialization. In RH, only left-dominant (72%) and co-dominant patterns were detected, while for LH, all types were found, although with different occurrences. Among the 10 LH with a strong rightward HFLI, 8 had a right-dominant brain pattern. Whole-brain analysis of the right-dominant pattern group confirmed that it exhibited a functional organization strictly mirroring that of left-dominant pattern group. Hum Brain Mapp 38:5871-5889, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  1. Emotion Recognition from Single-Trial EEG Based on Kernel Fisher's Emotion Pattern and Imbalanced Quasiconformal Kernel Support Vector Machine

    PubMed Central

    Liu, Yi-Hung; Wu, Chien-Te; Cheng, Wei-Teng; Hsiao, Yu-Tsung; Chen, Po-Ming; Teng, Jyh-Tong

    2014-01-01

    Electroencephalogram-based emotion recognition (EEG-ER) has received increasing attention in the fields of health care, affective computing, and brain-computer interface (BCI). However, satisfactory ER performance within a bi-dimensional and non-discrete emotional space using single-trial EEG data remains a challenging task. To address this issue, we propose a three-layer scheme for single-trial EEG-ER. In the first layer, a set of spectral powers of different EEG frequency bands are extracted from multi-channel single-trial EEG signals. In the second layer, the kernel Fisher's discriminant analysis method is applied to further extract features with better discrimination ability from the EEG spectral powers. The feature vector produced by layer 2 is called a kernel Fisher's emotion pattern (KFEP), and is sent into layer 3 for further classification where the proposed imbalanced quasiconformal kernel support vector machine (IQK-SVM) serves as the emotion classifier. The outputs of the three layer EEG-ER system include labels of emotional valence and arousal. Furthermore, to collect effective training and testing datasets for the current EEG-ER system, we also use an emotion-induction paradigm in which a set of pictures selected from the International Affective Picture System (IAPS) are employed as emotion induction stimuli. The performance of the proposed three-layer solution is compared with that of other EEG spectral power-based features and emotion classifiers. Results on 10 healthy participants indicate that the proposed KFEP feature performs better than other spectral power features, and IQK-SVM outperforms traditional SVM in terms of the EEG-ER accuracy. Our findings also show that the proposed EEG-ER scheme achieves the highest classification accuracies of valence (82.68%) and arousal (84.79%) among all testing methods. PMID:25061837

  2. Emotion recognition from single-trial EEG based on kernel Fisher's emotion pattern and imbalanced quasiconformal kernel support vector machine.

    PubMed

    Liu, Yi-Hung; Wu, Chien-Te; Cheng, Wei-Teng; Hsiao, Yu-Tsung; Chen, Po-Ming; Teng, Jyh-Tong

    2014-07-24

    Electroencephalogram-based emotion recognition (EEG-ER) has received increasing attention in the fields of health care, affective computing, and brain-computer interface (BCI). However, satisfactory ER performance within a bi-dimensional and non-discrete emotional space using single-trial EEG data remains a challenging task. To address this issue, we propose a three-layer scheme for single-trial EEG-ER. In the first layer, a set of spectral powers of different EEG frequency bands are extracted from multi-channel single-trial EEG signals. In the second layer, the kernel Fisher's discriminant analysis method is applied to further extract features with better discrimination ability from the EEG spectral powers. The feature vector produced by layer 2 is called a kernel Fisher's emotion pattern (KFEP), and is sent into layer 3 for further classification where the proposed imbalanced quasiconformal kernel support vector machine (IQK-SVM) serves as the emotion classifier. The outputs of the three layer EEG-ER system include labels of emotional valence and arousal. Furthermore, to collect effective training and testing datasets for the current EEG-ER system, we also use an emotion-induction paradigm in which a set of pictures selected from the International Affective Picture System (IAPS) are employed as emotion induction stimuli. The performance of the proposed three-layer solution is compared with that of other EEG spectral power-based features and emotion classifiers. Results on 10 healthy participants indicate that the proposed KFEP feature performs better than other spectral power features, and IQK-SVM outperforms traditional SVM in terms of the EEG-ER accuracy. Our findings also show that the proposed EEG-ER scheme achieves the highest classification accuracies of valence (82.68%) and arousal (84.79%) among all testing methods.

  3. Millennial Filipino Student Engagement Analyzer Using Facial Feature Classification

    NASA Astrophysics Data System (ADS)

    Manseras, R.; Eugenio, F.; Palaoag, T.

    2018-03-01

    Millennials has been a word of mouth of everybody and a target market of various companies nowadays. In the Philippines, they comprise one third of the total population and most of them are still in school. Having a good education system is important for this generation to prepare them for better careers. And a good education system means having quality instruction as one of the input component indicators. In a classroom environment, teachers use facial features to measure the affect state of the class. Emerging technologies like Affective Computing is one of today’s trends to improve quality instruction delivery. This, together with computer vision, can be used in analyzing affect states of the students and improve quality instruction delivery. This paper proposed a system of classifying student engagement using facial features. Identifying affect state, specifically Millennial Filipino student engagement, is one of the main priorities of every educator and this directed the authors to develop a tool to assess engagement percentage. Multiple face detection framework using Face API was employed to detect as many student faces as possible to gauge current engagement percentage of the whole class. The binary classifier model using Support Vector Machine (SVM) was primarily set in the conceptual framework of this study. To achieve the most accuracy performance of this model, a comparison of SVM to two of the most widely used binary classifiers were tested. Results show that SVM bested RandomForest and Naive Bayesian algorithms in most of the experiments from the different test datasets.

  4. Computer-aided diagnosis of malignant mammograms using Zernike moments and SVM.

    PubMed

    Sharma, Shubhi; Khanna, Pritee

    2015-02-01

    This work is directed toward the development of a computer-aided diagnosis (CAD) system to detect abnormalities or suspicious areas in digital mammograms and classify them as malignant or nonmalignant. Original mammogram is preprocessed to separate the breast region from its background. To work on the suspicious area of the breast, region of interest (ROI) patches of a fixed size of 128×128 are extracted from the original large-sized digital mammograms. For training, patches are extracted manually from a preprocessed mammogram. For testing, patches are extracted from a highly dense area identified by clustering technique. For all extracted patches corresponding to a mammogram, Zernike moments of different orders are computed and stored as a feature vector. A support vector machine (SVM) is used to classify extracted ROI patches. The experimental study shows that the use of Zernike moments with order 20 and SVM classifier gives better results among other studies. The proposed system is tested on Image Retrieval In Medical Application (IRMA) reference dataset and Digital Database for Screening Mammography (DDSM) mammogram database. On IRMA reference dataset, it attains 99% sensitivity and 99% specificity, and on DDSM mammogram database, it obtained 97% sensitivity and 96% specificity. To verify the applicability of Zernike moments as a fitting texture descriptor, the performance of the proposed CAD system is compared with the other well-known texture descriptors namely gray-level co-occurrence matrix (GLCM) and discrete cosine transform (DCT).

  5. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

    PubMed Central

    2013-01-01

    Background Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts is generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. Results In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features: amino acid composition, dipeptide composition, the pseudo amino acid composition, Nterminal-Center-Cterminal composition and the protein physicochemical properties are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with a MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. Conclusion The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes. PMID:24266945

  6. Hybrid Disease Diagnosis Using Multiobjective Optimization with Evolutionary Parameter Optimization

    PubMed Central

    Nalluri, MadhuSudana Rao; K., Kannan; M., Manisha

    2017-01-01

    With the widespread adoption of e-Healthcare and telemedicine applications, accurate, intelligent disease diagnosis systems have been profoundly coveted. In recent years, numerous individual machine learning-based classifiers have been proposed and tested, and the fact that a single classifier cannot effectively classify and diagnose all diseases has been almost accorded with. This has seen a number of recent research attempts to arrive at a consensus using ensemble classification techniques. In this paper, a hybrid system is proposed to diagnose ailments using optimizing individual classifier parameters for two classifier techniques, namely, support vector machine (SVM) and multilayer perceptron (MLP) technique. We employ three recent evolutionary algorithms to optimize the parameters of the classifiers above, leading to six alternative hybrid disease diagnosis systems, also referred to as hybrid intelligent systems (HISs). Multiple objectives, namely, prediction accuracy, sensitivity, and specificity, have been considered to assess the efficacy of the proposed hybrid systems with existing ones. The proposed model is evaluated on 11 benchmark datasets, and the obtained results demonstrate that our proposed hybrid diagnosis systems perform better in terms of disease prediction accuracy, sensitivity, and specificity. Pertinent statistical tests were carried out to substantiate the efficacy of the obtained results. PMID:29065626

  7. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.

  8. Machine learning modelling for predicting soil liquefaction susceptibility

    NASA Astrophysics Data System (ADS)

    Samui, P.; Sitharam, T. G.

    2011-01-01

    This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on the standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first machine learning technique which uses Artificial Neural Network (ANN) based on multi-layer perceptions (MLP) that are trained with Levenberg-Marquardt backpropagation algorithm. The second machine learning technique uses the Support Vector machine (SVM) that is firmly based on the theory of statistical learning theory, uses classification technique. ANN and SVM have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models, requiring only the two parameters [(N1)60 and peck ground acceleration (amax/g)], for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.

  9. Automatic optical detection and classification of marine animals around MHK converters using machine vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunton, Steven

    Optical systems provide valuable information for evaluating interactions and associations between organisms and MHK energy converters and for capturing potentially rare encounters between marine organisms and MHK device. The deluge of optical data from cabled monitoring packages makes expert review time-consuming and expensive. We propose algorithms and a processing framework to automatically extract events of interest from underwater video. The open-source software framework consists of background subtraction, filtering, feature extraction and hierarchical classification algorithms. This principle classification pipeline was validated on real-world data collected with an experimental underwater monitoring package. An event detection rate of 100% was achieved using robustmore » principal components analysis (RPCA), Fourier feature extraction and a support vector machine (SVM) binary classifier. The detected events were then further classified into more complex classes – algae | invertebrate | vertebrate, one species | multiple species of fish, and interest rank. Greater than 80% accuracy was achieved using a combination of machine learning techniques.« less

  10. Point shear wave ultrasound elastography with Esaote compared to real-time 2D shear wave elastography with supersonic imagine for the quantification of liver stiffness.

    PubMed

    Mulazzani, L; Salvatore, V; Ravaioli, F; Allegretti, G; Matassoni, F; Granata, R; Ferrarini, A; Stefanescu, H; Piscaglia, Fabio

    2017-09-01

    Different shear wave elastography (SWE) machines able to quantify liver stiffness (LS) have been recently introduced by various companies. The aim of this study was to investigate the agreement between point SWE with Esaote MyLab Twice (pSWE.ESA) and 2D SWE with Aixplorer SuperSonic (2D SWE.SSI). Moreover, we assessed the correlation of these machines with Fibroscan in a subgroup of patients. A total of 81 liver disease patients and 27 subjects without liver disease accessing the ultrasound lab were considered. Exclusion criteria were liver nodules, BMI >35, and severe comorbidities. LS was sampled from the same intercostal space with both pSWE.ESA and 2D SWE.SSI and values were tested with Lin's analysis and Bland-Altman analysis (B&A). Agreement between each SWE machine and Fibroscan was assessed in 26 liver disease patients with Spearman correlation. Precision and accuracy between pSWE.ESA and 2D SWE.SSI were, respectively, 0.839 and 0.999. B&A showed a mean of only -0.2 kPa, with no systematic deviation between the techniques and limits of agreement at -11.6 and 11.3 kPa. Spearman's rho correlation versus Fibroscan was 0.849 for pSWE.ESA and 0.878 for 2D SWE.SSI. The relationship became less strict in the higher range of LS (≥15.2 kPa), corresponding to cirrhosis. The overall degree of concordance of pSWE.ESA and 2D SWE.SSI in measuring LS resulted remarkable, also when compared with Fibroscan. The less strict correlation for patients with LS in the higher range would not affect the staging of disease as such patients are anyhow classified as cirrhotic.

  11. A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM

    NASA Astrophysics Data System (ADS)

    Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan

    2018-03-01

    In order to make up for the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on the three DGA ratios and particle swarm optimization (PSO) optimize support vector machine (SVM) is proposed. Using transforming support vector machine to the nonlinear and multi-classification SVM, establishing the particle swarm optimization to optimize the SVM multi classification model, and conducting transformer fault diagnosis combined with the cross validation principle. The fault diagnosis results show that the average accuracy of test method is better than the standard support vector machine and genetic algorithm support vector machine, and the proposed method can effectively improve the accuracy of transformer fault diagnosis is proved.

  12. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.

  13. Automatic system for radar echoes filtering based on textural features and artificial intelligence

    NASA Astrophysics Data System (ADS)

    Hedir, Mehdia; Haddad, Boualem

    2017-10-01

    Among the very popular Artificial Intelligence (AI) techniques, Artificial Neural Network (ANN) and Support Vector Machine (SVM) have been retained to process Ground Echoes (GE) on meteorological radar images taken from Setif (Algeria) and Bordeaux (France) with different climates and topologies. To achieve this task, AI techniques were associated with textural approaches. We used Gray Level Co-occurrence Matrix (GLCM) and Completed Local Binary Pattern (CLBP); both methods were largely used in image analysis. The obtained results show the efficiency of texture to preserve precipitations forecast on both sites with the accuracy of 98% on Bordeaux and 95% on Setif despite the AI technique used. 98% of GE are suppressed with SVM, this rate is outperforming ANN skills. CLBP approach associated to SVM eliminates 98% of GE and preserves precipitations forecast on Bordeaux site better than on Setif's, while it exhibits lower accuracy with ANN. SVM classifier is well adapted to the proposed application since the average filtering rate is 95-98% with texture and 92-93% with CLBP. These approaches allow removing Anomalous Propagations (APs) too with a better accuracy of 97.15% with texture and SVM. In fact, textural features associated to AI techniques are an efficient tool for incoherent radars to surpass spurious echoes.

  14. Comparison of performance of object-based image analysis techniques available in open source software (Spring and Orfeo Toolbox/Monteverdi) considering very high spatial resolution data

    NASA Astrophysics Data System (ADS)

    Teodoro, Ana C.; Araujo, Ricardo

    2016-01-01

    The use of unmanned aerial vehicles (UAVs) for remote sensing applications is becoming more frequent. However, this type of information can result in several software problems related to the huge amount of data available. Object-based image analysis (OBIA) has proven to be superior to pixel-based analysis for very high-resolution images. The main objective of this work was to explore the potentialities of the OBIA methods available in two different open source software applications, Spring and OTB/Monteverdi, in order to generate an urban land cover map. An orthomosaic derived from UAVs was considered, 10 different regions of interest were selected, and two different approaches were followed. The first one (Spring) uses the region growing segmentation algorithm followed by the Bhattacharya classifier. The second approach (OTB/Monteverdi) uses the mean shift segmentation algorithm followed by the support vector machine (SVM) classifier. Two strategies were followed: four classes were considered using Spring and thereafter seven classes were considered for OTB/Monteverdi. The SVM classifier produces slightly better results and presents a shorter processing time. However, the poor spectral resolution of the data (only RGB bands) is an important factor that limits the performance of the classifiers applied.

  15. Prediction of troponin-T degradation using color image texture features in 10d aged beef longissimus steaks.

    PubMed

    Sun, X; Chen, K J; Berg, E P; Newman, D J; Schwartz, C A; Keller, W L; Maddock Carlin, K R

    2014-02-01

    The objective was to use digital color image texture features to predict troponin-T degradation in beef. Image texture features, including 88 gray level co-occurrence texture features, 81 two-dimension fast Fourier transformation texture features, and 48 Gabor wavelet filter texture features, were extracted from color images of beef strip steaks (longissimus dorsi, n = 102) aged for 10d obtained using a digital camera and additional lighting. Steaks were designated degraded or not-degraded based on troponin-T degradation determined on d 3 and d 10 postmortem by immunoblotting. Statistical analysis (STEPWISE regression model) and artificial neural network (support vector machine model, SVM) methods were designed to classify protein degradation. The d 3 and d 10 STEPWISE models were 94% and 86% accurate, respectively, while the d 3 and d 10 SVM models were 63% and 71%, respectively, in predicting protein degradation in aged meat. STEPWISE and SVM models based on image texture features show potential to predict troponin-T degradation in meat. © 2013.

  16. a Comparison Study of Different Kernel Functions for Svm-Based Classification of Multi-Temporal Polarimetry SAR Data

    NASA Astrophysics Data System (ADS)

    Yekkehkhany, B.; Safari, A.; Homayouni, S.; Hasanlou, M.

    2014-10-01

    In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imageries. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates with respect to single-date data. Several kernel functions are employed and compared in this study for mapping the input space to higher Hilbert dimension space. These kernel functions include linear, polynomials and Radial Based Function (RBF). The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of H/A/α decomposition method are used in classification. The experimental tests show an SVM classifier with RBF kernel for three dates of data increases the Overall Accuracy (OA) to up to 3% in comparison to using linear kernel function, and up to 1% in comparison to a 3rd degree polynomial kernel function.

  17. A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

    PubMed

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.

  18. Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes

    NASA Astrophysics Data System (ADS)

    Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie

    2017-03-01

    Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).

  19. Computer aided diagnosis system for Alzheimer disease using brain diffusion tensor imaging features selected by Pearson's correlation.

    PubMed

    Graña, M; Termenon, M; Savio, A; Gonzalez-Pinto, A; Echeveste, J; Pérez, J M; Besga, A

    2011-09-20

    The aim of this paper is to obtain discriminant features from two scalar measures of Diffusion Tensor Imaging (DTI) data, Fractional Anisotropy (FA) and Mean Diffusivity (MD), and to train and test classifiers able to discriminate Alzheimer's Disease (AD) patients from controls on the basis of features extracted from the FA or MD volumes. In this study, support vector machine (SVM) classifier was trained and tested on FA and MD data. Feature selection is done computing the Pearson's correlation between FA or MD values at voxel site across subjects and the indicative variable specifying the subject class. Voxel sites with high absolute correlation are selected for feature extraction. Results are obtained over an on-going study in Hospital de Santiago Apostol collecting anatomical T1-weighted MRI volumes and DTI data from healthy control subjects and AD patients. FA features and a linear SVM classifier achieve perfect accuracy, sensitivity and specificity in several cross-validation studies, supporting the usefulness of DTI-derived features as an image-marker for AD and to the feasibility of building Computer Aided Diagnosis systems for AD based on them. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  20. Artificial intelligence techniques: predicting necessity for biopsy in renal transplant recipients suspected of acute cellular rejection or nephrotoxicity.

    PubMed

    Hummel, A D; Maciel, R F; Sousa, F S; Cohrs, F M; Falcão, A E J; Teixeira, F; Baptista, R; Mancini, F; da Costa, T M; Alves, D; Rodrigues, R G D S; Miranda, R; Pisa, I T

    2011-05-01

    The gold standard for nephrotoxicity and acute cellular rejection (ACR) is a biopsy, an invasive and expensive procedure. More efficient strategies to screen patients for biopsy are important from the clinical and financial points of view. The aim of this study was to evaluate various artificial intelligence techniques to screen for the need for a biopsy among patients suspected of nephrotoxicity or ACR during the first year after renal transplantation. We used classifiers like artificial neural networks (ANN), support vector machines (SVM), and Bayesian inference (BI) to indicate if the clinical course of the event suggestive of the need for a biopsy. Each classifier was evaluated by values of sensitivity and area under the ROC curve (AUC) for each of the classifiers. The technique that showed the best sensitivity value as an indicator for biopsy was SVM with an AUC of 0.79 and an accuracy rate of 79.86%. The results were better than those described in previous works. The accuracy for an indication of biopsy screening was efficient enough to become useful in clinical practice. Copyright © 2011 Elsevier Inc. All rights reserved.

Top