Multi-view L2-SVM and its multi-view core vector machine.
Huang, Chengquan; Chung, Fu-lai; Wang, Shitong
2016-03-01
In this paper, a novel L2-SVM based classifier, Multi-view L2-SVM, is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence is as flexible as μ-SVC, in the sense that the number of support vectors it yields can be controlled by a pre-specified parameter. The classifier makes full use of both the coherence and the differences among views by imposing consensus among multiple views, which improves the overall classification performance. In addition, based on the generalized core vector machine (GCVM), the Multi-view L2-SVM classifier is extended into its GCVM version, MvCVM, which trains quickly on large-scale multi-view datasets: its time complexity is asymptotically linear in the sample size and its space complexity is independent of the sample size. Our experimental results demonstrate the effectiveness of the proposed Multi-view L2-SVM classifier on small-scale multi-view datasets and of the proposed MvCVM classifier on large-scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
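Core vector machines recast SVM training as an approximate minimum enclosing ball (MEB) problem, which is where the linear-time behaviour comes from. The sketch below is not the authors' MvCVM algorithm: it assumes plain Euclidean points rather than a kernel feature space and uses the simple Badoiu-Clarkson farthest-point update. The function name `approx_meb` and the sample points are hypothetical.

```python
import math

def approx_meb(points, iterations=2500):
    """Approximate the minimum enclosing ball of Euclidean points.

    Badoiu-Clarkson update: repeatedly move the centre a shrinking
    step towards the point that is currently farthest away.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius

# Hypothetical data: the four corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = approx_meb(corners)
```

Each iteration touches every point once, so the cost per iteration is linear in the sample size, and the number of iterations depends on the desired accuracy rather than on the number of points.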
Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong
2015-09-01
Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. Attractive performance has been reported, and the classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space. However, the coupling of subclassifiers requires a matrix inversion, which imposes a high computational burden on large nonstationary datasets. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation but also avoids the matrix inversion. Its fast version, the improved time-adaptive core vector machine (ITA-CVM), can therefore be realized for large nonstationary datasets by using the CVM technique. ITA-CVM has asymptotically linear time complexity for large nonstationary datasets and inherits the advantages of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also confirmed experimentally.
NASA Astrophysics Data System (ADS)
Wang, Danshi; Zhang, Min; Cai, Zhongle; Cui, Yue; Li, Ze; Han, Huanhuan; Fu, Meixia; Luo, Bin
2016-06-01
An effective machine learning algorithm, the support vector machine (SVM), is presented in the context of a coherent optical transmission system. As a classifier, the SVM can create nonlinear decision boundaries to mitigate the distortions caused by nonlinear phase noise (NLPN). Without any prior information or heuristic assumptions, the SVM can learn and capture the link properties from only a few training data. Compared with the maximum likelihood estimation (MLE) algorithm, a lower bit-error rate (BER) is achieved by the SVM for a given launch power; moreover, the launch power dynamic range (LPDR) is increased by 3.3 dBm for 8 phase-shift keying (8 PSK), 1.2 dBm for QPSK, and 0.3 dBm for BPSK. The maximum transmission distance corresponding to a BER of 1 × 10^-3 is increased by 480 km for the case of 8 PSK. The larger launch power range and longer transmission distance improve the tolerance to amplitude and phase noise, which demonstrates the feasibility of the SVM in digital signal processing for M-PSK formats. Meanwhile, in order to apply the SVM method to 16 quadrature amplitude modulation (16 QAM) detection, we propose a parameter optimization scheme. By using cross-validation and grid-search techniques, the optimal parameters of the SVM can be selected, which improves the LPDR by 2.8 dBm. Additionally, we demonstrate that the SVM is also effective in combating laser phase noise combined with inphase and quadrature (I/Q) modulator imperfections, but the improvement is insignificant for linear noise and for I/Q imbalance on its own. The computational complexity of the SVM is also discussed. Its relatively low complexity makes real-time processing with the SVM feasible.
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. In a jackknife test, the prediction performance was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces are higher in high binding affinity complexes than in low binding affinity complexes.
2015-01-01
Background: Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results: This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. In a jackknife test, the prediction performance was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces are higher in high binding affinity complexes than in low binding affinity complexes. PMID:26681483
A linear-RBF multikernel SVM to classify big text corpora.
Romero, R; Iglesias, E L; Borrajo, L
2015-01-01
The support vector machine (SVM) is a powerful technique for classification. However, the SVM is not well suited to classifying large datasets or text corpora, because the training complexity of SVMs depends strongly on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for high-dimensional data that provides automatic parameterization at low computational cost and improves on the results of SVMs parameterized by brute-force search. The model splits the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while training is significantly faster than with several other SVM classifiers.
Onboard Classifiers for Science Event Detection on a Remote Sensing Spacecraft
NASA Technical Reports Server (NTRS)
Castano, Rebecca; Mazzoni, Dominic; Tang, Nghia; Greeley, Ron; Doggett, Thomas; Cichy, Ben; Chien, Steve; Davies, Ashley
2006-01-01
Typically, data collected by a spacecraft is downlinked to Earth and pre-processed before any analysis is performed. We have developed classifiers that can be used onboard a spacecraft to identify high priority data for downlink to Earth, providing a method for maximizing the use of a potentially bandwidth limited downlink channel. Onboard analysis can also enable rapid reaction to dynamic events, such as flooding, volcanic eruptions or sea ice break-up. Four classifiers were developed to identify cryosphere events using hyperspectral images. These classifiers include a manually constructed classifier, a Support Vector Machine (SVM), a Decision Tree and a classifier derived by searching over combinations of thresholded band ratios. Each of the classifiers was designed to run in the computationally constrained operating environment of the spacecraft. A set of scenes was hand-labeled to provide training and testing data. Performance results on the test data indicate that the SVM and manual classifiers outperformed the Decision Tree and band-ratio classifiers with the SVM yielding slightly better classifications than the manual classifier.
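The fourth classifier, derived by searching over combinations of thresholded band ratios, can be illustrated with a brute-force search. This is a hypothetical sketch and not the flight software: the pixel spectra, the labels, and the function name `best_band_ratio` are invented for illustration, and a single ratio with a single threshold is assumed where the original work searched over combinations.

```python
from itertools import permutations

def best_band_ratio(pixels, labels):
    """Search band pairs (i, j) and thresholds t for the rule
    pixel[i] / pixel[j] > t that best matches the labels."""
    n_bands = len(pixels[0])
    best = (0.0, None)  # (accuracy, (i, j, threshold))
    for i, j in permutations(range(n_bands), 2):
        ratios = [p[i] / p[j] for p in pixels]
        # Only observed ratio values need to be tried as thresholds.
        for t in sorted(set(ratios)):
            preds = [r > t for r in ratios]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, (i, j, t))
    return best

# Invented three-band reflectances; True marks the event class.
pixels = [(0.8, 0.5, 0.2), (0.9, 0.6, 0.3), (0.3, 0.5, 0.6), (0.2, 0.4, 0.5)]
labels = [True, True, False, False]
accuracy, rule = best_band_ratio(pixels, labels)
```

A rule of this form needs one division and one comparison per pixel at run time, which suits a computationally constrained environment.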
DCS-SVM: a novel semi-automated method for human brain MR image segmentation.
Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi
2017-11-27
In this paper, a novel method is proposed that segments magnetic resonance (MR) brain images into three main tissues. It extends our previous work, in which we suggested a combination of multiple classifiers (CMC)-based method named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. Building on this idea, the new method brings more complex and accurate classifiers such as the support vector machine (SVM) into the ensemble. This is challenging because CMC-based methods are time consuming, especially on huge datasets such as three-dimensional (3D) brain MR images. Moreover, although SVM is a powerful method for modeling datasets with a complex feature space, it has a huge computational cost on big datasets, especially those with strong interclass variability and more than two classes, such as 3D brain images; SVM therefore cannot be used directly in DCS-DLTLTI. We propose a novel approach named "DCS-SVM" that makes it possible to use SVM in DCS-DLTLTI and thereby improves the accuracy of the segmentation results. The proposed method is applied to well-known datasets of the Internet Brain Segmentation Repository (IBSR), and promising results are obtained.
Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards
2013-01-01
Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers' scalability. However, Least Squares Support Vector Machines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters are still major problems. We address these problems by introducing an O(n log n) approximate l-fold cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm's computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method's effectiveness at selecting hyperparameters on real world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.
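The reason a circulant approximation helps is that a circulant matrix is diagonalized by the discrete Fourier transform, so a regularized linear system can be solved with FFTs instead of an O(n^3) factorization. The sketch below is an assumption-laden illustration rather than the authors' method: it uses a single-level circulant matrix (the paper uses a multi-level one), solves a kernel-ridge-style system without the LS-SVM bias term, and all names and data are hypothetical.

```python
import cmath
import math

def fft(values, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two, unnormalized)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def solve_circulant_ridge(first_col, y, lam):
    """Solve (C + lam * I) alpha = y for a circulant C given by its first column."""
    n = len(y)
    eig = fft(first_col)  # the DFT of the first column gives the eigenvalues of C
    y_hat = fft(y)
    alpha_hat = [yh / (e + lam) for yh, e in zip(y_hat, eig)]
    # Inverse transform; divide by n because fft() is unnormalized.
    return [v.real / n for v in fft(alpha_hat, inverse=True)]

n = 8
lam = 0.1
# Hypothetical RBF-like kernel values on a periodic grid.
first_col = [math.exp(-0.5 * min(k, n - k) ** 2) for k in range(n)]
y = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
alpha = solve_circulant_ridge(first_col, y, lam)
```

The solve costs three FFTs plus an elementwise division, which is where an O(n log n) bound for a single model comes from.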
Ranking Support Vector Machine with Kernel Approximation
Dou, Yong
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and other fields. The ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms. PMID:28293256
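Random Fourier features, one of the two approximations named above, replace the kernel matrix with an explicit low-dimensional feature map whose inner products approximate the kernel. The sketch below covers only that feature map for an RBF kernel, not the ranking model or the truncated Newton solver. The function name `make_rff`, the kernel width `gamma`, and the sample vectors are assumptions made for illustration.

```python
import math
import random

def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma * I)
    ws = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def z(x):
        return [norm * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return z

gamma = 0.5
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
z = make_rff(dim=3, n_features=2000, gamma=gamma)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

Once the data are mapped through `z`, a linear model can be trained on the features, so the n-by-n kernel matrix is never formed.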
Ranking Support Vector Machine with Kernel Approximation.
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and other fields. The ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms.
Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.
She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng
2015-01-01
Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to the inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. To address this problem, this paper proposes a multiclass posterior probability solution for twin SVM based on ranking continuous output and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using the ranking continuous output technique and Platt's estimating method. Second, multiclass probabilistic outputs for twin SVM are obtained by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The classification accuracy and time complexity of the proposed method are demonstrated on both UCI benchmark datasets and real-world EEG data from BCI Competition IV Dataset 2a.
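The two probabilistic steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: `platt_probability` uses Platt's sigmoid with parameters `a` and `b` that would normally be fitted on held-out data, and `couple_pairwise` uses one simple closed-form coupling rule, p_i = 1 / (sum over j != i of 1/r_ij - (k - 2)), which may differ from the coupling approaches the paper compares. All names and numbers are hypothetical.

```python
import math

def platt_probability(score, a, b):
    """Platt's sigmoid: map a decision value to an estimate of P(class = +1)."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def couple_pairwise(r):
    """Combine pairwise estimates r[i][j] ~ P(class i | class i or j)
    into one probability per class, then normalize."""
    k = len(r)
    raw = [1.0 / (sum(1.0 / r[i][j] for j in range(k) if j != i) - (k - 2))
           for i in range(k)]
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical check: build consistent pairwise estimates from known
# class probabilities and recover them.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0
      for j in range(3)] for i in range(3)]
recovered = couple_pairwise(r)
```

With noisy pairwise estimates the raw values no longer sum to one, which is why the final normalization step is included.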
Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.
Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal
2015-01-01
Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with a custom-designed fuzzy membership function for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, and the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in a 10-fold cross-validation experiment is 77.07, 78.39, and 74.91 percent for the aforementioned organisms, respectively. Performance on independent test sets is 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were calculated using two machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of the ANN models. Although the SVM model fits the training set better, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test was employed to check the suitability of the models.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel
2011-05-09
Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets.
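The combination of SCAD and ridge penalties can be made concrete with a small sketch. It uses the standard piecewise SCAD penalty with the conventional shape parameter a = 3.7 and adds a squared (ridge) term; the exact scaling of the ridge term in the paper and in the 'penalizedSVM' package is an assumption here, and the function names are hypothetical rather than that package's API.

```python
def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient."""
    b = abs(beta)
    if b <= lam:
        # Small coefficients: behaves like the L1 (LASSO) penalty.
        return lam * b
    if b <= a * lam:
        # Intermediate coefficients: quadratic transition.
        return -(b * b - 2.0 * a * lam * b + lam * lam) / (2.0 * (a - 1.0))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return (a + 1.0) * lam * lam / 2.0

def elastic_scad_penalty(weights, lam1, lam2, a=3.7):
    """SCAD penalty plus a ridge term lam2 * w^2, summed over all weights."""
    return sum(scad_penalty(w, lam1, a) + lam2 * w * w for w in weights)

# Hypothetical weight vector and tuning parameters.
total = elastic_scad_penalty([0.5, -2.0], lam1=1.0, lam2=0.1)
```

The SCAD part drives small weights to zero without over-penalizing large ones, while the ridge part keeps groups of correlated features in the model, which matches the abstract's claim of robustness in both sparse and non-sparse settings.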
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data
2011-01-01
Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM classification algorithms as well as fixed grid and interval search for finding appropriate tuning parameters were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689
Binding Affinity prediction with Property Encoded Shape Distribution signatures
Das, Sourav; Krein, Michael P.
2010-01-01
We report the use of the molecular signatures known as “Property-Encoded Shape Distributions” (PESD) together with standard Support Vector Machine (SVM) techniques to produce validated models that can predict the binding affinity of a large number of protein ligand complexes. This “PESD-SVM” method uses PESD signatures that encode molecular shapes and property distributions on protein and ligand surfaces as features to build SVM models that require no subjective feature selection. A simple protocol was employed for tuning the SVM models during their development, and the results were compared to SFCscore – a regression-based method that was previously shown to perform better than 14 other scoring functions. Although the PESD-SVM method is based on only two surface property maps, the overall results were comparable. For most complexes with a dominant enthalpic contribution to binding (ΔH/-TΔS > 3), a good correlation between true and predicted affinities was observed. Entropy and solvent were not considered in the present approach and further improvement in accuracy would require accounting for these components rigorously. PMID:20095526
Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-01-01
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629
Online least squares one-class support vector machines-based abnormal visual event detection.
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-12-12
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, online least squares one-class support vector machine (online LS-OC-SVM), combined with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of training objects in a regularized least squares sense. The online LS-OC-SVM learns a training set with a limited number of samples to provide a basic normal model, then updates the model through remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is adopted to handle the abnormal event detection problem. Each frame of the video is characterized by the covariance matrix descriptor encoding the moving information, then is classified into a normal or an abnormal frame. Experiments are conducted, on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset, to demonstrate the promising results of the proposed online LS-OC-SVM method.
Combining SVM and flame radiation to forecast BOF end-point
NASA Astrophysics Data System (ADS)
Wen, Hongyuan; Zhao, Qi; Xu, Lingfei; Zhou, Munchun; Chen, Yanru
2009-05-01
Because of the complex reactions in the Basic Oxygen Furnace (BOF) for steelmaking, the main end-point control methods face serious difficulties. To address these problems, a support vector machine (SVM) method for forecasting the BOF steelmaking end-point is presented based on flame radiation information. The basis is that the furnace flame reflects the carbon-oxygen reaction, which is the major reaction in the steelmaking furnace. The system can acquire spectrum and image data quickly in the adverse steelmaking environment. The structures of the SVM and the multilayer feed-forward neural network are similar, but the SVM model can overcome the inherent defects of the latter. The model is trained, and forecasts are made, using SVM with appropriate variables drawn from light and image characteristic information. The training process follows the structural risk minimization (SRM) criterion, and the design parameter can be adjusted automatically according to the sampled data during training. Experimental results indicate that both the prediction precision and the execution time of the SVM model meet the requirements of online end-point judgment.
Dong, Ni; Huang, Helai; Zheng, Liang
2015-09-01
In zone-level crash prediction, accounting for spatial dependence has become an extensively studied topic. This study proposes a Support Vector Machine (SVM) model to address complex, large and multi-dimensional spatial data in crash prediction. A Correlation-based Feature Selector (CFS) was applied to evaluate candidate factors possibly related to zonal crash frequency when handling high-dimensional spatial data. To demonstrate the proposed approaches and to compare them with the Bayesian spatial model with conditional autoregressive prior (i.e., CAR), a dataset from Hillsborough County, Florida was employed. The results showed that SVM models accounting for spatial proximity outperform the non-spatial model in terms of model fitting and predictive performance, which indicates that considering cross-zonal spatial correlations is reasonable. The best predictive capability was obtained with the model that considers centroid-distance proximity, uses the RBF kernel, and holds out 10% of the whole dataset as testing data, which further shows the capacity of SVM models for addressing comparatively complex spatial data in regional crash prediction modeling. Moreover, SVM models exhibit better goodness-of-fit than CAR models when the whole dataset is used as the sample. A sensitivity analysis of the centroid-distance-based spatial SVM models was conducted to capture the impacts of explanatory variables on the mean predicted probabilities of crash occurrence. The results conform to the coefficient estimates of the CAR models, which supports the use of the SVM model as an alternative in regional safety modeling. Copyright © 2015 Elsevier Ltd. All rights reserved.
Optimal structural design of the midship of a VLCC based on the strategy integrating SVM and GA
NASA Astrophysics Data System (ADS)
Sun, Li; Wang, Deyu
2012-03-01
In this paper a hybrid process of modeling and optimization, which integrates a support vector machine (SVM) and genetic algorithm (GA), was introduced to reduce the high time cost in structural optimization of ships. SVM, which is rooted in statistical learning theory and an approximate implementation of the method of structural risk minimization, can provide a good generalization performance in metamodeling the input-output relationship of real problems and consequently cuts down on high time cost in the analysis of real problems, such as FEM analysis. The GA, as a powerful optimization technique, possesses remarkable advantages for the problems that can hardly be optimized with common gradient-based optimization methods, which makes it suitable for optimizing models built by SVM. Based on the SVM-GA strategy, optimization of structural scantlings in the midship of a very large crude carrier (VLCC) ship was carried out according to the direct strength assessment method in common structural rules (CSR), which eventually demonstrates the high efficiency of SVM-GA in optimizing the ship structural scantlings under heavy computational complexity. The time cost of this optimization with SVM-GA has been sharply reduced, many more loops have been processed within a small amount of time and the design has been improved remarkably.
Real-time detection with AdaBoost-svm combination in various face orientation
NASA Astrophysics Data System (ADS)
Fhonna, R. P.; Nasution, M. K. M.; Tulus
2018-03-01
Much research has used the AdaBoost-SVM algorithm for face detection. However, to our knowledge, no research so far has addressed face detection on real-time data with various orientations using the combination of AdaBoost and support vector machine (SVM). Complex and diverse face variations, real-time data in various orientations, and a very complex application slow down the performance of a face detection system, which is the challenge addressed in this research. The face orientations handled by the detection system are 90°, 45°, 0°, -45°, and -90°. This combination method is expected to be an effective and efficient solution for various face orientations. The results showed that the highest average detection rate is obtained for faces oriented at 0° and the lowest for faces oriented at 90°.
Nonlinear detection for a high rate extended binary phase shift keying system.
Chen, Xian-Qing; Wu, Le-Nan
2013-03-28
This paper presents the algorithm and results of a nonlinear detector that applies a machine learning technique, the support vector machine (SVM), to an efficient modulation system with high data rate and low energy consumption. Simulation results showed that the performance achieved by the SVM detector is comparable to that of a conventional threshold decision (TD) detector. Both detectors process the received signals together with the special impacting filter (SIF), which can improve the energy utilization efficiency. However, unlike the TD detector, the SVM detector concentrates not only on reducing the BER of the detector, but also on providing accurate posterior probability estimates (PPEs), which can be used as soft inputs to the LDPC decoder. The complexity of this detector is kept in check by using four features and simplifying the decision function. In addition, bandwidth-efficient transmission is analyzed with both the SVM and TD detectors. The SVM detector is more robust to the sampling rate than the TD detector. We find that the SVM is suitable for extended binary phase shift keying (EBPSK) signal detection and can provide accurate posterior probabilities for LDPC decoding. PMID:23539034
Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin
2013-03-01
Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variant of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs versus other drugs. Application of an SVM-RFE model to a set of drugs successfully distinguished NDD drugs from non-NDD drugs and resulted in an overall accuracy of ∼80% with 10-fold cross-validation, using the 40 top-ranked molecular descriptors selected out of a total of 314 descriptors. Moreover, the SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptor space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai
2017-12-26
Feature selection is an important topic in bioinformatics. Identifying informative features in complex high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature ranking of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. The experiments on eight public biological datasets show that the discriminative ability of the feature subset can be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to identify potential biomarkers from big biological data.
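The recursive elimination idea behind SVM-RFE can be illustrated with a small sketch. It is not the authors' implementation: the trainer below is a plain batch subgradient descent on the hinge loss (an assumed stand-in for a real SVM solver), the toy data are synthetic, and the helper names `train_linear_svm`, `svm_rfe_ranking` and `make_toy_data` are hypothetical. The overlap-based stopping rule of SVM-RFE-OA is not reproduced.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # only margin violators contribute to the subgradient
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe_ranking(X, y):
    """Return feature indices ordered from first eliminated to last surviving."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Drop the feature with the smallest squared weight, then retrain.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated

def make_toy_data(n=60, seed=0):
    """Synthetic data: feature 0 carries the label, features 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y

X, y = make_toy_data()
ranking = svm_rfe_ranking(X, y)
print("elimination order:", ranking)
```

The last index in `ranking` is the feature that survived every round, which is how an RFE ranking is usually read: later elimination means higher importance.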
NASA Astrophysics Data System (ADS)
Yeganeh, B.; Motlagh, M. Shafie Pour; Rashidi, Y.; Kamalan, H.
2012-08-01
Due to the health impacts caused by exposure to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become an important topic in atmospheric and environmental research. Knowledge of the dynamics and complexity of air pollutant behavior has made artificial intelligence models a useful tool for more accurate pollutant concentration prediction. This paper focuses on an innovative method of daily air pollution prediction that combines a support vector machine (SVM) as predictor with partial least squares (PLS) as a data selection tool, based on measured values of CO concentrations. The CO concentrations at the Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, have been used to test the effectiveness of this method. The hourly CO concentrations have been predicted using the SVM and the hybrid PLS-SVM models. Similarly, daily CO concentrations have been predicted based on the aforementioned four years of measured data. Results demonstrated that both models have good prediction ability; however, the hybrid PLS-SVM is more accurate. In the analysis presented in this paper, statistical estimators including relative mean errors, root mean squared errors and the mean absolute relative error have been employed to compare the performance of the models. It has been concluded that the errors decrease after size reduction and that the coefficients of determination increase from 56-81% for the SVM model to 65-85% for the hybrid PLS-SVM model. It was also found that the hybrid PLS-SVM model required less computational time than the SVM model, as expected, which supports the more accurate and faster prediction ability of the hybrid PLS-SVM model.
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among the properties relevant to BBB penetration, lipophilicity enhances BBB penetration while all the others are negatively correlated with it. PMID:26504797
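One way to picture the joint optimization is a chromosome that carries both the kernel parameters and a feature mask, so that a single fitness value scores them together. The sketch below is only an illustration under stated assumptions: `toy_fitness` is a made-up stand-in for a cross-validated SVM regression score, the "informative" descriptor indices and target parameter values are hypothetical, and the GA operators (tournament selection, uniform crossover, elitism) are generic choices rather than those of the paper.

```python
import random

N_FEATURES = 6
INFORMATIVE = {0, 2}            # hypothetical useful descriptors for the toy fitness
TARGET_C, TARGET_G = 3.0, -4.0  # hypothetical optimum of (log2 C, log2 gamma)

def toy_fitness(ch):
    """Stand-in for a cross-validated SVM score; higher is better."""
    log_c, log_g, mask = ch
    param_term = -((log_c - TARGET_C) ** 2 + (log_g - TARGET_G) ** 2) / 50.0
    hits = sum(1 for j in INFORMATIVE if mask[j])
    extras = sum(mask) - hits
    return param_term + hits - 0.2 * extras

def random_chromosome(rng):
    return (rng.uniform(-5, 15), rng.uniform(-15, 3),
            [rng.randint(0, 1) for _ in range(N_FEATURES)])

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    log_c = a[0] if rng.random() < 0.5 else b[0]
    log_g = a[1] if rng.random() < 0.5 else b[1]
    mask = [x if rng.random() < 0.5 else z for x, z in zip(a[2], b[2])]
    return (log_c, log_g, mask)

def mutate(ch, rng, rate=0.1):
    log_c = ch[0] + rng.gauss(0, 1) if rng.random() < rate else ch[0]
    log_g = ch[1] + rng.gauss(0, 1) if rng.random() < rate else ch[1]
    mask = [1 - bit if rng.random() < rate else bit for bit in ch[2]]
    return (log_c, log_g, mask)

def tournament(pop, rng, k=3):
    return max(rng.sample(pop, k), key=toy_fitness)

def run_ga(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=toy_fitness)
        history.append(toy_fitness(elite))
        children = [elite]  # elitism: the best individual is copied unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=toy_fitness)
    history.append(toy_fitness(best))
    return best, history

best, history = run_ga()
print("best (log2 C, log2 gamma, mask):", best)
```

In a real setting the fitness function would train and validate an SVM for each chromosome, which is where almost all of the run time goes.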
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a multiclass classification problem. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: an extremely large number of features (genes) and only a small number of samples. The application of machine learning to microarray gene expression datasets mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features from each subset are fed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While ordinary SVM finds a single optimum hyperplane, the main objective of twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
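The transformation of a multiclass problem into several binary problems can be sketched as follows. The abstract does not state which decomposition is used, so one-vs-rest is assumed here; the class names, the scores, and the helper names `one_vs_rest_labels` and `combine_by_max_score` are purely illustrative.

```python
def one_vs_rest_labels(labels):
    """Build one +1/-1 label vector per class: that class against all others."""
    classes = sorted(set(labels))
    return {c: [1 if lab == c else -1 for lab in labels] for c in classes}

def combine_by_max_score(scores_per_class):
    """Pick the class whose binary classifier returns the largest decision score."""
    return max(scores_per_class, key=scores_per_class.get)

labels = ["A", "B", "C", "B"]                # illustrative class names
binary_problems = one_vs_rest_labels(labels)

# Hypothetical decision scores from three binary classifiers for one sample.
predicted = combine_by_max_score({"A": -0.3, "B": 0.8, "C": 0.1})
print(binary_problems["B"], predicted)
```

Each entry of `binary_problems` would be used to train one binary classifier, and `combine_by_max_score` merges their outputs back into a single multiclass decision.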
An IPSO-SVM algorithm for security state prediction of mine production logistics system
NASA Astrophysics Data System (ADS)
Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang
2017-06-01
To reveal the laws behind the security state in mine production logistics, this paper provides a theoretical basis for regulating corporate security warning and resources. Considering that the mine production logistics system is complex and that its variables are difficult to acquire, a security status prediction model for the mine production logistics system based on improved particle swarm optimization and support vector machine (IPSO-SVM) is proposed. Firstly, the inertia weight and learning weights are adjusted linearly to enhance convergence speed and search accuracy, with the aim of coping with changeable complexity and difficult data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter setting in traditional support vector machines (SVM). At the same time, a security status index system is built to determine the classification standards for safety status. The feasibility and effectiveness of this method are finally verified by the experimental results.
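A linear adjustment of the inertia weight and learning weights can be sketched as below. This is a generic particle swarm with linearly scheduled coefficients, not the paper's IPSO: the start and end values (0.9 to 0.4 for inertia, 2.5 to 0.5 and 0.5 to 2.5 for the two learning weights) are assumptions, and a simple sphere function stands in for the SVM validation error that would be minimized in practice.

```python
import random

def sphere(pos):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(p * p for p in pos)

def pso_linear_schedule(objective, dim=2, n_particles=20, iters=200, seed=0,
                        w_start=0.9, w_end=0.4, c1_start=2.5, c1_end=0.5,
                        c2_start=0.5, c2_end=2.5, bound=5.0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        frac = t / (iters - 1)
        # Linear schedules: inertia decreases, social weight grows over time.
        w = w_start + (w_end - w_start) * frac
        c1 = c1_start + (c1_end - c1_start) * frac
        c2 = c2_start + (c2_end - c2_start) * frac
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-bound, min(bound, v))  # clamp velocity
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best_pos, best_val = pso_linear_schedule(sphere)
print(best_pos, best_val)
```

For SVM parameter setting, each particle position would encode candidate parameters (for example the penalty and kernel width) and `objective` would return a validation error.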
Chesnokov, Yuriy V
2008-06-01
Paroxysmal atrial fibrillation (PAF) is a serious arrhythmia associated with morbidity and mortality. We explore the possibility of distant prediction of PAF by analyzing changes in heart rate variability (HRV) dynamics of non-PAF rhythms immediately before a PAF event. We use that model for distant prognosis of PAF onset with artificial intelligence methods. We analyzed 30-min non-PAF HRV records from 51 subjects immediately before PAF onset and at least 45 min distant from any PAF event. We applied spectral and complexity analysis, with sample (SmEn) and approximate (ApEn) entropies and their multiscale versions, to the extracted HRV data. We used those features to train artificial neural network (ANN) and support vector machine (SVM) classifiers to differentiate the subjects. The trained classifiers were further tested for distant PAF event prognosis on 16 subjects from an independent database, on non-PAF rhythm lasting from 60 to 320 min before PAF onset, classifying the 30-min segments as distant from or leading to PAF. In the 30-min non-PAF HRV recordings from 51 subjects, we found a statistically significant increase in the VLF, LF and HF bands and in total power (p<0.0001) before a PAF event compared to PAF-distant recordings. The SmEn and ApEn analysis showed a significant decrease in complexity (p<0.0001 and p<0.001) before PAF onset. For training the ANN and SVM classifiers, the data from 51 subjects were randomly split into training, validation and testing sets. ANN provided better results in terms of sensitivity (Se), specificity (Sp) and positive predictivity (Pp) than SVM, which became biased towards the positive case. The validation results achieved with the ANN classifier were Se 76%, Sp 93%, Pp 94%. Testing the ANN and SVM classifiers on 16 subjects with non-PAF HRV data preceding PAF events, we obtained distant prediction of PAF onset with the SVM classifier in 10 subjects (58±18 min in advance). The ANN classifier provided distant prediction of a PAF event in 13 subjects (62±21 min in advance). 
From the results of distant PAF prediction we conclude that the ANN and SVM classifiers learned the changes in the HRV dynamics immediately before a PAF event and successfully identified them during distant PAF prognosis on an independent database. This confirms results reported in the literature that corresponding changes in the HRV data occur about 60 min before PAF onset, and supports the possibility of distant PAF prediction with ANN and SVM methods.
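As a rough illustration of the complexity analysis mentioned above, sample entropy can be computed as the negative logarithm of the ratio between matching templates of length m+1 and of length m. The sketch below is a textbook-style version under assumptions: the tolerance `r` is taken as an absolute value (in practice it is often set relative to the standard deviation of the series), the embedding dimension defaults to 2, and the input series are synthetic rather than HRV data. The study's actual parameters are not given in the abstract.

```python
import math
import random

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy with Chebyshev distance and absolute tolerance r."""
    n = len(series)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # self-matches are excluded
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)

periodic = [0.0, 1.0] * 20                      # perfectly regular series
rng = random.Random(0)
noisy = [rng.random() for _ in range(100)]      # irregular series

print(sample_entropy(periodic), sample_entropy(noisy))
```

A perfectly regular series gives an entropy of zero, while an irregular one gives a positive value, which matches the reading of "decrease in complexity" as a drop in entropy.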
Devos, Olivier; Downey, Gerard; Duponchel, Ludovic
2014-04-01
Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean centre data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing give models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain SVM model with significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.
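For readers unfamiliar with McNemar's test, the comparison of two classifiers on the same test samples depends only on the discordant pairs. The sketch below uses the common continuity-corrected chi-square form with one degree of freedom; the counts in the usage line are invented for illustration, and whether the study used this exact variant is not stated in the abstract.

```python
import math

def mcnemar(b, c):
    """McNemar's test with continuity correction.

    b: samples classified correctly by model 1 but wrongly by model 2
    c: samples classified wrongly by model 1 but correctly by model 2
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = mcnemar(10, 2)  # illustrative discordant counts
print(round(stat, 3), round(p_value, 4))
```

Samples that both models get right, or both get wrong, do not enter the statistic at all.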
Intelligent Gearbox Diagnosis Methods Based on SVM, Wavelet Lifting and RBR
Gao, Lixin; Ren, Zhiqiang; Tang, Wenliang; Wang, Huaqing; Chen, Peng
2010-01-01
Given the problems in intelligent gearbox diagnosis methods, it is difficult to obtain the desired information and a large enough sample size to study; therefore, we propose the application of various methods for gearbox fault diagnosis, including wavelet lifting, a support vector machine (SVM) and rule-based reasoning (RBR). In a complex field environment, it is less likely for machines to have the same fault; moreover, the fault features can also vary. Therefore, a SVM could be used for the initial diagnosis. First, gearbox vibration signals were processed with wavelet packet decomposition, and the signal energy coefficients of each frequency band were extracted and used as input feature vectors in SVM for normal and faulty pattern recognition. Second, precision analysis using wavelet lifting could successfully filter out the noisy signals while maintaining the impulse characteristics of the fault; thus effectively extracting the fault frequency of the machine. Lastly, the knowledge base was built based on the field rules summarized by experts to identify the detailed fault type. Results have shown that SVM is a powerful tool to accomplish gearbox fault pattern recognition when the sample size is small, whereas the wavelet lifting scheme can effectively extract fault features, and rule-based reasoning can be used to identify the detailed fault type. Therefore, a method that combines SVM, wavelet lifting and rule-based reasoning ensures effective gearbox fault diagnosis. PMID:22399894
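The first step, wavelet packet decomposition followed by band energy extraction, can be sketched as follows. The abstract does not name the wavelet basis or the decomposition depth, so a Haar filter and three levels (eight sub-bands) are assumed here, and the function names are hypothetical. The sub-bands come out in natural decomposition order, which is not the same as increasing frequency order.

```python
import math

def haar_step(signal):
    """One orthonormal Haar split into approximation and detail coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_packet_energies(signal, levels=3):
    """Relative energy of each of the 2**levels wavelet packet sub-bands."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)  # packets split both branches
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies

constant = [1.0] * 8
varying = [float((i * 7) % 5) for i in range(16)]
print(wavelet_packet_energies(constant))
print(wavelet_packet_energies(varying))
```

The resulting list of relative energies is the kind of fixed-length feature vector that can be passed to an SVM for normal versus faulty pattern recognition.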
SVM based colon polyps classifier in a wireless active stereo endoscope.
Ayoub, J; Granado, B; Mhanna, Y; Romain, O
2010-01-01
This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of an SVM classifier trained on robust feature descriptors. The study is related to Cyclope, a prototype sensor that allows real-time 3D object reconstruction and continues to be technically optimized to improve its classification task of differentiating between hyperplastic and adenomatous polyps. Experimental results were encouraging and show a correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by the intensity histogram, the work presents a new approach that extracts a set of features based on the depth histogram and combines stereo measurement with SVM classifiers to correctly classify benign and malignant polyps.
Tripathy, Rajesh Kumar; Dandapat, Samarendra
2017-04-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques.
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
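The three figures reported above (accuracy, F-measure and Matthews correlation coefficient) all derive from the same binary confusion matrix. A minimal sketch using the standard definitions follows; the counts passed in the usage lines are illustrative and are not taken from the study.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F-measure and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}

chance_level = binary_metrics(tp=1, fp=1, fn=1, tn=1)
perfect = binary_metrics(tp=5, fp=0, fn=0, tn=5)
print(chance_level, perfect)
```

MCC is useful alongside accuracy because it stays near zero for a classifier that performs at chance level, even when the classes are imbalanced.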
Granular support vector machines with association rules mining for protein homology prediction.
Tang, Yuchun; Jin, Bo; Zhang, Yan-Qing
2005-01-01
Protein homology prediction between protein sequences is one of critical problems in computational biology. Such a complex classification problem is common in medical or biological information processing applications. How to build a model with superior generalization capability from training samples is an essential issue for mining knowledge to accurately predict/classify unseen new samples and to effectively support human experts to make correct decisions. A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory and thus provides an interesting new mechanism to address complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVM) in some of these information granules on demand. A good granulation method to find suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. For the granules induced by association rules with high enough confidence and significant support, we leave them as they are because of their high "purity" and significant effect on simplifying the classification task. For every other granule, a SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified. The proposed algorithm, here named GSVM-AR, is compared with SVM by KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (we should be careful to select the association rules to avoid overfitting) and GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space. 
Another advantage is that the utility of GSVM-AR is very good because it is easy to be implemented. More importantly and more interestingly, GSVM provides a new mechanism to address complex classification problems.
NASA Astrophysics Data System (ADS)
Erener, A.
2013-04-01
Automatic extraction of urban features from high resolution satellite images is one of the main applications in remote sensing. It is useful for wide scale applications, namely: urban planning, urban mapping, disaster management, GIS (geographic information systems) updating, and military target detection. One common approach to detecting urban features from high resolution images is to use automatic classification methods. This paper has four main objectives with respect to detecting buildings. The first objective is to compare the performance of the most notable supervised classification algorithms, including the maximum likelihood classifier (MLC) and the support vector machine (SVM). In this experiment the primary consideration is the impact of kernel configuration on the performance of the SVM. The second objective of the study is to explore the suitability of integrating additional bands, namely first principal component (1st PC) and the intensity image, for original data for multi classification approaches. The performance evaluation of classification results is done using two different accuracy assessment methods: pixel based and object based approaches, which reflect the third aim of the study. The objective here is to demonstrate the differences in the evaluation of accuracies of classification methods. Considering consistency, the same set of ground truth data which is produced by labeling the building boundaries in the GIS environment is used for accuracy assessment. Lastly, the fourth aim is to experimentally evaluate variation in the accuracy of classifiers for six different real situations in order to identify the impact of spatial and spectral diversity on results. The method is applied to Quickbird images for various urban complexity levels, extending from simple to complex urban patterns. The simple surface type includes a regular urban area with low density and systematic buildings with brick rooftops. 
The complex surface type involves almost all kinds of challenges, such as high dense build up areas, regions with bare soil, and small and large buildings with different rooftops, such as concrete, brick, and metal. Using the pixel based accuracy assessment it was shown that the percent building detection (PBD) and quality percent (QP) of the MLC and SVM depend on the complexity and texture variation of the region. Generally, PBD values range between 70% and 90% for the MLC and SVM, respectively. No substantial improvements were observed when the SVM and MLC classifications were developed by the addition of more variables, instead of the use of only four bands. In the evaluation of object based accuracy assessment, it was demonstrated that while MLC and SVM provide higher rates of correct detection, they also provide higher rates of false alarms.
NASA Astrophysics Data System (ADS)
Jing, Ya-Bing; Liu, Chang-Wen; Bi, Feng-Rong; Bi, Xiao-Yang; Wang, Xia; Shao, Kang
2017-07-01
Vibration-based techniques are rarely applied directly to diesel engine fault diagnosis, because the surface vibration signals of diesel engines have complex non-stationary and nonlinear time-varying features. To investigate the fault diagnosis of diesel engines, fractal correlation dimension, wavelet energy and entropy, which reflect the fractal and energy characteristics of diesel engine faults, are extracted from decomposed vibration acceleration signals measured at the cylinder head in seven different valve train states. An intelligent fault detector, FastICA-SVM, is applied for diesel engine fault diagnosis and classification. The results demonstrate that FastICA-SVM achieves higher classification accuracy and better generalization performance in small-sample recognition. In addition, the fractal correlation dimension, wavelet energy and entropy, used as input vectors of the FastICA-SVM classifier, produce excellent classification results. The proposed methodology improves the accuracy of feature extraction and the fault diagnosis of diesel engines.
Lamb Wave Damage Quantification Using GA-Based LS-SVM.
Sun, Fuqiang; Wang, Ning; He, Jingjing; Guan, Xuefei; Yang, Jinsong
2017-06-12
Lamb waves have been reported to be an efficient tool for non-destructive evaluations (NDE) for various application scenarios. However, accurate and reliable damage quantification using the Lamb wave method is still a practical challenge, due to the complex underlying mechanism of Lamb wave propagation and damage detection. This paper presents a Lamb wave damage quantification method using a least square support vector machine (LS-SVM) and a genetic algorithm (GA). Three damage sensitive features, namely, normalized amplitude, phase change, and correlation coefficient, were proposed to describe changes of Lamb wave characteristics caused by damage. In view of commonly used data-driven methods, the GA-based LS-SVM model using the proposed three damage sensitive features was implemented to evaluate the crack size. The GA method was adopted to optimize the model parameters. The results of GA-based LS-SVM were validated using coupon test data and lap joint component test data with naturally developed fatigue cracks. Cases of different loading and manufacturer were also included to further verify the robustness of the proposed method for crack quantification.
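Two of the three damage sensitive features named above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give formulas, so normalized amplitude is taken here as the ratio of peak absolute amplitudes and the correlation coefficient as the Pearson coefficient between a baseline and a measured signal. The signals and function names are hypothetical, and phase change is not covered.

```python
import math


def normalized_amplitude(baseline, signal):
    """Peak amplitude of the measured signal relative to the baseline (assumed definition)."""
    return max(abs(x) for x in signal) / max(abs(x) for x in baseline)


def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two equally long signals."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical signals: the "damaged" response is an attenuated copy of the baseline.
baseline = [math.sin(2 * math.pi * k / 16) for k in range(32)]
damaged = [0.5 * x for x in baseline]

print(normalized_amplitude(baseline, damaged))
print(correlation_coefficient(baseline, damaged))
```

Pure attenuation lowers the amplitude feature but leaves the correlation near 1, so the two features respond to different kinds of signal change.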
A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease
NASA Astrophysics Data System (ADS)
Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas
2017-08-01
The diagnosis of erythemato-squamous disease is a complex problem in dermatology, and the disease is difficult to detect. Besides that, it is a major cause of skin cancer. Data mining in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), combines the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as the filter method to remove redundant features, and GA is used as the wrapper method to select the ideal feature subset, with SVM as the classifier. Experiments were performed with 10-fold cross validation on the erythemato-squamous diseases dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed model based on multiclass SVM with Chi Square and GA can give an optimum feature subset: 18 features, with 99.18% accuracy.
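The chi-square filter step can be illustrated with a short sketch. It computes the standard Pearson chi-square statistic between one categorical feature and the class labels; how ChiGA thresholds or ranks these scores is not stated in the abstract, so the function name and toy data are hypothetical.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))
    score = 0.0
    for f, f_count in feature_counts.items():
        for c, c_count in label_counts.items():
            # Expected count under independence of feature and class.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


# Hypothetical toy data: one feature tracks the class, the other does not.
labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

print(chi_square_score(informative, labels))
print(chi_square_score(uninformative, labels))
```

A filter of this kind would keep the higher-scoring features and pass them on to the wrapper stage.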
Quantum optimization for training support vector machines.
Anguita, Davide; Ridella, Sandro; Rivieccio, Fabio; Zunino, Rodolfo
2003-01-01
Refined concepts, such as Rademacher estimates of model complexity and nonlinear criteria for weighting empirical classification errors, represent recent and promising approaches to characterize the generalization ability of Support Vector Machines (SVMs). The advantages of those techniques lie in both improving the SVM representation ability and yielding tighter generalization bounds. On the other hand, they often make Quadratic-Programming algorithms no longer applicable, and SVM training cannot benefit from efficient, specialized optimization techniques. The paper considers the application of Quantum Computing to solve the problem of effective SVM training, especially in the case of digital implementations. The presented research compares the behavioral aspects of conventional and enhanced SVMs; experiments on both synthetic and real-world problems support the theoretical analysis. At the same time, the related differences between Quadratic-Programming and Quantum-based optimization techniques are considered.
Dandapat, Samarendra
2017-01-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual tree CW transform of 12-lead ECG produces CW coefficients at different sub-bands. The higher-order CW analysis is used for evaluation of CWSB. The mean of the absolute value of CWSB, and the number of negative phase angle and the number of positive phase angle features from the phase of CWSB of 12-lead ECG are evaluated. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. The individual accuracy values for MI, HMD and BBB classes are obtained as 98.37, 97.39 and 96.40%, respectively, using SVM classifier and radial basis function kernel function. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques. PMID:28894589
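The three summary features named in this abstract (mean absolute value, number of negative phase angles, number of positive phase angles) can be sketched for a list of complex values. This is only an illustration: computing the CWSB itself from dual tree complex wavelet coefficients is not shown, and the input values and function name are hypothetical.

```python
import cmath


def bispectrum_phase_features(coefficients):
    """Mean magnitude and counts of negative/positive phase angles for complex values."""
    mean_abs = sum(abs(c) for c in coefficients) / len(coefficients)
    phases = [cmath.phase(c) for c in coefficients]  # each phase lies in [-pi, pi]
    n_negative = sum(1 for p in phases if p < 0)
    n_positive = sum(1 for p in phases if p > 0)
    return mean_abs, n_negative, n_positive


# Hypothetical complex values standing in for bi-spectrum entries of one sub-band.
sample = [3 + 4j, 3 - 4j, 0 - 5j, 5 + 0j]
mean_abs, n_negative, n_positive = bispectrum_phase_features(sample)
print(mean_abs, n_negative, n_positive)
```

Entries with a phase of exactly zero are counted in neither group in this sketch; the source does not say how such values are handled.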
Fault detection of Tennessee Eastman process based on topological features and SVM
NASA Astrophysics Data System (ADS)
Zhao, Huiyang; Hu, Yanzhu; Ai, Xinbo; Hu, Yu; Meng, Zhen
2018-03-01
Fault detection in industrial processes is a popular research topic. Although the distributed control system (DCS) has been introduced to monitor the state of industrial processes, it still cannot satisfy all the requirements for fault detection in all industrial systems. In this paper, we propose a novel method based on topological features and the support vector machine (SVM) for fault detection in industrial processes. The proposed method takes global information of the measured variables into account through a complex network model and uses an SVM to predict whether a system has generated faults. The proposed method can be divided into four steps: network construction, network analysis, model training and model testing. Finally, we apply the model to the Tennessee Eastman process (TEP). The results show that this method works well and can be a useful supplement for fault detection in industrial processes.
Diagnosis of periodontal diseases using different classification algorithms: a preliminary study.
Ozden, F O; Özgönenel, O; Özden, B; Aydogdu, A
2015-01-01
The purpose of the proposed study was to develop an identification unit for classifying periodontal diseases using support vector machine (SVM), decision tree (DT), and artificial neural networks (ANNs). A total of 150 patients were divided into two groups: training (100) and testing (50). The codes created for risk factors, periodontal data, and radiographic bone loss were arranged in a matrix structure and used as inputs for the classification unit. A total of six periodontal conditions were the outputs of the classification unit. The accuracy of the suggested methods was compared according to their resolution and working time. DT and SVM classified the periodontal diseases best, with high accuracy, according to the clinical research based on 150 patients. The performances of SVM and DT were found to be 98%, with total computational times of 19.91 and 7.00 s, respectively. ANN had the worst correlation between input and output variables, and its performance was calculated as 46%. SVM and DT appeared to be sufficiently complex to reflect all the factors associated with the periodontal status, yet simple enough to be understandable and practical as a decision-making aid for prediction of periodontal disease.
Frick, Andreas; Gingnell, Malin; Marquand, Andre F.; Howner, Katarina; Fischer, Håkan; Kristiansson, Marianne; Williams, Steven C.R.; Fredrikson, Mats; Furmark, Tomas
2014-01-01
Functional neuroimaging of social anxiety disorder (SAD) supports altered neural activation to threat-provoking stimuli focally in the fear network, while structural differences are distributed over the temporal and frontal cortices as well as limbic structures. Previous neuroimaging studies have investigated the brain at the voxel level using mass-univariate methods which do not enable detection of more complex patterns of activity and structural alterations that may separate SAD from healthy individuals. Support vector machine (SVM) is a supervised machine learning method that capitalizes on brain activation and structural patterns to classify individuals. The aim of this study was to investigate if it is possible to discriminate SAD patients (n = 14) from healthy controls (n = 12) using SVM based on (1) functional magnetic resonance imaging during fearful face processing and (2) regional gray matter volume. Whole brain and region of interest (fear network) SVM analyses were performed for both modalities. For functional scans, significant classifications were obtained both at whole brain level and when restricting the analysis to the fear network while gray matter SVM analyses correctly classified participants only when using the whole brain search volume. These results support that SAD is characterized by aberrant neural activation to affective stimuli in the fear network, while disorder-related alterations in regional gray matter volume are more diffusely distributed over the whole brain. SVM may thus be useful for identifying imaging biomarkers of SAD. PMID:24239689
NASA Astrophysics Data System (ADS)
Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi
2014-09-01
Interaction is one of the key techniques of an augmented reality (AR) maintenance guiding system. Because of the complexity of the maintenance guiding system's image background and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages: gesture segmentation, gesture characteristic feature modeling and trick recognition. In the segmentation stage, to avoid misrecognition of skin-like regions, a segmentation algorithm combining background mode and skin color is adopted to exclude some skin-like regions. In the gesture characteristic feature modeling stage, many characteristic features are analyzed and acquired, such as structure characteristics, Hu invariant moment features and Fourier descriptors. In the trick recognition stage, a classifier based on Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a learning method based on statistical learning theory; it possesses a solid theoretical foundation and excellent learning ability, and it has special advantages in dealing with small-sample, non-linear and high-dimensional pattern recognition problems. The gesture recognition of the augmented reality maintenance guiding system is realized by SVM after the granulation of all the characteristic features. The experimental results of the simulation of number gesture recognition and its application in the augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition of the AR maintenance guiding system can be greatly enhanced by the improved SVM.
NASA Astrophysics Data System (ADS)
Paino, A.; Keller, J.; Popescu, M.; Stone, K.
2014-06-01
In this paper we present an approach that uses Genetic Programming (GP) to evolve novel feature extraction algorithms for greyscale images. Our motivation is to create an automated method of building new feature extraction algorithms for images that are competitive with commonly used human-engineered features, such as Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG). The evolved feature extraction algorithms are functions defined over the image space, and each produces a real-valued feature vector of variable length. Each evolved feature extractor breaks up the given image into a set of cells centered on every pixel, performs evolved operations on each cell, and then combines the results of those operations for every cell using an evolved operator. Using this method, the algorithm is flexible enough to reproduce both LBP and HOG features. The dataset we use to train and test our approach consists of a large number of pre-segmented image "chips" taken from a Forward Looking Infrared Imagery (FLIR) camera mounted on the hood of a moving vehicle. The goal is to classify each image chip as either containing or not containing a buried object. To this end, we define the fitness of a candidate solution as the cross-fold validation accuracy of the features generated by said candidate solution when used in conjunction with a Support Vector Machine (SVM) classifier. In order to validate our approach, we compare the classification accuracy of an SVM trained using our evolved features with the accuracy of an SVM trained using mainstream feature extraction algorithms, including LBP and HOG.
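For readers unfamiliar with the LBP baseline mentioned above, a minimal sketch of the basic 3x3 Local Binary Pattern code for one pixel follows. The neighbor ordering and the ">=" comparison are one common convention and are assumptions here; the evolved GP feature extractors themselves are not reproduced.

```python
def lbp_code(image, row, col):
    """Basic 8-neighbor Local Binary Pattern code for the pixel at (row, col).

    Neighbors are visited clockwise starting at the top-left; a neighbor that is
    greater than or equal to the center sets the corresponding bit.
    """
    center = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


# Hypothetical 3x3 greyscale patch: bright corners, dark edges.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
print(lbp_code(patch, 1, 1))  # bits 0, 2, 4 and 6 are set
```

A full LBP descriptor would histogram these codes over each cell of the image chip before passing them to the classifier.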
FPGA Coprocessor for Accelerated Classification of Images
NASA Technical Reports Server (NTRS)
Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.
2008-01-01
An effort related to that described in the preceding article focuses on developing a spaceborne processing platform for fast and accurate onboard classification of image data, a critical part of modern satellite image processing. The approach again has been to exploit the versatility of recently developed hybrid Virtex-4FX field-programmable gate array (FPGA) to run diverse science applications on embedded processors while taking advantage of the reconfigurable hardware resources of the FPGAs. In this case, the FPGA serves as a coprocessor that implements legacy C-language support-vector-machine (SVM) image-classification algorithms to detect and identify natural phenomena such as flooding, volcanic eruptions, and sea-ice break-up. The FPGA provides hardware acceleration for increased onboard processing capability than previously demonstrated in software. The original C-language program demonstrated on an imaging instrument aboard the Earth Observing-1 (EO-1) satellite implements a linear-kernel SVM algorithm for classifying parts of the images as snow, water, ice, land, or cloud or unclassified. Current onboard processors, such as on EO-1, have limited computing power, extremely limited active storage capability and are no longer considered state-of-the-art. Using commercially available software that translates C-language programs into hardware description language (HDL) files, the legacy C-language program, and two newly formulated programs for a more capable expanded-linear-kernel and a more accurate polynomial-kernel SVM algorithm, have been implemented in the Virtex-4FX FPGA. In tests, the FPGA implementations have exhibited significant speedups over conventional software implementations running on general-purpose hardware.
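The difference between the linear-kernel and polynomial-kernel classifiers mentioned above can be sketched with the generic kernel SVM decision function f(x) = sum_i a_i K(s_i, x) + b. This is not the flight software or the FPGA design; the support vectors, coefficients and kernel parameters below are hypothetical.

```python
def linear_kernel(x, z):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel SVM decision value; its sign gives the predicted class."""
    return sum(a * kernel(sv, x) for a, sv in zip(dual_coefs, support_vectors)) + bias


# Hypothetical trained model with two support vectors.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i for each support vector
bias = 0.0
pixel = [2.0, 1.0]

print(svm_decision(pixel, support_vectors, dual_coefs, bias, linear_kernel))
print(svm_decision(pixel, support_vectors, dual_coefs, bias, polynomial_kernel))
```

With a linear kernel the sum collapses to a single weight vector, which is part of why that variant is the cheapest to evaluate on constrained hardware.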
Daily River Flow Forecasting with Hybrid Support Vector Machine – Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Zaini, N.; Malek, M. A.; Yusoff, M.; Mardi, N. H.; Norhisham, S.
2018-04-01
The application of artificial intelligence techniques for river flow forecasting can further improve the management of water resources and flood prevention. This study concerns the development of a support vector machine (SVM) based model and its hybridization with particle swarm optimization (PSO) to forecast short term daily river flow at the Upper Bertam Catchment located in Cameron Highland, Malaysia. Ten years of historical rainfall, antecedent river flow data and various meteorological parameters from 2003 to 2012 are used in this study. Four SVM based models are proposed, namely SVM1, SVM2, SVM-PSO1 and SVM-PSO2, to forecast river flow 1 to 7 days ahead. SVM1 and SVM-PSO1 take historical rainfall and antecedent river flow as input, while SVM2 and SVM-PSO2 take historical rainfall, antecedent river flow data and additional meteorological parameters as input. The performance of the proposed models is measured in terms of RMSE and R2. It is found that SVM2 outperformed SVM1 and SVM-PSO2 outperformed SVM-PSO1, which means that the additional meteorological parameters used as input significantly affect model performance. The hybrid models SVM-PSO1 and SVM-PSO2 yield higher performance than SVM1 and SVM2. It is found that the hybrid models are more effective in forecasting river flow 1 to 7 days ahead at the study area.
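The two evaluation measures can be written out as a short sketch. It assumes R2 denotes the coefficient of determination, 1 - SS_res/SS_tot; some hydrology papers instead report the squared correlation coefficient, and the abstract does not say which was used. The flow values below are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed meaning of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily flows (observed vs. forecast).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```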
Binning in Gaussian Kernel Regularization
2005-04-01
...71.40%) on 966 randomly sampled data. Using the OSU-SVM Matlab package, the SVM trained on 966 bins has a test classification rate comparable to that of the SVM trained on 27,179 samples, but reduces the...
Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel
2014-01-01
This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure. PMID:25254303
Fraley, Stephanie I.; Athamanolap, Pornpat; Masek, Billie J.; Hardick, Justin; Carroll, Karen C.; Hsieh, Yu-Hsiang; Rothman, Richard E.; Gaydos, Charlotte A.; Wang, Tza-Huei; Yang, Samuel
2016-01-01
High Resolution Melt (HRM) is a versatile and rapid post-PCR DNA analysis technique primarily used to differentiate sequence variants among only a few short amplicons. We recently developed a one-vs-one support vector machine algorithm (OVO SVM) that enables the use of HRM for identifying numerous short amplicon sequences automatically and reliably. Herein, we set out to maximize the discriminating power of HRM + SVM for a single genetic locus by testing longer amplicons harboring significantly more sequence information. Using universal primers that amplify the hypervariable bacterial 16 S rRNA gene as a model system, we found that long amplicons yield more complex HRM curve shapes. We developed a novel nested OVO SVM approach to take advantage of this feature and achieved 100% accuracy in the identification of 37 clinically relevant bacteria in Leave-One-Out-Cross-Validation. A subset of organisms were independently tested. Those from pure culture were identified with high accuracy, while those tested directly from clinical blood bottles displayed more technical variability and reduced accuracy. Our findings demonstrate that long sequences can be accurately and automatically profiled by HRM with a novel nested SVM approach and suggest that clinical sample testing is feasible with further optimization. PMID:26778280
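The one-vs-one (OVO) scheme underlying this approach can be sketched as pairwise voting. The example shows plain OVO only, not the authors' nested variant, and it replaces the pairwise SVMs with hypothetical nearest-center rules on a one-dimensional feature so that it stays self-contained.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise_classifiers):
    """Predict by majority vote over all pairwise (one-vs-one) classifiers."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = pairwise_classifiers[(a, b)](x)  # each classifier returns a or b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained pairwise SVMs: pick the nearer class center.
centers = {"A": 0.0, "B": 5.0, "C": 10.0}


def make_pair_classifier(a, b):
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b


classes = sorted(centers)
pairwise = {(a, b): make_pair_classifier(a, b) for a, b in combinations(classes, 2)}

print(ovo_predict(4.0, classes, pairwise))
print(ovo_predict(9.0, classes, pairwise))
```

With k classes, OVO trains k(k-1)/2 binary classifiers, which is 666 pairwise models for the 37 organisms mentioned in the abstract.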
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected from laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to KF-SVM, PP-SVM and SVM schemes are 2.49%, 5.83% and 6.49%, respectively.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
Support vector machine firefly algorithm based optimization of lens system.
Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah
2015-01-01
Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram gives an indication of the image of a point object. In this paper, the spot size radius is considered an optimization criterion. Intelligent soft computing scheme support vector machines (SVMs) coupled with the firefly algorithm (FFA) are implemented. The performance of the proposed estimators is confirmed with the simulation results. The result of the proposed SVM-FFA model has been compared with support vector regression (SVR), artificial neural networks, and genetic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies. Therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.
NASA Astrophysics Data System (ADS)
Imani, Moslem; You, Rey-Jer; Kuo, Chung-Yen
2014-10-01
Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as an appropriate tool for modeling complex nonlinear phenomena in water bodies. In the study, we investigated the ability of two AI techniques: support vector machine (SVM), which is mathematically well-founded and provides new insights into function approximation, and gene expression programming (GEP), which is used to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013. SVM demonstrates the best performance in predicting Caspian Sea level anomalies, given the minimum root mean square error (RMSE = 0.035) and maximum coefficient of determination (R2 = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over the CCNN.
MiRduplexSVM: A High-Performing MiRNA-Duplex Prediction and Evaluation Methodology
Karathanasis, Nestoras; Tsamardinos, Ioannis; Poirazi, Panayiota
2015-01-01
We address the problem of predicting the position of a miRNA duplex on a microRNA hairpin via the development and application of a novel SVM-based methodology. Our method combines a unique problem representation and an unbiased optimization protocol to learn from mirBase19.0 an accurate predictive model, termed MiRduplexSVM. This is the first model that provides precise information about all four ends of the miRNA duplex. We show that (a) our method outperforms four state-of-the-art tools, namely MaturePred, MiRPara, MatureBayes, MiRdup as well as a Simple Geometric Locator when applied on the same training datasets employed for each tool and evaluated on a common blind test set. (b) In all comparisons, MiRduplexSVM shows superior performance, achieving up to a 60% increase in prediction accuracy for mammalian hairpins and can generalize very well on plant hairpins, without any special optimization. (c) The tool has a number of important applications such as the ability to accurately predict the miRNA or the miRNA*, given the opposite strand of a duplex. Its performance on this task is superior to the 2nts overhang rule commonly used in computational studies and similar to that of a comparative genomic approach, without the need for prior knowledge or the complexity of performing multiple alignments. Finally, it is able to evaluate novel, potential miRNAs found either computationally or experimentally. In relation with recent confidence evaluation methods used in miRBase, MiRduplexSVM was successful in identifying high confidence potential miRNAs. PMID:25961860
LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran.
Ghaemi, Z; Alimohammadi, A; Farnaghi, M
2018-04-20
Due to critical impacts of air pollution, prediction and monitoring of air quality in urban areas are important tasks. However, because of the dynamic nature and high spatio-temporal variability, prediction of the air pollutant concentrations is a complex spatio-temporal problem. Distribution of pollutant concentration is influenced by various factors such as the historical pollution data and weather conditions. Conventional methods such as the support vector machine (SVM) or artificial neural networks (ANN) show some deficiencies when huge amount of streaming data have to be analyzed for urban air pollution prediction. In order to overcome the limitations of the conventional methods and improve the performance of urban air pollution prediction in Tehran, a spatio-temporal system is designed using a LaSVM-based online algorithm. Pollutant concentration and meteorological data along with geographical parameters are continually fed to the developed online forecasting system. Performance of the system is evaluated by comparing the prediction results of the Air Quality Index (AQI) with those of a traditional SVM algorithm. Results show an outstanding increase of speed by the online algorithm while preserving the accuracy of the SVM classifier. Comparison of the hourly predictions for next coming 24 h, with those of the measured pollution data in Tehran pollution monitoring stations shows an overall accuracy of 0.71, root mean square error of 0.54 and coefficient of determination of 0.81. These results are indicators of the practical usefulness of the online algorithm for real-time spatial and temporal prediction of the urban air quality.
Weighted K-means support vector machine for cancer prediction.
Kim, SungHwan
2016-01-01
To date, the support vector machine (SVM) has been widely applied to diverse bio-medical fields to address disease subtype identification and pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), for which I allow the SVM to impose weights to the loss term. Besides, I demonstrate the numerical relations between the objective function of the SVM and weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adopt the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrate that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).
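As a rough illustration of what "imposing weights on the loss term" can mean, the sketch below evaluates a linear soft-margin SVM objective in which each sample's hinge loss is multiplied by its own weight. This is a generic weighted hinge-loss formulation written under that assumption; it is not the wSVM package's code, and the names `weighted_svm_objective` and `sample_weights` are hypothetical.

```python
def weighted_svm_objective(w, b, X, y, sample_weights, C=1.0):
    """0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for x_i, y_i, s_i in zip(X, y, sample_weights):
        margin = y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
        # Each sample's hinge loss is scaled by its own weight s_i.
        loss += s_i * max(0.0, 1.0 - margin)
    return regularizer + C * loss


# Toy data: three points with label +1, the last one on the wrong side.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, 1]
uniform = weighted_svm_objective([1.0, 0.0], 0.0, X, y, [1.0, 1.0, 1.0])
reweighted = weighted_svm_objective([1.0, 0.0], 0.0, X, y, [1.0, 2.0, 0.5])
print(uniform, reweighted)
```

With uniform weights this reduces to the ordinary soft-margin objective; lowering the weight of a sample reduces how much its margin violation contributes.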
Maximum margin semi-supervised learning with irrelevant data.
Yang, Haiqin; Huang, Kaizhu; King, Irwin; Lyu, Michael R
2015-10-01
Semi-supervised learning (SSL) is a typical learning paradigm that trains a model from both labeled and unlabeled data. Traditional SSL models usually assume that unlabeled data are relevant to the labeled data, i.e., that they follow the same distribution as the targeted labeled data. In this paper, we address a different, yet formidable scenario in semi-supervised classification, where the unlabeled data may contain data irrelevant to the labeled data. To tackle this problem, we develop a maximum margin model, named tri-class support vector machine (3C-SVM), to utilize the available training data while seeking a hyperplane that separates the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it does not need any prior knowledge or explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and maximum entropy principle. That is, 3C-SVM approaches an ideal classifier. This classifier relies heavily on labeled data and is confident on the relevant data lying far away from the decision hyperplane, while maximally ignoring the irrelevant data, which are hardly distinguished. Second, theoretical analysis is provided to show under what conditions the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models, including standard SVMs, semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs), as its special cases. More importantly, we deploy a concave-convex procedure to solve the proposed 3C-SVM, transforming the original mixed integer programming problem into a semi-definite programming relaxation, and finally into a sequence of quadratic programming subproblems, which yields the same worst-case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematic experimental comparisons. Copyright © 2015 Elsevier Ltd. All rights reserved.
SVM and SVM Ensembles in Breast Cancer Prediction
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777
CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.
Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming
2014-11-30
Prediction of gene regulatory networks (GRN) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised methods. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods in detail on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of the inference method depends upon the nature of the experimental condition and the size of the network. For networks with fewer than 200 nodes, and on average over all network sizes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends upon the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-02-16
Background: Support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear SVM. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike traditional studies, which focused merely on either the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes, in terms of classification accuracy and computation time. Methodology/Principal Findings: Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relatively high dimensional space, linear SVM performed better than its counterpart; (2) considering classification accuracy and computation time together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) could achieve better accuracy in a shorter time. Conclusions/Significance: The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if users are more concerned about computational time, RBF SVM with a relatively small set of voxels, when part of the principal components are kept as features, is the better choice. PMID:21359184
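For readers unfamiliar with the two kernels being compared, the following minimal sketch writes them out in their standard textbook forms. The `gamma` value and the toy vectors are assumptions for illustration only and have no connection to the fMRI data in the study.

```python
import math


def linear_kernel(x, z):
    """Linear kernel: the plain inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - z||^2); gamma sets the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy vectors standing in for per-sample voxel features.
samples = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(gram_matrix(samples, linear_kernel))
print(gram_matrix(samples, rbf_kernel))
```

The linear kernel has no extra hyperparameter, whereas the RBF kernel adds `gamma`, which is one reason its tuning cost and its behaviour can differ with the number of features.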
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition
Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan
2017-01-01
Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The best combination of feature variables can be decided automatically by using multi-class F-score (MF-Score) feature selection. To handle the nonlinear mapping in this real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, where the data exhibit an inherently complex distribution. The proposed method is examined on UCI data sets and the caving dataset, and compared with some recent SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers over multiple UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The experiments on the caving dataset show better performance, which points to promising feature selection and multi-class recognition in coal-rock recognition. PMID:28937987
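To give a feel for the minimum enclosing ball idea, here is a minimal sketch of a simple iterative approximation in the style of the Badoiu-Clarkson scheme: the center is repeatedly pulled toward the current farthest point with a shrinking step. This is a swapped-in, generic MEB approximation in input space, not the MEB-SVM construction from the paper; the function names and the toy points are assumptions.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball of a finite point set.

    Each round moves the center a fraction 1/(i+1) of the way toward
    the point that is currently farthest from it.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: distance(center, p))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # The radius is chosen so that every point is enclosed.
    radius = max(distance(center, p) for p in points)
    return center, radius


# Corners of a square: the exact ball has center (1, 1) and radius sqrt(2).
corners = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_meb(corners)
print(center, radius)
```

More iterations tighten the approximation; the returned ball always encloses all points because the radius is measured after the final center update.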
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-04-22
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
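The combination of very high recall and specificity with a much lower precision is typical when relevant abstracts are rare. The sketch below computes the three metrics from confusion-matrix counts; the counts are hypothetical numbers chosen only to illustrate the effect and are not the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Recall, specificity and precision from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of relevant abstracts retrieved
    specificity = tn / (tn + fp)   # share of irrelevant abstracts rejected
    precision = tp / (tp + fp)     # share of retrieved abstracts that are relevant
    return recall, specificity, precision


# Hypothetical screening batch: 40 relevant abstracts among 5040.
recall, specificity, precision = screening_metrics(tp=39, fp=83, tn=4917, fn=1)
print(recall, specificity, precision)
```

Because irrelevant abstracts vastly outnumber relevant ones in this toy batch, even a small false-positive rate produces more false positives than true positives, which pulls precision down while recall and specificity stay high.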
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Recently, gene expression profiling using microarray techniques has been shown to be a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples. This poses a great challenge for machine learning and statistical techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver a transparent decision process to the user. How to explain the computed solutions and present the extracted knowledge has become a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem. A shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach that uses the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and to reduce the computational complexity. Two public gene expression datasets, a leukemia dataset and a colon tumor dataset, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted. The rule sets have high diagnostic power because of their good classification performance.
Generalized SMO algorithm for SVM-based multitask learning.
Cai, Feng; Cherkassky, Vladimir
2012-06-01
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n^3) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.
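The core of Platt's SMO is an analytic update of two Lagrange multipliers at a time, which avoids calling a general quadratic programming solver. The sketch below shows that single step for the standard soft-margin SVM only; the generalization to SVM+MTL described in the brief is not reproduced here, and the function name and argument layout are assumptions.

```python
def smo_pair_update(alpha_i, alpha_j, y_i, y_j, err_i, err_j,
                    k_ii, k_jj, k_ij, C):
    """One analytic SMO step on a pair of multipliers (standard SVM).

    err_* is the prediction error f(x) - y for each of the two samples,
    and k_* are the corresponding kernel values.
    """
    # Segment of the line y_i*a_i + y_j*a_j = const inside the box [0, C]^2.
    if y_i != y_j:
        low = max(0.0, alpha_j - alpha_i)
        high = min(C, C + alpha_j - alpha_i)
    else:
        low = max(0.0, alpha_i + alpha_j - C)
        high = min(C, alpha_i + alpha_j)
    eta = k_ii + k_jj - 2.0 * k_ij  # curvature along the constraint line
    if eta <= 0.0 or low >= high:
        return alpha_i, alpha_j      # degenerate pair: leave unchanged
    # Unconstrained optimum for alpha_j, then clip it to [low, high].
    new_j = alpha_j + y_j * (err_i - err_j) / eta
    new_j = min(high, max(low, new_j))
    # Move alpha_i so that the equality constraint stays satisfied.
    new_i = alpha_i + y_i * y_j * (alpha_j - new_j)
    return new_i, new_j


# Two orthogonal unit-norm samples with opposite labels, starting from zero.
print(smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, 0.0, C=1.0))
```

A full solver repeats this step over pairs chosen by a heuristic and updates the errors and bias after each step; only the pair update is shown here.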
Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels
2014-01-01
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. It, however, is known that protein complexes with smaller sizes occupy a large part of whole complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes. PMID:24564744
A structural SVM approach for reference parsing.
Zhang, Xiaoli; Zou, Jie; Le, Daniel X; Thoma, George R
2011-06-09
Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references is essential to the affordable creation of large citation databases. References, typically appearing at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Therefore, parsing individual reference to extract author, title, journal, year, etc. is sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure in references enables us to consider reference parsing a sequence learning problem and to study structural Support Vector Machine (structural SVM), a newly developed structured learning algorithm on parsing references. In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM to Conditional Random Field (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at token- and chunk-levels. When only basic observation features are used for each token, structural SVM achieves higher performance compared to SVM since it utilizes the contextual label features. However, when the contextual observation features from neighboring tokens are combined, SVM performance improves greatly, and is close to that of structural SVM after adding the second order contextual observation features. The comparison of these two methods with CRF using the same set of binary features show that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
Pirooznia, Mehdi; Deng, Youping
2006-12-12
Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
Epileptic seizure detection in EEG signal with GModPCA and support vector machine.
Jaiswal, Abeg Kumar; Banka, Haider
2017-01-01
Epilepsy is one of the most common neurological disorders caused by recurrent seizures. Electroencephalograms (EEGs) record neural activity and can detect epilepsy. Visual inspection of an EEG signal for epileptic seizure detection is a time-consuming process and may lead to human error; therefore, a number of automated seizure detection frameworks have recently been proposed to replace these traditional methods. Feature extraction and classification are two important steps in these procedures. Feature extraction focuses on finding the informative features that could be used for classification and correct decision-making. Therefore, proposing effective feature extraction techniques for seizure detection is of great significance. Principal Component Analysis (PCA) is a dimensionality reduction technique used in different fields of pattern recognition, including EEG signal classification. Global modular PCA (GModPCA) is a variation of PCA. In this paper, an effective framework with GModPCA and Support Vector Machine (SVM) is presented for epileptic seizure detection in EEG signals. The feature extraction is performed with GModPCA, whereas an SVM trained with a radial basis function kernel performs the classification between seizure and nonseizure EEG signals. Seven different experimental cases were conducted on the benchmark epilepsy EEG dataset. The system performance was evaluated using 10-fold cross-validation. In addition, we prove analytically that GModPCA has lower time and space complexities than PCA. The experimental results show that EEG signals have strong inter-sub-pattern correlations. GModPCA and SVM were able to achieve 100% accuracy for the classification between normal and epileptic signals. Along with this, seven different experimental cases were tested. The classification results of the proposed approach were better than those of some existing methods proposed in the literature. It is also found that the time and space complexities of GModPCA are lower than those of PCA. This study suggests that GModPCA and SVM could be used for automated epileptic seizure detection in EEG signals.
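The 10-fold cross-validation protocol mentioned above can be sketched as a plain index split: the samples are divided into ten nearly equal folds, and each fold serves once as the test set while the rest are used for training. The helper name `k_fold_indices` is hypothetical, and the sketch does not shuffle or stratify, which a real evaluation would usually do.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


# Each sample lands in exactly one test fold.
for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```

The reported performance is then the average of the per-fold scores, so every sample contributes to testing exactly once.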
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). Feature selection is not considered in this tool, because the SVM together with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and to the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM for pattern classification with maximum accuracy. PMID:25802511
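As a rough sketch of how a firefly search proceeds, the code below implements the basic firefly move (each firefly is attracted to every brighter one, with attraction decaying with distance, plus a small random step) and applies it to a toy cost function. The toy cost stands in for a validation error over SVM parameters; the function name, default settings and the cost are all assumptions, and this is not the firefly-SVM implementation from the paper.

```python
import math
import random


def firefly_minimize(cost, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize cost over a box using the basic firefly update rule."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    costs = [cost(p) for p in swarm]
    best_cost = min(costs)
    best_pos = list(swarm[costs.index(best_cost)])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter than i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    moved = []
                    for d, (lo, hi) in enumerate(bounds):
                        value = (swarm[i][d]
                                 + beta * (swarm[j][d] - swarm[i][d])
                                 + alpha * (rng.random() - 0.5))
                        moved.append(min(hi, max(lo, value)))  # stay in the box
                    swarm[i] = moved
                    costs[i] = cost(moved)
                    if costs[i] < best_cost:
                        best_cost, best_pos = costs[i], list(moved)
        alpha *= 0.97  # gradually reduce the random step
    return best_pos, best_cost


def toy_cost(p):
    """Stand-in for a validation error; minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


print(firefly_minimize(toy_cost, [(-5.0, 5.0), (-5.0, 5.0)]))
```

In a parameter-tuning setting, `cost` would train and validate a classifier for each candidate position, which makes each evaluation far more expensive than in this toy example.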
NASA Astrophysics Data System (ADS)
Shao, Yanhua; Mei, Yanying; Chu, Hongyu; Chang, Zhiyuan; He, Yuxuan; Zhan, Huayi
2018-04-01
Pedestrian detection (PD) is an important application domain in computer vision and pattern recognition. Unmanned Aerial Vehicles (UAVs) have become a major field of research in recent years. In this paper, a robust pedestrian detection method based on the combination of the infrared HOG (IR-HOG) feature and SVM is proposed for highly complex outdoor scenarios on the basis of airborne IR image sequences from a UAV. The basic flow of our application is as follows. Firstly, the thermal infrared imager (TAU2-336), which was installed on our Outdoor Autonomous Searching (OAS) UAV, is used for taking pictures of the designated outdoor area. Secondly, image sequences were collected and processed on a high-performance embedded system with a Samsung ODROID-XU4 as the core and Ubuntu as the operating system, and IR-HOG features were extracted. Finally, the SVM is used to train the pedestrian classifier. Experiments show that our method achieves promising results under complex conditions, including strong noise corruption, partial occlusion, etc.
Fu, Chunjiang; Wu, Gang; Lv, Fenglin; Tian, Feifei
2012-05-01
Many protein-protein interactions are mediated by a peptide-recognizing domain, such as WW, PDZ, or SH3. In the present study, we describe a new method called position-dependent noncovalent potential analysis (PDNPA), which can accurately characterize the nonbonding profile between the human endophilin-1 Src homology 3 (hEndo1 SH3) domain and its peptide ligands and quantitatively predict the binding affinity of peptide to hEndo1 SH3. In this procedure, structure models of diverse peptides in complex with the hEndo1 SH3 domain are constructed by molecular dynamics simulation and a virtual mutagenesis protocol. Subsequently, three noncovalent interactions associated with each position of the peptide ligand in the complexed state are analyzed using empirical potential functions, and the resulting potential descriptors are then correlated with the experimentally measured affinity on the basis of 1997 hEndo1 SH3-binding peptides with known activities, using linear partial least squares regression (PLS) and the nonlinear support vector machine (SVM). The results suggest that: (i) the electrostatics appears to be more important than steric properties and hydrophobicity in the formation of the hEndo1 SH3-peptide complex; (ii) P(-4) of the core decapeptide ligand with the sequence pattern P(-6)P(-5)P(-4)P(-3)P(-2)P(-1)P(0)P(1)P(2)P(3) is the most important position in terms of determining both the stability and specificity of the architecture of the complex, and; (iii) nonlinear SVM appears to be more effective than linear PLS for accurately predicting the binding affinity of a peptide ligand to hEndo1 SH3, whereas PLS models are straightforward and easy to interpret as compared to those built by SVM.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
Algorithm for detection the QRS complexes based on support vector machine
NASA Astrophysics Data System (ADS)
Van, G. V.; Podmasteryev, K. V.
2017-11-01
The efficiency of computer ECG analysis depends on the accurate detection of QRS complexes. This paper presents an algorithm for QRS complex detection based on a support vector machine (SVM). The proposed algorithm is evaluated on annotated standard databases such as the MIT-BIH Arrhythmia database. The QRS detector obtained a sensitivity Se = 98.32% and specificity Sp = 95.46% for the MIT-BIH Arrhythmia database. This algorithm can be used as the basis for software to diagnose electrical activity of the heart.
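For readers unfamiliar with the two reported figures, the following is a minimal sketch of how sensitivity and specificity are usually computed from detection counts. It assumes the standard confusion-matrix definitions; the abstract does not state which definitions the authors used, and the counts below are made up purely for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true QRS complexes that the detector found: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-QRS segments correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, not taken from the paper.
tp, fn, tn, fp = 90, 10, 80, 20
se = sensitivity(tp, fn)
sp = specificity(tn, fp)
print(f"Se = {se:.2%}, Sp = {sp:.2%}")
```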
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.
SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru
2014-01-01
Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306
Classifying High-noise EEG in Complex Environments for Brain-computer Interaction Technologies
2012-02-01
differentiation in the brain signal that our classification approach seeks to identify despite the noise in the recorded EEG signal and the complexity of...performed two offline classifications , one using BCILab (1), the other using LibSVM (2). Distinct classifiers were trained for each individual in...order to improve individual classifier performance (3). The highest classification performance results were obtained using individual frequency bands
gkmSVM: an R package for gapped-kmer SVM
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A.
2016-01-01
Summary: We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. Availability and Implementation: gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm. Contact: mghandi@gmail.com or mbeer@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153639
Classification of stellar spectra with SVM based on within-class scatter and between-class scatter
NASA Astrophysics Data System (ADS)
Liu, Zhong-bao; Zhou, Fang-xiao; Qin, Zhen-tao; Luo, Xue-gang; Zhang, Jing
2018-07-01
Support Vector Machine (SVM) is a popular data mining technique, and it has been widely applied in astronomical tasks, especially in stellar spectra classification. However, SVM does not take the data distribution into consideration, so its classification efficiency cannot be greatly improved. Meanwhile, SVM ignores the internal information of the training dataset, such as the within-class structure and between-class structure. In view of this, we propose a new classification algorithm, SVM based on Within-Class Scatter and Between-Class Scatter (WBS-SVM), in this paper. Like SVM, WBS-SVM tries to find an optimal hyperplane to separate two classes. The difference is that it incorporates the minimum within-class scatter and maximum between-class scatter of Linear Discriminant Analysis (LDA) into SVM. These two scatters represent the distributions of the training dataset, and the optimization of WBS-SVM ensures that the samples in the same class are as close as possible and the samples in different classes are as far apart as possible. Experiments on the K-, F-, G-type stellar spectra from Sloan Digital Sky Survey (SDSS), Data Release 8 show that our proposed WBS-SVM can greatly improve the classification accuracies.
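The two scatter quantities named in the abstract can be illustrated with a short sketch. It computes only the traces of the within-class and between-class scatter matrices using the usual LDA definitions; how WBS-SVM folds these terms into the SVM objective is not given in the abstract and is not attempted here. The data and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def mean(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter_traces(xs, ys) -> Tuple[float, float]:
    """Return (trace(S_w), trace(S_b)) with the usual LDA definitions."""
    groups: Dict[int, list] = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    overall = mean(list(xs))
    within = 0.0
    between = 0.0
    for members in groups.values():
        m = mean(members)
        # Spread of each class around its own mean.
        within += sum(sq_dist(x, m) for x in members)
        # Spread of the class means around the overall mean, weighted by size.
        between += len(members) * sq_dist(m, overall)
    return within, between


# Toy data: two tight, well-separated classes.
xs = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
ys = [0, 0, 1, 1]
print(scatter_traces(xs, ys))  # (4.0, 100.0)
```

A small within-class value together with a large between-class value corresponds to the "same class close, different classes far apart" goal described above.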
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. 
The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
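The two-pass procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two "SVMs" are stand-in scoring functions, the confidence index is assumed to be the absolute value of the fast model's decision score, and `budget` is a hypothetical parameter standing in for the point at which the user halts reclassification.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def progressive_classify(
    points: List[Point],
    fast_score: Callable[[Point], float],
    slow_score: Callable[[Point], float],
    budget: int,
) -> List[int]:
    """Label every point with the fast model, then re-label the `budget`
    least confident points with the slow model."""
    scores = [fast_score(p) for p in points]
    labels = [1 if s >= 0 else -1 for s in scores]
    # Assumed confidence index: distance of the fast score from zero.
    order = sorted(range(len(points)), key=lambda i: abs(scores[i]))
    for i in order[:budget]:
        labels[i] = 1 if slow_score(points[i]) >= 0 else -1
    return labels


def fast(p: Point) -> float:
    """Coarse stand-in model: looks at the first feature only."""
    return p[0]


def slow(p: Point) -> float:
    """More accurate stand-in model: uses both features."""
    return p[0] + p[1] - 0.5


data = [(2.0, 0.0), (0.1, 0.0), (-0.2, 1.0), (-3.0, 0.0)]
print(progressive_classify(data, fast, slow, budget=0))  # [1, 1, -1, -1]
print(progressive_classify(data, fast, slow, budget=2))  # [1, -1, 1, -1]
```

With `budget=2`, only the two points closest to the fast model's boundary are revisited, which is where the slow model changes the answer in this toy example.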
A GSA-SVM Hybrid System for Classification of Binary Problems
NASA Astrophysics Data System (ADS)
Sarafrazi, Soroor; Nezamabadi-pour, Hossein; Barahman, Mojgan
2011-06-01
This paper hybridizes the gravitational search algorithm (GSA) with the support vector machine (SVM) to build a novel GSA-SVM hybrid system that improves classification accuracy in binary problems. GSA is an optimization heuristic tool used to optimize the value of the SVM kernel parameter (in this paper, the radial basis function (RBF) is chosen as the kernel function). The experimental results show that this new approach can achieve high classification accuracy and is comparable to or better than the particle swarm optimization (PSO)-SVM and genetic algorithm (GA)-SVM, which are two hybrid systems for classification.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.
Subasi, Abdulhamit
2013-06-01
Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhan, Liwei; Li, Chengwei
2017-02-01
A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The presented hybrid model combines a support vector machine (SVM) with the particle swarm optimization (PSO) technique. SVM has been adopted to solve regression problems successfully. Its regression accuracy depends strongly on optimizing parameters such as the regularization constant C, the RBF kernel parameter γ and the epsilon parameter ε in the SVM training procedure. However, SVM-based prediction of the friction coefficient between aircraft tire and coating has yet to be explored. The experiment reveals that drop height and tire rotational speed are the factors affecting the friction coefficient. Bearing this in mind, the friction coefficient can be predicted using the hybrid PSO-SVM-based model from the measured friction coefficient between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are used to optimize the relevant parameters (C, γ and ε), respectively. The regression accuracy is reflected by the coefficient of determination (R²). The result shows that the hybrid PSO-RBF-SVM-based model has better accuracy compared with the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experiment data confirms its good performance.
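Since the models above are compared by the coefficient of determination, a minimal sketch of that metric may help. It uses the common definition R² = 1 − SS_res / SS_tot; the abstract does not spell out the exact formula the authors used, and the measured and predicted values below are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, not taken from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(measured, predicted))
```

A value of 1 means the predictions reproduce the measurements exactly; a parameter search such as PSO, GS or GA would typically try to push this value up (or an error measure down) on validation data.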
Efficient HIK SVM learning for image classification.
Wu, Jianxin
2012-10-01
Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers are shown to be very effective in dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than, and has similar accuracies to, general purpose SVM solvers and other fast HIK SVM training methods. We also extend ICD to the efficient training of a broader family of kernels. Second, we show an important empirical observation that ICD is not sensitive to the C parameter in SVM, and we provide some theoretical analyses to explain this observation. ICD achieves high accuracies in many problems, using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large to choose SVM parameters using cross-validation.
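For orientation, the histogram intersection kernel itself is simple to state: K(x, y) = Σ_i min(x_i, y_i). The sketch below shows only this kernel on two toy histograms; it is not the ICD solver proposed in the paper, and the function name is chosen here for illustration.

```python
from typing import Sequence


def histogram_intersection(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = sum_i min(x_i, y_i) for two histograms of equal length."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


h1 = [3, 0, 2, 5]
h2 = [1, 4, 2, 2]
print(histogram_intersection(h1, h2))  # 1 + 0 + 2 + 2 = 5
```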
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862
Density-based penalty parameter optimization on C-SVM.
Liu, Yun; Lian, Jie; Bartolacci, Michael R; Zeng, Qing-An
2014-01-01
The support vector machine (SVM) is one of the most widely used approaches for data classification and regression. SVM achieves the largest distance between the positive and negative support vectors, which neglects the remote instances away from the SVM interface. In order to avoid a position change of the SVM interface as the result of an error system outlier, C-SVM was implemented to decrease the influences of the system's outliers. Traditional C-SVM holds a uniform parameter C for both positive and negative instances; however, according to the different number proportions and the data distribution, positive and negative instances should be set with different weights for the penalty parameter of the error terms. Therefore, in this paper, we propose density-based penalty parameter optimization of C-SVM. The experimental results indicated that our proposed algorithm has outstanding performance with respect to both precision and recall.
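The idea of giving positive and negative instances different penalty weights can be sketched as below. Note the assumption: the weights here come from a plain inverse-class-frequency heuristic, which is a stand-in and not the density-based rule proposed in the paper (the abstract does not give that rule). The function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def class_penalties(labels: Sequence[int], c: float) -> Tuple[float, float]:
    """Return (C_pos, C_neg) scaled by inverse class frequency.
    Assumed heuristic, not the paper's density-based rule.
    Both classes must be present."""
    n = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    return c * n / (2 * n_pos), c * n / (2 * n_neg)


def weighted_objective(w, b, xs, ys, c_pos, c_neg) -> float:
    """0.5 * ||w||^2 + sum_i C_{y_i} * max(0, 1 - y_i (w . x_i + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Errors on the rarer class are charged the larger penalty.
        loss += (c_pos if y == 1 else c_neg) * max(0.0, 1.0 - margin)
    return reg + loss


# Toy one-dimensional data with twice as many negatives as positives.
xs = [(2.0,), (0.5,), (-1.0,), (-2.0,), (-3.0,), (-0.5,)]
ys = [1, 1, -1, -1, -1, -1]
c_pos, c_neg = class_penalties(ys, c=1.0)
print(c_pos, c_neg)  # 1.5 0.75
print(weighted_objective((1.0,), 0.0, xs, ys, c_pos, c_neg))  # 1.625
```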
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it is found that SVM in some cases is equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm --- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.
Assessing the druggability of protein-protein interactions by a supervised machine-learning method.
Sugaya, Nobuyoshi; Ikeda, Kazuyoshi
2009-08-25
Protein-protein interactions (PPIs) are challenging but attractive targets of small molecule drugs for therapeutic interventions of human diseases. In this era of rapid accumulation of PPI data, there is great need for a methodology that can efficiently select drug target PPIs by holistically assessing the druggability of PPIs. To address this need, we propose here a novel approach based on a supervised machine-learning method, support vector machine (SVM). To assess the druggability of the PPIs, 69 attributes were selected to cover a wide range of structural, drug and chemical, and functional information on the PPIs. These attributes were used as feature vectors in the SVM-based method. Thirty PPIs known to be druggable were carefully selected from previous studies; these were used as positive instances. Our approach was applied to 1,295 human PPIs with tertiary structures of their protein complexes already solved. The best SVM model constructed discriminated the already-known target PPIs from others at an accuracy of 81% (sensitivity, 82%; specificity, 79%) in cross-validation. Among the attributes, the two with the greatest discriminative power in the best SVM model were the number of interacting proteins and the number of pathways. Using the model, we predicted several promising candidates for druggable PPIs, such as SMAD4/SKI. As more PPI data are accumulated in the near future, our method will have increased ability to accelerate the discovery of druggable PPIs.
gkmSVM: an R package for gapped-kmer SVM.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Ghareghani, Narges; Lee, Dongwon; Garraway, Levi; Beer, Michael A
2016-07-15
We present a new R package for training gapped-kmer SVM classifiers for DNA and protein sequences. We describe an improved algorithm for kernel matrix calculation that speeds run time by about 2 to 5-fold over our original gkmSVM algorithm. This package supports several sequence kernels, including: gkmSVM, kmer-SVM, mismatch kernel and wildcard kernel. gkmSVM package is freely available through the Comprehensive R Archive Network (CRAN), for Linux, Mac OS and Windows platforms. The C++ implementation is available at www.beerlab.org/gkmsvm. Contact: mghandi@gmail.com or mbeer@jhu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Lee, Ching-Pei; Lin, Chih-Jen
2014-04-01
Linear rankSVM is one of the widely used methods for learning to rank. Although its performance may be inferior to nonlinear methods such as kernel rankSVM and gradient boosting decision trees, linear rankSVM is useful to quickly produce a baseline model. Furthermore, following its recent development for classification, linear rankSVM may give competitive performance for large and sparse data. A great deal of works have studied linear rankSVM. The focus is on the computational efficiency when the number of preference pairs is large. In this letter, we systematically study existing works, discuss their advantages and disadvantages, and propose an efficient algorithm. We discuss different implementation issues and extensions with detailed experiments. Finally, we develop a robust linear rankSVM tool for public use.
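To make the "number of preference pairs" concrete, here is a naive sketch of the pairwise hinge loss commonly used to define linear rankSVM: for every pair of items in the same query where one is more relevant than the other, the model is penalized unless it scores the more relevant item higher by a margin of 1. The function names and toy data are hypothetical, and this brute-force enumeration is exactly the quadratic-cost baseline, not the efficient algorithm the letter proposes.

```python
from typing import List, Sequence, Tuple


def preference_pairs(
    relevance: Sequence[float], query: Sequence[int]
) -> List[Tuple[int, int]]:
    """All (i, j) in the same query where item i is more relevant than item j."""
    n = len(relevance)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if query[i] == query[j] and relevance[i] > relevance[j]
    ]


def pairwise_hinge_loss(w, xs, pairs) -> float:
    """Sum over preference pairs of max(0, 1 - w . (x_i - x_j))."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    return sum(max(0.0, 1.0 - (score(xs[i]) - score(xs[j]))) for i, j in pairs)


# Toy data: three items in query 0, one item in query 1.
xs = [(3.0,), (1.0,), (2.5,), (0.0,)]
relevance = [2, 0, 1, 1]
query = [0, 0, 0, 1]
pairs = preference_pairs(relevance, query)
print(pairs)                                   # [(0, 1), (0, 2), (2, 1)]
print(pairwise_hinge_loss((1.0,), xs, pairs))  # 0.5
```

The number of pairs grows quadratically with the number of items per query, which is why computational efficiency is the central concern when that number is large.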
Identification of eggs from different production systems based on hyperspectra and CS-SVM.
Sun, J; Cong, S L; Mao, H P; Zhou, X; Wu, X H; Zhang, X D
2017-06-01
1. To identify the origin of table eggs more accurately, a method based on hyperspectral imaging technology was studied. 2. The hyperspectral data of 200 samples of intensive and extensive eggs were collected. Standard normalised variables combined with a Savitzky-Golay were used to eliminate noise, then stepwise regression (SWR) was used for feature selection. Grid search algorithm (GS), genetic search algorithm (GA), particle swarm optimisation algorithm (PSO) and cuckoo search algorithm (CS) were applied by support vector machine (SVM) methods to establish an SVM identification model with the optimal parameters. The full spectrum data and the data after feature selection were the input of the model, while egg category was the output. 3. The SWR-CS-SVM model performed better than the other models, including SWR-GS-SVM, SWR-GA-SVM, SWR-PSO-SVM and others based on full spectral data. The training and test classification accuracy of the SWR-CS-SVM model were respectively 99.3% and 96%. 4. SWR-CS-SVM proved effective for identifying egg varieties and could also be useful for the non-destructive identification of other types of egg.
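The first preprocessing step mentioned above appears to be the standard normal variate (SNV) transform, which centres each spectrum on its own mean and scales it by its own standard deviation. A minimal sketch follows; the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the abstract does not specify it, and the Savitzky-Golay smoothing step is not shown.

```python
import math
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: (value - mean) / std, computed per spectrum.
    Uses the sample standard deviation (n - 1), an assumed convention."""
    n = len(spectrum)
    mu = sum(spectrum) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in spectrum) / (n - 1))
    return [(v - mu) / sd for v in spectrum]


# Toy three-band "spectrum", for illustration only.
raw = [2.0, 4.0, 6.0]
print(snv(raw))  # [-1.0, 0.0, 1.0]
```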
Predicting enhancer activity and variant impact using gkm-SVM.
Beer, Michael A
2017-09-01
We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with massively parallel reporter assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other models used the MPRA data for model training, and gkm-SVM did not. In addition, we compare gkm-SVM with other MPRA datasets and show that gkm-SVM is a reliable predictor of expression and that deltaSVM is a reliable predictor of variant impact in K562 cells and mouse retina. We further show that DHS (DNase-I hypersensitive sites) and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data are equally predictive substrates for training gkm-SVM, and that DHS regions flanked by H3K27Ac and H3K4me1 marks are more predictive than DHS regions alone. © 2017 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao
2013-02-01
The ability of serum surface-enhanced Raman spectroscopy (SERS) combined with support vector machine (SVM) to improve the classification of esophageal cancer patients versus normal volunteers is investigated. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one group from patients with pathologically confirmed esophageal cancer (n=30) and the other group from healthy volunteers (n=31). Principal components analysis (PCA), conventional SVM (C-SVM) and conventional SVM in combination with PCA (PCA-SVM) methods are implemented to classify the same spectral dataset. Results show that a diagnostic accuracy of 77.0% is acquired for the PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained for the C-SVM and PCA-SVM methods based on radial basis function (RBF) models. The results show that RBF SVM models are superior to the PCA algorithm in classifying serum SERS spectra. The study demonstrates that serum SERS in combination with the SVM technique has great potential to provide an effective and accurate diagnostic schema for noninvasive detection of esophageal cancer.
Application of GA-SVM method with parameter optimization for landslide development prediction
NASA Astrophysics Data System (ADS)
Li, X. Z.; Kong, J. M.
2013-10-01
Prediction of the landslide development process is always a hot issue in landslide research. So far, many methods for landslide displacement series prediction have been proposed. Support vector machine (SVM) has been proved to be a novel algorithm with good performance. However, the performance strongly depends on the right selection of the parameters (C and γ) of the SVM model. In this study, we presented an application of the GA-SVM method with parameter optimization in landslide displacement rate prediction. We selected a typical large-scale landslide in a hydro-electrical engineering area of Southwest China as a case. On the basis of analyzing the basic characteristics and monitoring data of the landslide, a single-factor GA-SVM model and a multi-factor GA-SVM model of the landslide were built. Moreover, the models were compared with single-factor and multi-factor SVM models of the landslide. The results show that the four models have high prediction accuracies, but the accuracies of the GA-SVM models are slightly higher than those of the SVM models, and the accuracies of the multi-factor models are slightly higher than those of the single-factor models for the landslide prediction. The accuracy of the multi-factor GA-SVM model is the highest, with the smallest RMSE of 0.0009 and the biggest RI of 0.9992.
Cross Validation Through Two-Dimensional Solution Surface for Cost-Sensitive SVM.
Gu, Bin; Sheng, Victor S; Tay, Keng Yeow; Romano, Walter; Li, Shuo
2017-06-01
Model selection plays an important role in cost-sensitive SVM (CS-SVM). It has been proven that the global minimum cross validation (CV) error can be efficiently computed based on the solution path for one parameter learning problems. However, it is a challenge to obtain the global minimum CV error for CS-SVM based on one-dimensional solution path and traditional grid search, because CS-SVM is with two regularization parameters. In this paper, we propose a solution and error surfaces based CV approach (CV-SES). More specifically, we first compute a two-dimensional solution surface for CS-SVM based on a bi-parameter space partition algorithm, which can fit solutions of CS-SVM for all values of both regularization parameters. Then, we compute a two-dimensional validation error surface for each CV fold, which can fit validation errors of CS-SVM for all values of both regularization parameters. Finally, we obtain the CV error surface by superposing K validation error surfaces, which can find the global minimum CV error of CS-SVM. Experiments are conducted on seven datasets for cost sensitive learning and on four datasets for imbalanced learning. Experimental results not only show that our proposed CV-SES has a better generalization ability than CS-SVM with various hybrids between grid search and solution path methods, and than recent proposed cost-sensitive hinge loss SVM with three-dimensional grid search, but also show that CV-SES uses less running time.
Gradient Evolution-based Support Vector Machine Algorithm for Classification
NASA Astrophysics Data System (ADS)
Zulvia, Ferani E.; Kuo, R. J.
2018-03-01
This paper proposes a classification algorithm based on support vector machine (SVM) and gradient evolution (GE) algorithms. The SVM algorithm has been widely used in classification. However, its result is significantly influenced by its parameters. Therefore, this paper proposes an improvement of the SVM algorithm which can find the best SVM parameters automatically. The proposed algorithm employs a GE algorithm to determine the SVM parameters automatically. The GE algorithm acts as a global optimizer in finding the best parameters to be used by the SVM algorithm. The proposed GE-SVM algorithm is verified using some benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than the other algorithms tested in this paper.
Extended robust support vector machine based on financial risk minimization.
Takeda, Akiko; Fujiwara, Shuhei; Kanamori, Takafumi
2014-11-01
Financial risk measures have been used recently in machine learning. For example, the ν-support vector machine (ν-SVM) minimizes the conditional value at risk (CVaR) of the margin distribution. The measure is popular in finance because of its subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and the value at risk (VaR), with the expectation that the resulting model becomes less sensitive to outliers than ν-SVM. We can regard ER-SVM as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments suggest that ER-SVM can achieve better prediction performance with proper parameter settings.
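The contrast between the two risk measures can be made concrete with a small empirical sketch. The estimators below are one simple convention (VaR as an order statistic, CVaR as the mean of all losses at or above it), and the loss values are hypothetical; in the ν-SVM setting the "loss" would be derived from the margins.

```python
import math


def value_at_risk(losses, beta):
    """Empirical VaR: the ceil(beta * n)-th smallest loss."""
    ordered = sorted(losses)
    # The small epsilon guards against floating-point overshoot in beta * n.
    k = max(1, math.ceil(beta * len(ordered) - 1e-12))
    return ordered[k - 1]


def conditional_value_at_risk(losses, beta):
    """Empirical CVaR: mean of the losses that are at least the VaR."""
    var = value_at_risk(losses, beta)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one extreme tail value

var_clean = value_at_risk(clean, 0.8)
var_outlier = value_at_risk(with_outlier, 0.8)
cvar_clean = conditional_value_at_risk(clean, 0.8)
cvar_outlier = conditional_value_at_risk(with_outlier, 0.8)
```

In this toy sample a single extreme value leaves the VaR unchanged but moves the CVaR substantially, which is the tail sensitivity the abstract refers to.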
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish a bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present numerical studies on the learning ability of online SVM classification based on Markov sampling for a benchmark repository. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling when the number of training samples is larger.
Comparison of water extraction methods in Tibet based on GF-1 data
NASA Astrophysics Data System (ADS)
Jia, Lingjun; Shang, Kun; Liu, Jing; Sun, Zhongqing
2018-03-01
In this study, we compared four different water extraction methods with GF-1 data according to different water types in Tibet, including Support Vector Machine (SVM), Principal Component Analysis (PCA), Decision Tree Classifier based on False Normalized Difference Water Index (FNDWI-DTC), and PCA-SVM. The results show that all four methods can extract large-area water bodies, but only SVM and PCA-SVM can obtain satisfactory extraction results for small water bodies. The methods were evaluated by both overall accuracy (OAA) and Kappa coefficient (KC). The OAAs of PCA-SVM, SVM, FNDWI-DTC, and PCA are 96.68%, 94.23%, 93.99%, and 93.01%, and the KCs are 0.9308, 0.8995, 0.8962, and 0.8842, respectively, consistent with visual inspection. In summary, SVM is better for extraction of narrow rivers, and PCA-SVM is suitable for water extraction of various types. As for dark blue lakes, the methods using PCA can extract them more quickly and accurately.
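The two evaluation measures named above can be computed from a confusion matrix as sketched below. The formulas are the standard definitions of overall accuracy and Cohen's kappa; the 2x2 matrix (water versus non-water) is hypothetical and is not taken from the study.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa_coefficient(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement from the products of row and column marginals.
    expected = sum(sum(cm[i]) * sum(cm[r][i] for r in range(n))
                   for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows = reference class, columns = predicted.
cm = [[50, 10],
      [5, 35]]
oa = overall_accuracy(cm)
kc = kappa_coefficient(cm)
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected from the class proportions alone.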
NASA Astrophysics Data System (ADS)
Gavrishchaka, V. V.; Ganguli, S. B.
2001-12-01
Reliable forecasting of rare events in a complex dynamical system is a challenging problem that is important for many practical applications. Due to the nature of rare events, the data set available for construction of a statistical and/or machine learning model is often very limited and incomplete. Therefore, many widely used approaches, including such robust algorithms as neural networks, can easily become inadequate for rare event prediction. Moreover, in many practical cases models with high-dimensional inputs are required. This limits applications of the existing rare event modeling techniques (e.g., extreme value theory) that focus on univariate cases. These approaches are not easily extended to multivariate cases. The support vector machine (SVM) is a machine learning system that can provide an optimal generalization using very limited and incomplete training data sets and can efficiently handle high-dimensional data. These features may allow SVM to be used to model rare events in some applications. We have applied an SVM-based system to the problem of large-amplitude substorm prediction and to extreme event forecasting in stock and currency exchange markets. Encouraging preliminary results will be presented and other possible applications of the system will be discussed.
NASA Astrophysics Data System (ADS)
Lu, Z. L.; Li, D. C.; Lu, B. H.; Zhang, A. F.; Zhu, G. X.; Pi, G.
2010-05-01
Laser Engineered Net Shaping (LENS) is an advanced manufacturing technology, but it is difficult to control the depositing height (DH) of the prototype because many technology parameters influence the forming process. The effect of the main parameters (laser power, scanning speed and powder feeding rate) on the DH of a single track is first analyzed, which shows that there is a complex nonlinear intrinsic relationship between them. In order to predict the DH, a back propagation (BP) based network improved with the Adaptive learning rate and Momentum coefficient (AM) algorithm and a least square support vector machine (LS-SVM) network are both adopted. The mapping relationship between the above parameters and the DH is constructed from training samples collected in LENS experiments, and then the generalization ability, function-approximating ability and real-time performance of the two networks are compared. The results show that although the result predicted by the BP-AM approximates the experimental result, the above performance indices of the LS-SVM are better than those of the BP-AM. Finally, high-definition thin-walled parts of AISI316L are successfully fabricated. Hence, the LS-SVM network is more suitable for the prediction of the DH.
Excitons, trions, and biexcitons in transition-metal dichalcogenides: Magnetic-field dependence
NASA Astrophysics Data System (ADS)
Van der Donck, M.; Zarenia, M.; Peeters, F. M.
2018-05-01
The influence of a perpendicular magnetic field on the binding energy and structural properties of excitons, trions, and biexcitons in monolayers of semiconducting transition metal dichalcogenides (TMDs) is investigated. The stochastic variational method (SVM) with a correlated Gaussian basis is used to calculate the different properties of these few-particle systems. In addition, we present a simplified variational approach which supports the SVM results for excitons as a function of magnetic field. The exciton diamagnetic shift is compared with recent experimental results, and we extend this concept to trions and biexcitons. The effect of a local potential fluctuation, which we model by a circular potential well, on the binding energy of trions and biexcitons is investigated and found to significantly increase the binding of those excitonic complexes.
NASA Astrophysics Data System (ADS)
Khawaja, Taimoor Saleem
A high-belief low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear non-Gaussian systems. The methodology assumes the availability of real-time process measurements, definition of a set of fault indicators and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVM machines are founded on the principle of Structural Risk Minimization (SRM) which tends to find a good trade-off between low empirical risk and small capacity. The key features in SVM are the use of non-linear kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector is suggested based on the LS-SVM machines. 
The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate "possibly" non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines, (c) Uncertainty representation and management using Bayesian Inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-01-01
Background Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. Results The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. Conclusion GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge. PMID:18430222
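The combination of 97.5% recall and 98.3% specificity with only 31.9% precision follows from Bayes' rule when relevant abstracts are rare. The sketch below inverts that relationship to estimate the prevalence implied by the three reported figures. The prevalence itself is not stated in the abstract, so the result is an inference under the assumption that all three metrics were computed on the same test set.

```python
def precision_from_rates(recall, specificity, prevalence):
    """Precision implied by recall, specificity and class prevalence."""
    true_pos = recall * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(recall, specificity, precision):
    """Solve the precision formula above for the prevalence."""
    a = recall * (1 - precision)
    b = precision * (1 - specificity)
    return b / (a + b)


# Figures reported in the abstract.
prevalence = implied_prevalence(recall=0.975, specificity=0.983,
                                precision=0.319)
check = precision_from_rates(0.975, 0.983, prevalence)
```

Under that assumption the implied prevalence is a little under 1%, which is why even a 1.7% false positive rate produces more false positives than true positives.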
Lex-SVM: exploring the potential of exon expression profiling for disease classification.
Yuan, Xiongying; Zhao, Yi; Liu, Changning; Bu, Dongbo
2011-04-01
Exon expression profiling technologies, including exon arrays and RNA-Seq, measure the abundance of every exon in a gene. Compared with gene expression profiling technologies like the 3' array, exon expression profiling technologies can detect alterations in both transcription and alternative splicing, and are therefore expected to be more sensitive in diagnosis. However, exon expression profiling also brings higher dimension, more redundancy, and significant correlation among features. Ignoring the correlation structure among exons of a gene, a popular classification method like L1-SVM selects exons individually from each gene and thus is vulnerable to noise. To overcome this limitation, we present in this paper a new variant of SVM named Lex-SVM, which incorporates the correlation structure among exons and known splicing patterns to improve classification performance. Specifically, we construct a new norm, ex-norm, including our prior knowledge of exon correlation structure, to regularize the coefficients of a linear SVM. Lex-SVM can be solved efficiently using standard linear programming techniques. The advantage of Lex-SVM is that it can select features group-wise, force features in a subgroup to take equal weights, and exclude the features that contradict the majority in the subgroup. Experimental results suggest that on exon expression profiles, Lex-SVM is more accurate than existing methods. Lex-SVM also generates a more compact model and selects genes more consistently in cross-validation. Unlike L1-SVM, which selects only one exon in a gene, Lex-SVM assigns equal weights to as many exons in a gene as possible, lending itself more readily to further interpretation.
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
Glavatskikh, Marta; Madzhidov, Timur; Solov'ev, Vitaly; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2016-12-01
In this work, we report QSPR modeling of the free energy ΔG of 1:1 hydrogen bond complexes of different H-bond acceptors and donors. The modeling was performed on a large and structurally diverse set of 3373 complexes featuring a single hydrogen bond, for which ΔG was measured at 298 K in CCl4. The models were prepared using Support Vector Machine and Multiple Linear Regression, with ISIDA fragment descriptors. The marked atoms strategy was applied at the fragmentation stage, in order to capture the location of H-bond donor and acceptor centers. Different strategies of model validation have been suggested, including the targeted omission of individual H-bond acceptors and donors from the training set, in order to check whether the predictive ability of the model is not limited to the interpolation of H-bond strength between two already encountered partners. Successfully cross-validating individual models were combined into a consensus model, and challenged to predict external test sets of 629 and 12 complexes, in which donor and acceptor formed single and cooperative H-bonds, respectively. In all cases, SVM models outperform MLR. The SVM consensus model performs well both in 3-fold cross-validation (RMSE=1.50 kJ/mol), and on the external test sets containing complexes with single (RMSE=3.20 kJ/mol) and cooperative H-bonds (RMSE=1.63 kJ/mol). © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
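The binary tree structure described above, with one binary classifier per internal node and one output class per leaf, can be sketched as follows. The linear decision functions are hypothetical stand-ins for trained SVMs, and the class labels and feature vectors are invented for illustration.

```python
class Leaf:
    """External node: holds the classified output type."""
    def __init__(self, label):
        self.label = label


class Node:
    """Internal node: a binary decision function routing to two subtrees."""
    def __init__(self, decide, negative, positive):
        self.decide = decide      # callable returning a signed score
        self.negative = negative  # subtree taken when the score is < 0
        self.positive = positive  # subtree taken when the score is >= 0


def linear_decision(weights, bias):
    # Stand-in for a trained SVM decision function f(x) = w . x + b.
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(tree, x):
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.positive if tree.decide(x) >= 0 else tree.negative
    return tree.label


# Hypothetical three-class tree over two low-level features.
tree = Node(linear_decision((1.0, 0.0), -0.5),
            Leaf("class_A"),
            Node(linear_decision((0.0, 1.0), -0.5),
                 Leaf("class_B"),
                 Leaf("class_C")))

labels = [classify(tree, x) for x in [(0.2, 0.9), (0.8, 0.1), (0.8, 0.9)]]
```

Because each node only needs a callable, a node could equally wrap a different learner, which mirrors the flexibility the abstract claims for the structure.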
Support vector machine for day ahead electricity price forecasting
NASA Astrophysics Data System (ADS)
Razak, Intan Azmira binti Wan Abdul; Abidin, Izham bin Zainal; Siah, Yap Keem; Rahman, Titik Khawa binti Abdul; Lada, M. Y.; Ramani, Anis Niza binti; Nasir, M. N. M.; Ahmad, Arfah binti
2015-05-01
Electricity price forecasting has become an important part of power system operation and planning. In a pool-based electric energy market, producers submit selling bids consisting of energy blocks and their corresponding minimum selling prices to the market operator. Meanwhile, consumers submit buying bids consisting of energy blocks and their corresponding maximum buying prices to the market operator. Hence, both producers and consumers use day ahead price forecasts to derive their respective bidding strategies for the electricity market while reducing the cost of electricity. However, forecasting electricity prices is a complex task because the price series is non-stationary and highly volatile. Many factors cause price spikes, such as volatility in load and fuel price as well as power import to and export from outside the market through long term contracts. This paper introduces a machine learning approach for day ahead electricity price forecasting with the Least Square Support Vector Machine (LS-SVM). Previous day data of the Hourly Ontario Electricity Price (HOEP), generation's price and demand from the Ontario power market are used as the inputs for training data. The simulation is carried out using LSSVMlab in Matlab with training and testing data from 2004. SVM, which is widely used for classification and regression, has great generalization ability owing to the structural risk minimization principle rather than empirical risk minimization. Moreover, the same parameter settings in a trained SVM give the same results, which reduces the simulation effort compared to other techniques such as neural networks and time series. The mean absolute percentage error (MAPE) for the proposed model shows that SVM performs well compared to a neural network.
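The error measure used in this abstract, the mean absolute percentage error, can be written in a few lines. The hourly prices below are hypothetical and are not HOEP data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so zero prices must be
    handled separately before calling this function.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical hourly prices and their day ahead forecasts.
actual_prices = [100.0, 200.0]
forecast_prices = [110.0, 180.0]
error_percent = mape(actual_prices, forecast_prices)
```

Both forecasts here are off by 10% of the actual price, so the MAPE is 10%.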
NASA Astrophysics Data System (ADS)
Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan
2016-03-01
Interstitial lung disease (ILD) is a complicated group of pulmonary disorders. High Resolution Computed Tomography (HRCT) is considered to be the best imaging technique for analysis of different pulmonary disorders. HRCT findings can be categorised into several patterns, viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal, etc., based on their texture-like appearance. Clinicians often find it difficult to diagnose these patterns because of their complex nature. In such a scenario, a computer-aided diagnosis system could help clinicians to identify patterns. Several approaches have been proposed for classification of ILD patterns. These include computation of textural features and training/testing of classifiers such as artificial neural network (ANN), support vector machine (SVM), etc. In this paper, wavelet features are calculated from two different ILD databases, the publicly available MedGIFT ILD database and a private ILD database, followed by performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that the average classification accuracy of SVM is greater than that of ANN when both are trained and tested on the same database. The investigation was continued to test the variation in classifier accuracy when training and testing are performed with alternate databases, and when the classifier is trained and tested with a database formed by merging samples of the same class from the two individual databases. The average classification accuracy drops when two independent databases are used for training and testing, respectively. There is significant improvement in average accuracy when classifiers are trained and tested with the merged database. This implies that classification accuracy depends on the training data. It is observed that SVM outperforms ANN when the same database is used for training and testing.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0% and 92.4% in level I and level II subfamily classification, respectively, while SVM has a reported accuracy of 88.4% and 86.3%. This is a 39.7% and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families, as shown by applying it to the superfamily of nuclear receptors with 94.5%, 97.8% and 93.6% accuracy in family, level I and level II subfamily classification, respectively. Copyright 2005 Wiley-Liss, Inc.
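The residual error reductions quoted in this abstract can be reproduced from the reported accuracies. The sketch below assumes that "residual error" means 100% minus accuracy, which is consistent with the published figures.

```python
def residual_error_reduction(new_accuracy, old_accuracy):
    """Percentage reduction in residual error (100 - accuracy, in percent)."""
    old_error = 100.0 - old_accuracy
    new_error = 100.0 - new_accuracy
    return 100.0 * (old_error - new_error) / old_error


# Naive Bayes versus the reported SVM accuracies from the abstract.
level_1 = residual_error_reduction(93.0, 88.4)
level_2 = residual_error_reduction(92.4, 86.3)
```

For level I the error falls from 11.6% to 7.0%, and for level II from 13.7% to 7.6%, which gives the two reductions stated in the abstract after rounding to one decimal place.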
NASA Astrophysics Data System (ADS)
Wang, X.; Xu, L.
2018-04-01
One of the most important applications of remote sensing classification is water extraction. The water index (WI) based on Landsat images is one of the most common ways to distinguish water bodies from other land surface features. However, conventional WI methods take into account spectral information from only a limited number of bands, and therefore the accuracy of those WI methods may be constrained in some areas which are covered with snow/ice, clouds, etc. An accurate and robust water extraction method is the key to the study at present. The support vector machine (SVM), using spectral information from all bands, can reduce these classification errors to some extent. Nevertheless, SVM, which barely considers spatial information, is relatively sensitive to noise in local regions. The conditional random field (CRF), which considers both spatial information and spectral information, has proven able to compensate for these limitations. Hence, in this paper, we develop a systematic water extraction method by taking advantage of the complementarity between the SVM and a water index-guided stochastic fully-connected conditional random field (SVM-WIGSFCRF) to address the above issues. In addition, we comprehensively evaluate the reliability and accuracy of the proposed method using Landsat-8 operational land imager (OLI) images of one test site. We assess the method's performance by calculating the following accuracy metrics: Omission Errors (OE) and Commission Errors (CE); Kappa coefficient (KP) and Total Error (TE). Experimental results show that the new method can improve target detection accuracy under complex and changeable environments.
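Two of the accuracy metrics listed above can be sketched from pixel counts for the water class. The definitions used here are the conventional remote sensing ones (omission error as the share of reference water pixels that were missed, commission error as the share of predicted water pixels that are wrong). The abstract does not spell out its formulas, so this is an assumption, and the counts are hypothetical.

```python
def omission_error(true_pos, false_neg):
    """Share of reference water pixels that the classifier missed."""
    return false_neg / (true_pos + false_neg)


def commission_error(true_pos, false_pos):
    """Share of pixels labelled water that are not water in the reference."""
    return false_pos / (true_pos + false_pos)


# Hypothetical pixel counts for the water class.
tp, fn, fp = 90, 10, 30
oe = omission_error(tp, fn)
ce = commission_error(tp, fp)
```

A method that labels snow or cloud shadow as water raises the commission error, while one that misses narrow rivers raises the omission error.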
Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.
Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing
2015-01-01
Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of an SVM classifier depend strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, revealed how the accuracy of an SVM classification varies with the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called SVM-RFE(C), with a correlation filter, was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first shown to outperform SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data from 19 patients. The experimental results show that the classifier constructed with the SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal
2015-10-01
Protein-protein interaction (PPI) site prediction helps to ascertain the interface residues that participate in interaction processes. Fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we have shown that the performance of the classical SVM can be enhanced with the help of an interaction-affinity based fuzzy membership function. The performances of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms are evaluated, and the statistical significance of the developed method over classical SVM and other fuzzy membership-based SVM methods available in the literature is estimated. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are 79.94% and 80.48% for the Homo sapiens and E. coli organisms, respectively. On the independent test datasets, the AUC scores are 76.59% and 80.17%, respectively, for the two organisms. In almost all cases, the developed F-SVM method improves on the performances obtained by the corresponding classical SVM and the other classifiers available in the literature.
Suresh, V; Parthasarathy, S
2014-01-01
We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to test and train the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models developed were tested using three different benchmarking tests, namely (i) self consistency, (ii) seven fold cross validation test and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test, for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
2012-01-01
Background Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, more accurately and automatically, can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of a total of 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from the normal ones. 
We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter. PMID:22703641
Improving protein complex classification accuracy using amino acid composition profile.
Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok
2013-09-01
Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and high functional similarity between their subunits. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Inclusion of amino acids' physicochemical properties can provide better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complexes classification. Improvement is based on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times as much CPU time as the fastest classifier (SVM).
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on a miRNA expression profile data set, a new data mining algorithm, tSVM-KNN (t statistic with support vector machine and k-nearest neighbor), is proposed. The idea of the algorithm is as follows: first, feature selection on the data set is carried out by the unified measurement method; second, the SVM-KNN algorithm, which combines a support vector machine (SVM) with k-nearest neighbor (KNN), is used as the classifier. Simulation results show that the SVM-KNN algorithm has better classification ability than SVM and KNN alone. In terms of the number of miRNA "tags" and recognition accuracy, the tSVM-KNN algorithm needs only 5 miRNAs to obtain 96.08% classification accuracy. Compared with similar algorithms, the tSVM-KNN algorithm has obvious advantages.
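The abstract does not spell out which statistic drives the feature selection step, so the following is only a minimal sketch of one common choice: ranking features by the absolute value of a Welch-style two-sample t statistic. The function names and the toy values are hypothetical and are not taken from the paper.

```python
import math
import statistics


def t_statistic(values_a, values_b):
    """Welch-style two-sample t statistic for a single feature (assumed form)."""
    mean_gap = statistics.mean(values_a) - statistics.mean(values_b)
    spread = math.sqrt(
        statistics.variance(values_a) / len(values_a)
        + statistics.variance(values_b) / len(values_b)
    )
    return mean_gap / spread


def rank_features(class_a_rows, class_b_rows):
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(class_a_rows[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in class_a_rows]
        col_b = [row[j] for row in class_b_rows]
        scores.append(abs(t_statistic(col_a, col_b)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy values (rows = samples, columns = features); not real miRNA data.
class_a = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5]]
class_b = [[1.0, 1.2], [2.0, 1.8], [3.0, 1.4]]
ranking = rank_features(class_a, class_b)
```

In this toy example the first feature separates the two classes far better than the second, so it is ranked first; a classifier would then be trained on the top-ranked features only.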
NASA Astrophysics Data System (ADS)
Vasat, Radim; Klement, Ales; Jaksik, Ondrej; Kodesova, Radka; Drabek, Ondrej; Boruvka, Lubos
2014-05-01
Visible and near-infrared diffuse reflectance spectroscopy (VNIR-DRS) provides a rapid and inexpensive tool for simultaneous prediction of a variety of soil properties. Usually, some sophisticated multivariate mathematical or statistical methods are employed in order to extract the required information from the raw spectra measurement. For this purpose especially the Partial least squares regression (PLSR) and Support vector machines (SVM) are the most frequently used. These methods generally benefit from the complexity with which the soil spectra are treated. But it is interesting that also techniques that focus only on a single spectral feature, such as a simple linear regression with selected continuum-removed spectra (CRS) characteristic (e.g. peak depth), can often provide competitive results. Therefore, we decided to enhance the potential of CRS taking into account all possible CRS peak parameters (area, width and depth) and develop a comprehensive methodology based on multiple linear regression approach. The eight considered soil properties were oxidizable carbon content (Cox), exchangeable (pHex) and active soil pH (pHa), particle and bulk density, CaCO3 content, crystalline and amorphous (Fed) and amorphous Fe (Feox) forms. In four cases (pHa, bulk density, Fed and Feox), of which two (Fed and Feox) were predicted reliably accurately (0.50 < R2cv < 0.80) and the other two (pHa and bulk density) only poorly (R2cv < 0.50), we obtained slightly better results than with PLSR and SVM. In one case (pHex) we achieved a significantly higher, although just reliable, accuracy (R2cv = 0.601) than with PLSR and SVM (R2cv = 0.448 and 0.442, resp.). But most interestingly, in the case of particle density, the presented approach outperformed the PLSR and SVM dramatically offering a fairly accurate prediction (R2cv = 0.827) against two failures (R2cv = 0.034 and 0.121 for PLSR and SVM, resp.). 
In the last two cases (Cox and CaCO3), slightly worse results were achieved than with PLSR and SVM, with overall fairly accurate prediction (R2cv > 0.80). Acknowledgment: Authors acknowledge the financial support of the Ministry of Agriculture of the Czech Republic (grant No. QJ1230319).
Solution Path for Pin-SVM Classifiers With Positive and Negative $\tau$ Values.
Huang, Xiaolin; Shi, Lei; Suykens, Johan A K
2017-07-01
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ. Its value is related to the quantile level and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ. We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ. In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
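The role of τ can be illustrated with a small sketch. It assumes the usual pinball loss from the pin-SVM literature, L_τ(u) = u for u ≥ 0 and -τu for u < 0, evaluated at u = 1 - y f(x); the abstract itself does not state the formula, and the function names are illustrative.

```python
def pinball_loss(margin_error, tau):
    """Pinball loss L_tau(u), with u = 1 - y * f(x) (assumed definition)."""
    if margin_error >= 0:
        return margin_error
    return -tau * margin_error


def hinge_loss(margin_error):
    """Standard hinge loss used by C-SVM."""
    return max(0.0, margin_error)


u_values = [-2.0, -0.5, 0.0, 0.5, 2.0]

# tau = 0 reproduces the hinge loss, matching the remark about C-SVM.
losses_tau0 = [pinball_loss(u, 0.0) for u in u_values]

# A positive tau also penalizes points that lie far on the correct side.
losses_tau_half = [pinball_loss(u, 0.5) for u in u_values]

# tau = -1 makes the loss linear in u over the whole real line.
losses_tau_neg = [pinball_loss(u, -1.0) for u in u_values]
```

Under this definition the loss at τ = -1 no longer has a kink, which is one way to see why that case admits a direct solution that can seed the path.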
Support Vector Machine Based on Adaptive Acceleration Particle Swarm Optimization
Abdulameer, Mohammed Hasan; Othman, Zulaiha Ali
2014-01-01
Existing face recognition methods utilize particle swarm optimizer (PSO) and opposition based particle swarm optimizer (OPSO) to optimize the parameters of SVM. However, the utilization of random values in the velocity calculation decreases the performance of these techniques; that is, during the velocity computation, we normally use random values for the acceleration coefficients and this creates randomness in the solution. To address this problem, an adaptive acceleration particle swarm optimization (AAPSO) technique is proposed. To evaluate our proposed method, we employ both face and iris recognition based on AAPSO with SVM (AAPSO-SVM). In the face and iris recognition systems, performance is evaluated using two human face databases, YALE and CASIA, and the UBiris dataset. In this method, we initially perform feature extraction and then recognition on the extracted features. In the recognition process, the extracted features are used for SVM training and testing. During the training and testing, the SVM parameters are optimized with the AAPSO technique, and in AAPSO, the acceleration coefficients are computed using the particle fitness values. The parameters in SVM, which are optimized by AAPSO, perform efficiently for both face and iris recognition. A comparative analysis between our proposed AAPSO-SVM and the PSO-SVM technique is presented. PMID:24790584
Overlaid caption extraction in news video based on SVM
NASA Astrophysics Data System (ADS)
Liu, Manman; Su, Yuting; Ji, Zhong
2007-11-01
Overlaid captions in news video often carry condensed semantic information that provides key cues for content-based video indexing and retrieval. However, extracting captions from video remains a challenging task because of complex backgrounds and low resolution. In this paper, we propose an effective overlaid caption extraction approach for news video. We first scan the video key frames using a small window, and then classify the blocks into text and non-text ones via a support vector machine (SVM), with statistical features extracted from the gray level co-occurrence matrices, the LH and HL sub-band wavelet coefficients and the oriented edge intensity ratios. Finally, morphological filtering and projection profile analysis are employed to localize and refine the candidate caption regions. Experiments show its high performance on four 30-minute news video programs.
An SVM model with hybrid kernels for hydrological time series
NASA Astrophysics Data System (ADS)
Wang, C.; Wang, H.; Zhao, X.; Xie, Q.
2017-12-01
Support Vector Machine (SVM) models have been widely applied to the forecast of climate/weather and its impact on other environmental variables such as hydrologic response to climate/weather. When using SVM, the choice of the kernel function plays the key role. Conventional SVM models mostly use one single type of kernel function, e.g., radial basis kernel function. Provided that there are several featured kernel functions available, each having its own advantages and drawbacks, a combination of these kernel functions may give more flexibility and robustness to SVM approach, making it suitable for a wide range of application scenarios. This paper presents such a linear combination of radial basis kernel and polynomial kernel for the forecast of monthly flowrate in two gaging stations using SVM approach. The results indicate significant improvement in the accuracy of predicted series compared to the approach with either individual kernel function, thus demonstrating the feasibility and advantages of such hybrid kernel approach for SVM applications.
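A linear combination of an RBF kernel and a polynomial kernel can be sketched as follows. The mixing weight and the kernel parameters (`weight`, `gamma`, `degree`, `coef0`) are hypothetical names and values chosen for illustration; the abstract does not report the ones used in the study.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis kernel exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, z, degree, coef0):
    """Polynomial kernel (<x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def hybrid_kernel(x, z, weight, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of the two kernels; weight is assumed to lie in [0, 1]."""
    return (weight * rbf_kernel(x, z, gamma)
            + (1.0 - weight) * polynomial_kernel(x, z, degree, coef0))


x = [1.0, 2.0]
z = [3.0, 0.5]
pure_rbf_self = hybrid_kernel(x, x, weight=1.0)   # RBF of a point with itself
pure_poly = hybrid_kernel(x, z, weight=0.0)       # polynomial part only
mixed = hybrid_kernel(x, z, weight=0.3)
```

Keeping the weight between 0 and 1 gives a nonnegative combination of two valid kernels, which is again a valid kernel, so the result can be passed to any SVM solver that accepts a custom or precomputed kernel.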
Tuning support vector machines for minimax and Neyman-Pearson classification.
Davenport, Mark A; Baraniuk, Richard G; Scott, Clayton D
2010-10-01
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
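The idea behind a cost-sensitive SVM can be sketched through its loss term: slack on positive and negative examples is charged at different rates. This is only an illustration with hypothetical names (`cost_pos`, `cost_neg`); it does not reproduce the exact 2C-SVM or 2nu-SVM parametrization of the paper.

```python
def cost_sensitive_hinge_risk(scores, labels, cost_pos, cost_neg):
    """Sum of hinge losses, weighted by a per-class cost (labels are +1 or -1)."""
    total = 0.0
    for score, label in zip(scores, labels):
        slack = max(0.0, 1.0 - label * score)
        cost = cost_pos if label == 1 else cost_neg
        total += cost * slack
    return total


# Toy decision values and labels; not data from the paper.
scores = [2.0, 0.5, -0.2, -3.0]
labels = [1, 1, -1, -1]

balanced = cost_sensitive_hinge_risk(scores, labels, cost_pos=1.0, cost_neg=1.0)
skewed = cost_sensitive_hinge_risk(scores, labels, cost_pos=4.0, cost_neg=1.0)
```

Raising `cost_pos` relative to `cost_neg` makes margin violations on the positive class more expensive, which is how the trade-off between the two error types is steered under criteria such as Neyman-Pearson.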
Rodriguez, Javier; Voss, Andreas; Caminal, Pere; Bayes-Genis, Antoni; Giraldo, Beatriz F
2017-07-01
Cardiac death risk remains a major problem for a large part of the population, especially elderly patients. In this study, we propose to characterize and analyze the cardiovascular and cardiorespiratory systems using the Poincaré plot. A total of 46 cardiomyopathy patients and 36 healthy subjects were analyzed. Left ventricular ejection fraction (LVEF) was used to stratify patients with low risk (LR: LVEF > 35%, 16 patients), and high risk (HR: LVEF ≤ 35%, 30 patients) of heart attack. RR, SBP and T Tot time series were extracted from the ECG, blood pressure and respiratory flow signals, respectively. Parameters that describe the scatterplot of the Poincaré method, related to short- and long-term variabilities, acceleration and deceleration of the dynamic system, and the complex correlation index were extracted. The linear discriminant analysis (LDA) and the support vector machines (SVM) classification methods were used to analyze the results of the extracted parameters. The results showed that cardiac parameters were the best to discriminate between HR and LR groups, especially the complex correlation index (p = 0.009). Analysing the interaction, the best result was obtained with the relation between the difference of the standard deviation of the cardiac and respiratory system (p = 0.003). When comparing HR vs LR groups, the best classification was obtained applying the SVM method, using an ANOVA kernel, with an accuracy of 98.12%. An accuracy of 97.01% was obtained by comparing patients versus healthy subjects, with an SVM classifier and Laplacian kernel. The morphology of the Poincaré plot introduces parameters that allow the characterization of the cardiorespiratory system dynamics.
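The short- and long-term variability parameters of a Poincaré plot are commonly summarized by the descriptors SD1 and SD2. The sketch below assumes those standard definitions (spread across and along the identity line of the plot of x[n+1] against x[n]); the abstract does not list the exact parameters used, and the toy series are not patient data.

```python
import math
import statistics


def poincare_sd1_sd2(series):
    """Return (SD1, SD2) for the Poincaré plot of x[n+1] against x[n]."""
    current = series[:-1]
    following = series[1:]
    # Rotate each point by 45 degrees: one axis lies across the identity line,
    # the other along it.
    across = [(b - a) / math.sqrt(2) for a, b in zip(current, following)]
    along = [(b + a) / math.sqrt(2) for a, b in zip(current, following)]
    sd1 = statistics.pstdev(across)  # short-term variability
    sd2 = statistics.pstdev(along)   # long-term variability
    return sd1, sd2


# Toy RR-like series in seconds.
alternating = [0.8, 1.0, 0.8, 1.0, 0.8]
constant = [0.8, 0.8, 0.8, 0.8, 0.8]

sd1_alt, sd2_alt = poincare_sd1_sd2(alternating)
sd1_const, sd2_const = poincare_sd1_sd2(constant)
```

A series that alternates from beat to beat has all of its spread across the identity line (large SD1, near-zero SD2), whereas slow drifts would show up mainly in SD2. Using the population rather than the sample standard deviation is a choice made here for simplicity.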
NASA Astrophysics Data System (ADS)
Luo, Jianjun; Wei, Caisheng; Dai, Honghua; Yuan, Jianping
2018-03-01
This paper focuses on robust adaptive control for a class of uncertain nonlinear systems subject to input saturation and external disturbance with guaranteed predefined tracking performance. To reduce the limitations of the classical predefined performance control method in the presence of unknown initial tracking errors, a novel predefined performance function with time-varying design parameters is first proposed. Then, aiming at reducing the complexity of nonlinear approximations, only two least-square-support-vector-machine-based (LS-SVM-based) approximators with two design parameters are required through norm form transformation of the original system. Further, a novel LS-SVM-based adaptive constrained control scheme is developed under the time-varying predefined performance using the backstepping technique. To avoid the tedious analysis and repeated differentiations of virtual control laws in the backstepping technique, a simple and robust finite-time-convergent differentiator is devised to extract only its first-order derivative at each step in the presence of external disturbance. In this sense, the inherent demerit of the backstepping technique, the "explosion of terms" brought by the recursive virtual controller design, is overcome. Moreover, an auxiliary system is designed to compensate for the control saturation. Finally, three groups of numerical simulations are employed to validate the effectiveness of the newly developed differentiator and the proposed adaptive constrained control scheme.
Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En
2015-06-01
Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.
A tri-fold hybrid classification approach for diagnostics with unexampled faulty states
NASA Astrophysics Data System (ADS)
Tamilselvan, Prasanna; Wang, Pingfeng
2015-01-01
System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires knowledge of system failures. However, with increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at the product testing stage. Thus, real-time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which consists of preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes advantage of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.
Bahrami, Sheyda; Shamsi, Mousa
2017-01-01
Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with a very good spatial resolution and low temporal resolution. However, these images suffer from high dimensionality when used with classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based SVM classification. A linear-kernel SVM is then used for detecting the active areas. Here, we use SOM for feature extraction and for labeling the datasets. SOM has two major advantages: (i) it reduces the dimension of data sets, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets and block design inputs in this paper and consider a contrast to noise ratio (CNR) value equal to 0.6 for the simulated datasets. The simulated fMRI dataset has a contrast of 1-4% in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.
Chiogna, Gabriele; Marcolini, Giorgia; Liu, Wanying; Pérez Ciria, Teresa; Tuo, Ye
2018-08-15
Water management in the alpine region has an important impact on streamflow. In particular, hydropower production is known to cause hydropeaking i.e., sudden fluctuations in river stage caused by the release or storage of water in artificial reservoirs. Modeling hydropeaking with hydrological models, such as the Soil Water Assessment Tool (SWAT), requires knowledge of reservoir management rules. These data are often not available since they are sensitive information belonging to hydropower production companies. In this short communication, we propose to couple the results of a calibrated hydrological model with a machine learning method to reproduce hydropeaking without requiring the knowledge of the actual reservoir management operation. We trained a support vector machine (SVM) with SWAT model outputs, the day of the week and the energy price. We tested the model for the Upper Adige river basin in North-East Italy. A wavelet analysis showed that energy price has a significant influence on river discharge, and a wavelet coherence analysis demonstrated the improved performance of the SVM model in comparison to the SWAT model alone. The SVM model was also able to capture the fluctuations in streamflow caused by hydropeaking when both energy price and river discharge displayed a complex temporal dynamic. Copyright © 2018 Elsevier B.V. All rights reserved.
Learning using privileged information: SVM+ and weighted SVM.
Lapin, Maksim; Hein, Matthias; Schiele, Bernt
2014-05-01
Prior knowledge can be used to improve predictive performance of learning algorithms or reduce the amount of data required for training. The same goal is pursued within the learning using privileged information paradigm which was recently introduced by Vapnik et al. and is aimed at utilizing additional information available only at training time, a framework implemented by SVM+. We relate the privileged information to importance weighting and show that the prior knowledge expressible with privileged features can also be encoded by weights associated with every training example. We show that a weighted SVM can always replicate an SVM+ solution, while the converse is not true and we construct a counterexample highlighting the limitations of SVM+. Finally, we touch on the problem of choosing weights for weighted SVMs when privileged features are not available. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.
2017-10-01
Crop maps are essential inputs for the agricultural planning done at various governmental and agribusiness agencies. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth data containing five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83, 82 and 83%, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82, 81 and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, it can be observed that the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
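The abstract does not define the random forest kernel. One common construction, assumed here, measures the similarity of two samples as the fraction of trees in which they fall into the same leaf. The leaf indices below are made up for illustration; in practice they would come from a trained forest.

```python
def random_forest_kernel(leaves_a, leaves_b):
    """Fraction of trees in which two samples share a leaf (assumed RFK form).

    leaves_a[t] is the index of the leaf that sample a reaches in tree t.
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both samples must be described by the same trees")
    shared = sum(1 for la, lb in zip(leaves_a, leaves_b) if la == lb)
    return shared / len(leaves_a)


def kernel_matrix(leaf_table):
    """Pairwise kernel values for a list of per-sample leaf index lists."""
    return [[random_forest_kernel(a, b) for b in leaf_table] for a in leaf_table]


# Hypothetical leaf assignments for three samples in a four-tree forest.
leaf_table = [
    [3, 1, 4, 2],
    [3, 1, 0, 2],
    [5, 6, 7, 8],
]
K = kernel_matrix(leaf_table)
```

The resulting matrix is symmetric with ones on the diagonal and can be supplied to an SVM solver that accepts a precomputed kernel.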
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
The support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve both classification and regression problems, linear or nonlinear, through the use of kernels. However, SVM has a weakness: its optimal parameter values are difficult to determine. SVM calculates the best linear separator on the input feature space according to the training data. To classify data that are not linearly separable, SVM uses the kernel trick to map the data into a higher-dimensional feature space in which they become linearly separable. The kernel trick uses various kinds of kernel functions, such as linear, polynomial, radial basis function (RBF) and sigmoid kernels. Each function has parameters which affect the accuracy of SVM classification. To solve this problem, genetic algorithms are proposed as the search algorithm for the optimal parameter values, thus increasing the classification accuracy of SVM. The data are taken from the UCI machine learning repository: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms have been shown to be effective in systematically finding optimal kernel parameters for SVM, in contrast to randomly selected kernel parameters. The best accuracy for the data has been improved from the following values: linear kernel 85.12%, polynomial 81.76%, RBF 77.22%, sigmoid 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine
Manavalan, Balachandran; Shin, Tae H.; Lee, Gwang
2018-01-01
Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html. PMID:29616000
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier, the total margin-based adaptive fuzzy support vector machine (TAF-SVM), that addresses several problems that may occur when support vector machines (SVMs) are applied to face recognition. The proposed TAF-SVM not only mitigates the overfitting caused by outliers by fuzzifying the penalty, but also corrects the skew of the optimal separating hyperplane caused by highly imbalanced data sets by using a different-cost algorithm. In addition, by replacing the conventional soft margin algorithm with the total margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM to form the TAF-SVM, which is formulated for both the linear and nonlinear cases. Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, with the kernel Fisher's discriminant analysis (KFDA) algorithm used to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, so that better recognition stability can be obtained.
Zhang, Huiling; Huang, Qingsheng; Bei, Zhendong; Wei, Yanjie; Floudas, Christodoulos A
2016-03-01
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. © 2016 Wiley Periodicals, Inc.
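The four figures reported above can all be derived from the counts of a binary confusion matrix. The sketch below assumes, as is common in contact prediction but not spelled out in the abstract, that "accuracy" is the fraction of predicted contacts that are true contacts and "coverage" is the fraction of true contacts that are recovered. The function name and the counts in the example are hypothetical.

```python
import math


def contact_metrics(tp, fp, tn, fn):
    """Return (accuracy, coverage, specificity, mcc) from confusion counts.

    Assumed definitions:
      accuracy    = TP / (TP + FP)   (precision of predicted contacts)
      coverage    = TP / (TP + FN)   (recall of true contacts)
      specificity = TN / (TN + FP)
    """
    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return num / den if den else 0.0

    accuracy = ratio(tp, tp + fp)
    coverage = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, denom)
    return accuracy, coverage, specificity, mcc


if __name__ == "__main__":
    # Hypothetical counts, not taken from the COMSAT experiments.
    print(contact_metrics(tp=5, fp=5, tn=85, fn=5))
```

Because non-contacts vastly outnumber contacts, specificity can stay near 100% while coverage is low, which is why MCC is reported alongside the other three.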
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies represents a shift in the scientific paradigm: scientists are now able to investigate the complex biology of a heterogeneous population of cells one cell at a time. However, to date there has not been a suitable computational methodology for analyzing such an intricate deluge of data, in particular techniques that aid the identification of the unique transcriptomic profile differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (support vector machine (SVM) and random forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, in which novel interactions could be further validated by wet-lab experimentation and could be useful candidates to target in the treatment of neuronal developmental diseases. The approach reported here is able to identify transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those used to study cancer progression and treatment within a highly heterogeneous tumour.
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
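The point that LR estimates class probabilities, rather than only a class label, can be seen in a very small example. The sketch below fits a one-dimensional logistic model by plain batch gradient descent; the learning rate, epoch count and toy data are assumptions chosen for illustration and have nothing to do with the SPAM feature set or the kernelised variant discussed in the abstract.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit p(y=1|x) = sigmoid(w*x + b) on 1-D inputs by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)


if __name__ == "__main__":
    # Toy data: negative inputs are class 0, positive inputs are class 1.
    w, b = fit_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
    for x in (-1.5, 0.2, 1.5):
        print(x, round(predict_proba(w, b, x), 3))
```

Thresholding the probability at 0.5 recovers a plain classification, while the probability itself conveys how confident the model is about a given input.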
... enough of the enzyme to break down certain complex molecules, the molecules build up in harmful amounts. ... chromosome, an enzyme that's needed to break down complex sugar molecules is missing or malfunctioning. Without this ...
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study; however, it is still not trivial to accurately distinguish lncRNA transcripts (LNCTs) from protein-coding ones (PCTs). As various information and data about lncRNAs have been preserved by previous studies, it is appealing to develop novel methods to identify lncRNAs more accurately. Our method, lncRScan-SVM, aims at classifying PCTs and LNCTs using a support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse, respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, as evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting lncRNAs, and it is quite useful for current lncRNA study.
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Based on physical-examination data for blood pressure, plasma lipids, Glu and UA, a support vector machine (SVM) was applied to distinguish coronary heart disease (CHD) patients from non-CHD individuals in a south China population, to guide further prevention and treatment of the disease. First, the SVM classifier was built using a radial basis kernel function, a linear kernel function and a polynomial kernel function, respectively. Second, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict CHD. In comparison with an artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression and non-optimized SVM, the overall results demonstrated that the optimized RBF-SVM model was superior to the other classifiers, with higher accuracy, sensitivity and specificity of 94.51%, 92.31% and 96.67%, respectively. It is therefore concluded that SVM could be used as a valid method for assisting the diagnosis of CHD.
NASA Astrophysics Data System (ADS)
Su, Lihong
In remote sensing communities, support vector machine (SVM) learning has recently received increasing attention. SVM learning usually requires large memory and enormous amounts of computation time on large training sets. According to SVM algorithms, the SVM classification decision function is fully determined by support vectors, which compose a subset of the training sets. In this regard, a solution to optimize SVM learning is to efficiently reduce training sets. In this paper, a data reduction method based on agglomerative hierarchical clustering is proposed to obtain smaller training sets for SVM learning. Using a multiple angle remote sensing dataset of a semi-arid region, the effectiveness of the proposed method is evaluated by classification experiments with a series of reduced training sets. The experiments show that there is no loss of SVM accuracy when the original training set is reduced to 34% using the proposed approach. Maximum likelihood classification (MLC) also is applied on the reduced training sets. The results show that MLC can also maintain the classification accuracy. This implies that the most informative data instances can be retained by this approach.
A fast button surface defects detection method based on convolutional neural network
NASA Astrophysics Data System (ADS)
Liu, Lizhe; Cao, Danhua; Wu, Songlin; Wu, Yubin; Wei, Taoran
2018-01-01
Considering the complexity of the button surface texture and the variety of buttons and defects, we propose a fast visual method for button surface defect detection, based on convolutional neural network (CNN). CNN has the ability to extract the essential features by training, avoiding designing complex feature operators adapted to different kinds of buttons, textures and defects. Firstly, we obtain the normalized button region and then use HOG-SVM method to identify the front and back side of the button. Finally, a convolutional neural network is developed to recognize the defects. Aiming at detecting the subtle defects, we propose a network structure with multiple feature channels input. To deal with the defects of different scales, we take a strategy of multi-scale image block detection. The experimental results show that our method is valid for a variety of buttons and able to recognize all kinds of defects that have occurred, including dent, crack, stain, hole, wrong paint and uneven. The detection rate exceeds 96%, which is much better than traditional methods based on SVM and methods based on template match. Our method can reach the speed of 5 fps on DSP based smart camera with 600 MHz frequency.
Efficient Exact Inference With Loss Augmented Objective in Structured Learning.
Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert
2016-08-19
Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms, since the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives arising from a special type of high-order potential that has a decomposable internal structure. As an important application, our method covers loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction.
Li, Nan; Ainsworth, Richard I; Wu, Meixin; Ding, Bo; Wang, Wei
2016-03-15
MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids and those with post-translational modifications (PTMs) or small molecules. Moreover, multi-threading and support for Sun Grid Engine (SGE) are implemented to significantly boost the computational efficiency. The program is available at http://wanglab.ucsd.edu/MIEC-SVM. Contact: wei-wang@ucsd.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
[Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].
Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong
2013-03-01
Model selection for the support vector machine (SVM), which involves selecting the kernel and margin parameter values, is usually time-consuming and greatly affects both the training efficiency of the SVM model and the final classification accuracy of an SVM hyperspectral remote sensing image classifier. First, based on combinatorial optimization theory and the cross-validation method, the artificial immune clonal selection algorithm is introduced for the optimal selection of the SVM kernel parameter a and margin parameter C (CSSVM) to improve the training efficiency of the SVM model. Then, an experiment classifying AVIRIS data from the Indian Pines site in the USA was performed to test the novel CSSVM, with a traditional SVM classifier using the general grid-search cross-validation method (GSSVM) for comparison. Evaluation indexes including SVM model training time, overall classification accuracy (OA) and Kappa index were analyzed quantitatively for both CSSVM and GSSVM. The OA of CSSVM on the test samples and on the whole image is 85.1% and 81.58%, respectively, both within 0.08% of the GSSVM values; the Kappa indexes reach 0.8213 and 0.7728, both within 0.001 of the GSSVM values; and the ratio of the model training time of CSSVM to that of GSSVM is between 1/6 and 1/10. Therefore, CSSVM is a fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.
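Overall accuracy (OA) and the Kappa index used in this comparison are both computed from a class confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function name and the 2x2 example matrix are hypothetical and are not the Indian Pines results.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is undefined.
        return observed, float("nan")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    oa, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
    print(oa, kappa)
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than OA for the same matrix.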
Li, Weide; Kong, Demeng; Wu, Jinran
2017-01-01
Air pollution in China is becoming more serious, especially particulate matter (PM), because of rapid economic growth and the fast expansion of urbanization. To address the growing environmental problems, daily PM2.5 and PM10 concentration data from January 1, 2015, to August 23, 2016, in Kunming and Yuxi (two important cities in Yunnan Province, China) are used in this paper to present a new hybrid model, CI-FPA-SVM, for forecasting air PM2.5 and PM10 concentrations. The proposed model involves two parts. First, to assess the possible correlation between different variables, cointegration theory is introduced to obtain the input-output relationship; the nonlinear dynamical system is then modeled with a support vector machine (SVM), in which the parameters c and g are optimized by the flower pollination algorithm (FPA). Six benchmark models, including FPA-SVM, CI-SVM, CI-GA-SVM, CI-PSO-SVM, CI-FPA-NN, and a multiple linear regression model, are considered to verify the superiority of the proposed hybrid model. The empirical results demonstrate that the proposed CI-FPA-SVM model is remarkably superior to all considered benchmark models in prediction accuracy, and that applying the model to forecasting can support effective monitoring and management of future air quality.
sw-SVM: sensor weighting support vector machines for EEG-based brain-computer interfaces.
Jrad, N; Congedo, M; Phlypo, R; Rousseau, S; Flamary, R; Yger, F; Rakotomamonjy, A
2011-10-01
In many machine learning applications, like brain-computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and signal-to-noise ratio is not homogeneously spread across sensors. Thus, collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting in the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented, a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM proves to perform equivalently with respect to the ensemble SVM strategy that won the competition. For the ErrP data set, for which a small number of trials are available, the sw-SVM shows superior performances as compared to three state-of-the art approaches. Results suggest that the sw-SVM promises to be useful in event-related potentials classification, even with a small number of training trials.
Wu, Jinran
2017-01-01
Air pollution in China is becoming more serious, especially particulate matter (PM), because of rapid economic growth and the fast expansion of urbanization. To address the growing environmental problems, daily PM2.5 and PM10 concentration data from January 1, 2015, to August 23, 2016, in Kunming and Yuxi (two important cities in Yunnan Province, China) are used in this paper to present a new hybrid model, CI-FPA-SVM, for forecasting air PM2.5 and PM10 concentrations. The proposed model involves two parts. First, to assess the possible correlation between different variables, cointegration theory is introduced to obtain the input-output relationship; the nonlinear dynamical system is then modeled with a support vector machine (SVM), in which the parameters c and g are optimized by the flower pollination algorithm (FPA). Six benchmark models, including FPA-SVM, CI-SVM, CI-GA-SVM, CI-PSO-SVM, CI-FPA-NN, and a multiple linear regression model, are considered to verify the superiority of the proposed hybrid model. The empirical results demonstrate that the proposed CI-FPA-SVM model is remarkably superior to all considered benchmark models in prediction accuracy, and that applying the model to forecasting can support effective monitoring and management of future air quality. PMID:28932237
Yu, Xiao; Ding, Enjie; Chen, Chunxu; Liu, Xiaoming; Li, Li
2015-01-01
Because roller element bearing (REB) failures cause unexpected machinery breakdowns, their fault diagnosis has attracted considerable research attention. Established fault feature extraction methods focus on statistical characteristics of the vibration signal, an approach that loses sight of the continuous waveform features. Considering this weakness, this article proposes a novel feature extraction method for frequency bands, named Window Marginal Spectrum Clustering (WMSC), to select salient features from the marginal spectrum of vibration signals obtained by the Hilbert–Huang Transform (HHT). In WMSC, a sliding window is used to divide an entire HHT marginal spectrum (HMS) into window spectrums, after which the Rand Index (RI) criterion of the clustering method is used to evaluate each window. The windows returning higher RI values are selected to construct characteristic frequency bands (CFBs). Next, a hybrid REB fault diagnosis method is constructed, named after its elements: HHT-WMSC-SVM (support vector machines). The effectiveness of HHT-WMSC-SVM is validated by running a series of experiments on REB defect datasets from the Bearing Data Center of Case Western Reserve University (CWRU). The test results show three major advantages of the novel method. First, the fault classification accuracy of the HHT-WMSC-SVM model is higher than that of HHT-SVM and ST-SVM, a method that combines statistical characteristics with SVM. Second, with Gaussian white noise added to the original REB defect dataset, the HHT-WMSC-SVM model maintains high classification accuracy, while the classification accuracy of the ST-SVM and HHT-SVM models is significantly reduced. Third, fault classification accuracy by HHT-WMSC-SVM can exceed 95% under a Pmin range of 500–800 and an m range of 50–300 for the REB defect dataset with Gaussian white noise added at Signal Noise Ratio (SNR) = 5. Experimental results indicate that the proposed WMSC method yields high REB fault classification accuracy and good performance in Gaussian white noise reduction. PMID:26540059
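Two building blocks of the WMSC description are a sliding window over the marginal spectrum and the Rand Index used to score a clustering. The sketch below shows only these two pieces in isolation. The function names, window width and step are assumptions, and how each window is clustered, or how the parameters Pmin and m enter the method, is not stated in the abstract and is not reproduced here.

```python
from itertools import combinations


def sliding_windows(spectrum, width, step):
    """Split a sequence of spectrum values into (possibly overlapping) windows."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    last_start = len(spectrum) - width
    return [spectrum[i:i + width] for i in range(0, last_start + 1, step)]


def rand_index(labels_a, labels_b):
    """Rand Index: fraction of sample pairs on which two labelings agree.

    A pair agrees when both labelings put the two samples in the same
    group, or both put them in different groups.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two samples")
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += int(same_a == same_b)
        pairs += 1
    return agree / pairs


if __name__ == "__main__":
    # Toy "spectrum" of ten bins, windows of four bins, moved two bins at a time.
    print(sliding_windows(list(range(10)), width=4, step=2))
    # Cluster labels versus known fault classes for four hypothetical samples.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A window whose clustering matches the known fault classes well gets a Rand Index close to 1, which is the sense in which higher-RI windows are treated as more informative.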
A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM.
Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei; Song, Houbing
2018-01-15
Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and the support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters have been set manually, which cannot ensure the model's performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three algorithms are used to optimize the SVM's parameters: the particle swarm optimization algorithm (PSO), the improved PSO algorithm (NAPSO), and glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are used as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models' performance. The experimental results show that, among the three tested algorithms, the NAPSO-SVM method has better prediction precision and smaller prediction errors, and that it is an effective method for predicting the dynamic measurement errors of sensors.
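The two evaluation measures named above have standard definitions, sketched below. The function names and the sample values are hypothetical; MAPE is returned as a percentage and is undefined when an actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical measured errors and model predictions.
    measured = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(rmse(measured, forecast), mape(measured, forecast))
```

RMSE is in the units of the measured quantity and penalizes large deviations more heavily, while MAPE is scale-free, so the two are usually reported together.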
A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.
Halloran, John T; Rocke, David M
2018-05-04
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under Apache license at bitbucket.org/jthalloran/percolator_upgrade.
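The "l2-SVM" in the solver's name refers to a linear SVM trained with the squared hinge loss. The sketch below evaluates that objective for a given weight vector. It assumes a model without a bias term and the scaling 0.5*||w||^2 + C*sum(loss), both of which are illustrative choices; Percolator's exact objective scaling and the MFN and TRON solvers themselves are not reproduced.

```python
def l2_svm_objective(w, X, y, C=1.0):
    """Squared-hinge (L2-loss) linear SVM objective.

    w: weight vector, X: list of feature vectors, y: labels in {-1, +1}.
    Returns 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, x_i>)^2.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    regularizer = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wi * xij for wi, xij in zip(w, xi))
        # Points with margin >= 1 contribute nothing; others are penalized quadratically.
        loss += max(0.0, 1.0 - margin) ** 2
    return regularizer + C * loss


if __name__ == "__main__":
    # Hypothetical two-feature example: one point outside the margin, one inside it.
    w = [1.0, 0.0]
    X = [[2.0, 0.0], [0.5, 0.0]]
    y = [1, 1]
    print(l2_svm_objective(w, X, y))
```

Unlike the ordinary hinge loss, the squared hinge is differentiable everywhere, which is what makes Newton-type solvers applicable to this objective.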
NASA Astrophysics Data System (ADS)
Li, S. X.; Zhang, Y. J.; Zeng, Q. Y.; Li, L. F.; Guo, Z. Y.; Liu, Z. M.; Xiong, H. L.; Liu, S. H.
2014-06-01
Cancer is the most common disease to threaten human health. The ability to screen individuals with malignant tumours with only a blood sample would be greatly advantageous to early diagnosis and intervention. This study explores the possibility of discriminating between cancer patients and normal subjects with serum surface-enhanced Raman spectroscopy (SERS) and a support vector machine (SVM) through a peripheral blood sample. A total of 130 blood samples were obtained from patients with liver cancer, colonic cancer, esophageal cancer, nasopharyngeal cancer or gastric cancer, and 113 blood samples were obtained from normal volunteers. Several diagnostic models were built with the serum SERS spectra using SVM and principal component analysis (PCA) techniques. The results show that a diagnostic accuracy of 85.5% is acquired with a PCA algorithm, while a diagnostic accuracy of 95.8% is obtained using radial basis function (RBF) PCA-SVM methods. The results show that an RBF kernel PCA-SVM technique is superior to PCA and conventional SVM (C-SVM) algorithms in classifying serum SERS spectra. The study demonstrates that serum SERS, in combination with SVM techniques, has great potential for screening cancerous patients with any solid malignant tumour through a peripheral blood sample.
Hybrid NN/SVM Computational System for Optimizing Designs
NASA Technical Reports Server (NTRS)
Rai, Man Mohan
2009-01-01
A computational method and system based on a hybrid of an artificial neural network (NN) and a support vector machine (SVM) (see figure) has been conceived as a means of maximizing or minimizing an objective function, optionally subject to one or more constraints. Such maximization or minimization could be performed, for example, to solve a data-regression or data-classification problem or to optimize a design associated with a response function. A response function can be considered as a subset of a response surface, which is a surface in a vector space of design and performance parameters. A typical example of a design problem that the method and system can be used to solve is that of an airfoil, for which a response function could be the spatial distribution of pressure over the airfoil. In this example, the response surface would describe the pressure distribution as a function of the operating conditions and the geometric parameters of the airfoil. The use of NNs to analyze physical objects in order to optimize their responses under specified physical conditions is well known. NN analysis is suitable for multidimensional interpolation of data that lack structure and enables the representation and optimization of a succession of numerical solutions of increasing complexity or increasing fidelity to the real world. NN analysis is especially useful in helping to satisfy multiple design objectives. Feedforward NNs can be used to make estimates based on nonlinear mathematical models. One difficulty associated with use of a feedforward NN arises from the need for nonlinear optimization to determine connection weights among input, intermediate, and output variables. It can be very expensive to train an NN in cases in which it is necessary to model large amounts of information. Less widely known (in comparison with NNs) are support vector machines (SVMs), which were originally applied in statistical learning theory.
In terms that are necessarily oversimplified to fit the scope of this article, an SVM can be characterized as an algorithm that (1) effects a nonlinear mapping of input vectors into a higher-dimensional feature space and (2) involves a dual formulation of governing equations and constraints. One advantageous feature of the SVM approach is that an objective function (which one seeks to minimize to obtain coefficients that define an SVM mathematical model) is convex, so that unlike in the cases of many NN models, any local minimum of an SVM model is also a global minimum.
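The nonlinear mapping into a higher-dimensional feature space mentioned in point (1) is usually never computed explicitly; a kernel function returns the inner product in that space directly. The sketch below checks this for a homogeneous degree-2 polynomial kernel on two-dimensional inputs, and also defines the widely used RBF kernel. Both kernels are chosen here purely for illustration, since the article does not say which kernel the hybrid system uses.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def poly2_feature_map(x):
    """Explicit feature map for poly2_kernel on 2-D inputs (maps R^2 to R^3)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, 4.0)
    # The kernel value equals the inner product of the explicitly mapped vectors.
    print(poly2_kernel(x, z), dot(poly2_feature_map(x), poly2_feature_map(z)))
    print(rbf_kernel(x, z, gamma=0.1))
```

Because the dual formulation mentioned in point (2) touches the data only through such inner products, swapping the kernel changes the feature space without changing the optimization procedure.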
KDM5A demethylase: Erasing histone modifications to promote repair of DNA breaks
2017-01-01
Repairing DNA breaks within the complexity of the cell chromatin is challenging. In this issue, Gong et al. (2017. J. Cell Biol. https://doi.org/10.1083/jcb.201611135) identify the histone demethylase KDM5A as a critical editor of the cells’ “histone code” that is required to recruit DNA repair complexes to DNA breaks. PMID:28572116
A Classification of Remote Sensing Image Based on Improved Compound Kernels of SVM
NASA Astrophysics Data System (ADS)
Zhao, Jianing; Gao, Wanlin; Liu, Zili; Mou, Guifen; Lu, Lin; Yu, Lina
Remote sensing (RS) classification based on SVM, which is developed from statistical learning theory, achieves high accuracy with a small number of training samples, which makes SVM methods well suited to RS classification. The traditional RS classification method combines visual interpretation with computer classification. SVM-based classification, however, improves accuracy considerably while saving much of the labor and time otherwise spent interpreting images and collecting training samples. Kernel functions play an important part in the SVM algorithm. The proposed method uses an improved compound kernel function and therefore achieves higher classification accuracy on RS images. Moreover, the compound kernel improves the generalization and learning ability of the kernel.
Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine
NASA Astrophysics Data System (ADS)
Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung
A clinical decision support system (CDSS) provides knowledge and specific information for clinicians to enhance diagnostic efficiency and improve healthcare quality. An appropriate CDSS can greatly enhance patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, it is critical to determine a suitable combination of SVM parameters with regard to classification performance. Genetic algorithm (GA) can find an optimal solution within an acceptable time, and is faster than a greedy algorithm with an exhaustive searching strategy. By taking advantage of GA in quickly selecting the salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which is different from the traditional method with GA used for feature selection and SVM for classification, was used to design CDSSs for prediction of successful ventilation weaning, diagnosis of patients with severe obstructive sleep apnea, and discrimination of different cell types from Pap smears. The results show that IGS is better than methods using SVM alone or a linear discriminator.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM was also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
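The per-label precision, recall and F1 ranges quoted above follow from simple counts of true positives, false positives and false negatives for each label. The sketch below shows one way to compute them for single-label predictions; the function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label receives a false positive
            fn[truth] += 1  # true label receives a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Toy labels, invented purely for illustration.
y_true = ["fall", "fall", "fire", "fire", "fall"]
y_pred = ["fall", "fire", "fire", "fire", "fall"]
scores = per_label_scores(y_true, y_pred)
```

Reporting the minimum and maximum of each column over all labels gives ranges of the kind quoted in the abstract.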
Novel maximum-margin training algorithms for supervised neural networks.
Ludwig, Oswaldo; Nunes, Urbano
2010-06-01
This paper proposes three novel training methods, two of them based on the backpropagation approach and a third one based on information theory for multilayer perceptron (MLP) binary classifiers. Both backpropagation methods are based on the maximal-margin (MM) principle. The first one, based on the gradient descent with adaptive learning rate algorithm (GDX) and named maximum-margin GDX (MMGDX), directly increases the margin of the MLP output-layer hyperplane. The proposed method jointly optimizes both MLP layers in a single process, backpropagating the gradient of an MM-based objective function, through the output and hidden layers, in order to create a hidden-layer space that enables a higher margin for the output-layer hyperplane, avoiding the testing of many arbitrary kernels, as occurs in case of support vector machine (SVM) training. The proposed MM-based objective function aims to stretch out the margin to its limit. An objective function based on Lp-norm is also proposed in order to take into account the idea of support vectors while avoiding the complexity involved in solving a constrained optimization problem, as is usual in SVM training. In fact, all the training methods proposed in this paper have time and space complexities O(N) while usual SVM training methods have time complexity O(N^3) and space complexity O(N^2), where N is the training-data-set size. The second approach, named minimization of interclass interference (MICI), has an objective function inspired by the Fisher discriminant analysis. This algorithm aims to create an MLP hidden output where the patterns have a desirable statistical distribution. In both training methods, the maximum area under ROC curve (AUC) is applied as the stopping criterion. The third approach offers a robust training framework able to take the best of each proposed training method.
The main idea is to compose a neural model by using neurons extracted from three other neural networks, each one previously trained by MICI, MMGDX, and Levenberg-Marquardt (LM), respectively. The resulting neural network was named assembled neural network (ASNN). Benchmark data sets of real-world problems have been used in experiments that enable a comparison with other state-of-the-art classifiers. The results provide evidence of the effectiveness of our methods regarding accuracy, AUC, and balanced error rate.
Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM.
Davari Dolatabadi, Azam; Khadem, Siamak Esmael Zadeh; Asl, Babak Mohammadzadeh
2017-01-01
Currently Coronary Artery Disease (CAD) is one of the most prevalent diseases, and also can lead to death, disability and economic loss in patients who suffer from cardiovascular disease. Diagnostic procedures of this disease by medical teams are typically invasive, although they do not satisfy the required accuracy. In this study, we have proposed a methodology for the automatic diagnosis of normal and Coronary Artery Disease conditions using Heart Rate Variability (HRV) signal extracted from electrocardiogram (ECG). The features are extracted from HRV signal in time, frequency and nonlinear domains. The Principal Component Analysis (PCA) is applied to reduce the dimension of the extracted features in order to reduce computational complexity and to reveal the hidden information underlaid in the data. Finally, Support Vector Machine (SVM) classifier has been utilized to classify two classes of data using the extracted distinguishing features. In this paper, parameters of the SVM have been optimized in order to improve the accuracy. Provided reports in this paper indicate that the detection of CAD class from normal class using the proposed algorithm was performed with accuracy of 99.2%, sensitivity of 98.43%, and specificity of 100%. This study has shown that methods which are based on the feature extraction of the biomedical signals are an appropriate approach to predict the health situation of the patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai
2017-01-01
Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
Computer-Based Readability Testing of Information Booklets for German Cancer Patients.
Keinki, Christian; Zowalla, Richard; Pobiruchin, Monika; Huebner, Jutta; Wiesner, Martin
2018-04-12
Understandable health information is essential for treatment adherence and improved health outcomes. For readability testing, several instruments analyze the complexity of sentence structures, e.g., Flesch-Reading Ease (FRE) or Vienna-Formula (WSTF). Moreover, the vocabulary is of high relevance for readers. The aim of this study is to investigate the agreement of sentence structure and vocabulary-based (SVM) instruments. A total of 52 freely available German patient information booklets on cancer were collected from the Internet. The mean understandability level L was computed for 51 booklets. The resulting values of FRE, WSTF, and SVM were assessed pairwise for agreement with Bland-Altman plots and two-sided, paired t tests. For the pairwise comparison, the mean L values are L FRE = 6.81, L WSTF = 7.39, L SVM = 5.09. The sentence structure-based metrics gave significantly different scores (P < 0.001) for all assessed booklets, confirmed by the Bland-Altman analysis. The study findings suggest that vocabulary-based instruments cannot be interchanged with FRE/WSTF. However, both analytical aspects should be considered and checked by authors to linguistically refine texts with respect to the individual target group. Authors of health information can be supported by automated readability analysis. Health professionals can benefit by direct booklet comparisons allowing for time-effective selection of suitable booklets for patients.
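The pairwise agreement check described above can be sketched as a Bland-Altman computation: the mean difference (bias) between two instruments and limits of agreement around it. The 1.96-standard-deviation limits are the conventional choice and are an assumption here, as are the function name and the toy scores.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy understandability levels for four booklets (invented values).
level_metric_a = [6, 7, 8, 7]
level_metric_b = [5, 5, 6, 6]
bias, lower, upper = bland_altman(level_metric_a, level_metric_b)
```

A bias far from zero, or wide limits, indicates that the two instruments cannot be used interchangeably.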
Hong, Haoyuan; Tsangaratos, Paraskevas; Ilia, Ioanna; Liu, Junzhi; Zhu, A-Xing; Xu, Chong
2018-07-15
The main objective of the present study was to utilize Genetic Algorithms (GA) in order to obtain the optimal combination of forest fire related variables and apply data mining methods for constructing a forest fire susceptibility map. In the proposed approach, a Random Forest (RF) and a Support Vector Machine (SVM) were used to produce a forest fire susceptibility map for Dayu County, which is located in the southwest of Jiangxi Province, China. For this purpose, historic forest fires and thirteen forest fire related variables were analyzed, namely: elevation, slope angle, aspect, curvature, land use, soil cover, heat load index, normalized difference vegetation index, mean annual temperature, mean annual wind speed, mean annual rainfall, distance to river network and distance to road network. The Natural Break and the Certainty Factor method were used to classify and weight the thirteen variables, while a multicollinearity analysis was performed to determine the correlation among the variables and decide about their usability. The optimal set of variables determined by the GA reduced the number of variables to eight, excluding aspect, land use, heat load index, distance to river network and mean annual rainfall from the analysis. The performance of the forest fire models was evaluated by using the area under the Receiver Operating Characteristic curve (ROC-AUC) based on the validation dataset. Overall, the RF models gave higher AUC values. The results also showed that the proposed optimized models outperform the original models. Specifically, the optimized RF model gave the best result (0.8495), followed by the original RF (0.8169), while the optimized SVM gave a lower value (0.7456) than the RF models, though higher than the original SVM model (0.7148). The study highlights the significance of feature selection techniques in forest fire susceptibility, and shows that data mining methods can be considered a valid approach for forest fire susceptibility modeling.
Copyright © 2018 Elsevier B.V. All rights reserved.
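The ROC-AUC values used above to rank the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal pairwise sketch of that definition follows; the function name and the toy scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy susceptibility scores: 1 = fire occurred, 0 = no fire (invented values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
auc = roc_auc(labels, scores)
```

The double loop is quadratic in the number of samples; it is meant to make the definition explicit rather than to be efficient.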
Lu, Bingxin; Leong, Hon Wai
2016-02-01
Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.
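The k-mer content mentioned above can be illustrated with a short routine that turns a sequence into normalized k-mer frequencies. This is only a sketch of the general idea of a composition feature; how GI-SVM actually builds and windows its features is not specified here, and the function name is hypothetical.

```python
from collections import Counter

def kmer_frequencies(seq, k):
    """Return {kmer: relative frequency} over all overlapping windows of length k."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}

# Toy sequence, invented for illustration.
freqs = kmer_frequencies("ATATGC", 2)
```

Frequency vectors of this kind, computed for successive genome windows, are the sort of input a one-class classifier could use to flag windows whose composition deviates from the rest of the genome.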
[Identification of varieties of cashmere by Vis/NIR spectroscopy technology based on PCA-SVM].
Wu, Gui-Fang; He, Yong
2009-06-01
A hybrid algorithm combining principal component analysis (PCA) and support vector machine (SVM) was presented to discriminate cashmere varieties. Cashmere fiber is threadlike, soft and glossy, and has high tensile strength. The quality characteristics and economic value of each breed of cashmere are very different. In order to safeguard consumers' rights and guarantee the quality of cashmere products, identifying cashmere quickly, efficiently and correctly is of significant importance to the production and trade of cashmere material. The present research adopts Vis/NIR diffuse spectroscopy techniques to collect the spectral data of cashmere. The near infrared fingerprint of cashmere was acquired by PCA, and SVM methods were used to further identify the cashmere material. In the PCA step, a score map of the scores of PC1, PC2 and PC3 was used, and 10 principal components (PCs) were selected as the input of the SVM based on a PC reliability of 99.99%. One hundred cashmere samples were used for calibration and the remaining 75 cashmere samples were used for validation. A one-against-all multi-class SVM model was built, the capabilities of SVMs with different kernel functions were compared, and the results showed that the SVM with the Gaussian kernel function has the best identification capability, with an accuracy of 100%. This research indicated that the PCA-SVM data mining method identifies cashmere well and can serve as a new method for rapid identification of cashmere material varieties.
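The one-against-all scheme named above trains one binary scorer per class and assigns a sample to the class whose scorer gives the largest decision value. The sketch below shows only that final decision rule; the three linear scorers are hypothetical stand-ins for trained binary SVM decision functions.

```python
def one_vs_all_predict(x, scorers):
    """Pick the label whose binary decision function gives the highest score for x."""
    return max(scorers, key=lambda label: scorers[label](x))

# Hypothetical decision functions over two PCA scores (not fitted to any data).
scorers = {
    "variety_a": lambda x: x[0] - x[1],
    "variety_b": lambda x: x[1] - x[0],
    "variety_c": lambda x: 0.5 - abs(x[0]) - abs(x[1]),
}
label = one_vs_all_predict((2.0, 0.0), scorers)
```

In a real pipeline each scorer would be the signed distance to a class-versus-rest SVM hyperplane evaluated on the selected principal components.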
Fuzzy support vector machine for microarray imbalanced data classification
NASA Astrophysics Data System (ADS)
Ladayya, Faroh; Purnami, Santi Wulan; Irhamah
2017-11-01
DNA microarrays are data containing gene expression with small sample sizes and a high number of features. Furthermore, class imbalance is a common problem in microarray data. This occurs when a dataset is dominated by a class which has significantly more instances than the other, minority classes. Therefore, a classification method is needed that solves the problem of high-dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples with the same importance, so the results are biased against the minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM such that different input points make different contributions to the classifier. The minority classes have large fuzzy membership, so FSVM can pay more attention to the samples with larger fuzzy membership. Given that DNA microarray data are high dimensional with a very large number of features, it is necessary to do feature selection first using the Fast Correlation-Based Filter (FCBF). In this study, the data are analyzed by SVM and FSVM, each with and without FCBF, and their classification performance is compared. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.
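The reformulation described above is commonly written as a soft-margin SVM in which each slack variable is weighted by the membership of its sample. The form below is the usual textbook statement of a fuzzy SVM; the symbols s_i, \xi_i, C and \phi are conventional and are assumptions of this sketch rather than notation from the paper.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N} s_i\,\xi_i
\qquad \text{s.t.}\qquad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\qquad 0 < s_i \le 1 .
```

Setting every s_i = 1 recovers the ordinary SVM, while giving minority-class samples larger s_i makes their misclassification more costly than that of majority-class samples.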
Noninvasive extraction of fetal electrocardiogram based on Support Vector Machine
NASA Astrophysics Data System (ADS)
Fu, Yumei; Xiang, Shihan; Chen, Tianyi; Zhou, Ping; Huang, Weiyan
2015-10-01
The fetal electrocardiogram (FECG) signal has important clinical value for doctors in diagnosing fetal heart diseases and choosing suitable therapeutic schemes. The noninvasive extraction of FECG from electrocardiogram (ECG) signals has therefore become an active research topic. A new method based on the Support Vector Machine (SVM) is used to extract the FECG from a limited amount of data. Firstly, the theory of the SVM and the principle of SVM-based extraction are studied. Secondly, the transformation of the maternal electrocardiogram (MECG) component in the abdominal composite signal is verified to be nonlinear and is fitted with the SVM. Then, the SVM is trained, and the training results are compared with the real data to verify the effect of the training. Meanwhile, the parameters of the SVM are optimized to achieve the best performance so that the learning machine can be used to fit unknown samples. Finally, the FECG is extracted by removing the optimal estimate of the MECG component from the abdominal composite signal. The Signal-to-Noise Ratio (SNR) and a visual test are used to evaluate the performance of SVM-based FECG extraction. The experimental results show that FECG of good quality can be extracted: the SNR is significantly increased, to as high as 9.2349 dB, and the time cost is significantly decreased, to as little as 0.802 seconds. Compared with the traditional method, the SVM-based noninvasive extraction method is simpler to implement, has a shorter processing time, and gives better extraction quality under the same conditions.
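An SNR figure in decibels, like the one reported above, is typically ten times the base-10 logarithm of a power ratio. The sketch below uses one common definition (reference signal power over the power of the estimation error); the exact definition used in the paper is not given here, so this choice, the function name and the toy samples are all assumptions.

```python
import math

def snr_db(reference, estimate):
    """SNR in dB: power of the reference over power of the error (reference - estimate)."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)

# Toy samples (invented): the estimate deviates from the reference by 0.1 everywhere.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -1.1, 1.1, -0.9]
value = snr_db(reference, estimate)
```

The function assumes the estimate is not identical to the reference; a zero error power would make the ratio undefined.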
Agricultural mapping using Support Vector Machine-Based Endmember Extraction (SVM-BEE)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archibald, Richard K; Filippi, Anthony M; Bhaduri, Budhendra L
Extracting endmembers from remotely sensed images of vegetated areas can present difficulties. In this research, we applied a recently developed endmember-extraction algorithm based on Support Vector Machines (SVMs) to the problem of semi-autonomous estimation of vegetation endmembers from a hyperspectral image. This algorithm, referred to as Support Vector Machine-Based Endmember Extraction (SVM-BEE), accurately and rapidly yields a computed representation of hyperspectral data that can accommodate multiple distributions. The number of distributions is identified without prior knowledge, based upon this representation. Prior work established that SVM-BEE is robustly noise-tolerant and can semi-automatically and effectively estimate endmembers; synthetic data and a geologic scene were previously analyzed. Here we compared the efficacies of the SVM-BEE and N-FINDR algorithms in extracting endmembers from a predominantly agricultural scene. SVM-BEE was able to estimate vegetation and other endmembers for all classes in the image, which N-FINDR failed to do. Classifications based on SVM-BEE endmembers were markedly more accurate compared with those based on N-FINDR endmembers.
NASA Astrophysics Data System (ADS)
Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.
2017-02-01
Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and large availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameters selection are reported, making it impossible to reproduce study analysis and results. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the applications of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, by focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant literature implementations. Furthermore, details concerning reviewed papers are listed in tables and statistics of SVM use in the literature are presented. Suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
A Fast Reduced Kernel Extreme Learning Machine.
Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua
2016-04-01
In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Du, Peijun; Tan, Kun; Xing, Xiaoshi
2010-12-01
Combining Support Vector Machine (SVM) with wavelet analysis, we constructed a wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM faces the bottleneck of kernel parameter selection, which makes training time-consuming and lowers classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. An Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to test the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier obtains the highest accuracy when using the Coiflet kernel function in the wavelet transform. Compared with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and with an SVM classifier using the Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in RKHS clearly improves classification accuracy.
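To make the idea of a multidimensional wavelet kernel concrete, the sketch below evaluates a translation-invariant kernel built as a product of one-dimensional mother wavelets. It uses a Morlet-type wavelet, h(u) = cos(1.75 u) exp(-u^2 / 2), which is a commonly cited choice and is swapped in here as an assumption; it is not the Coiflet kernel reported in the abstract, and the dilation parameter name `a` is likewise hypothetical.

```python
import math

def morlet_wavelet_kernel(x, z, a=1.0):
    """Product over dimensions of h((x_i - z_i) / a), with h(u) = cos(1.75 u) * exp(-u^2 / 2)."""
    value = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a
        value *= math.cos(1.75 * u) * math.exp(-u * u / 2.0)
    return value

# Two toy three-band pixel spectra (invented values).
pixel_a = [0.20, 0.35, 0.50]
pixel_b = [0.25, 0.30, 0.55]
similarity = morlet_wavelet_kernel(pixel_a, pixel_b, a=0.5)
```

The dilation `a` plays the role of the kernel parameter that has to be selected, much as the width does for an RBF kernel.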
Support vector machine in machine condition monitoring and fault diagnosis
NASA Astrophysics Data System (ADS)
Widodo, Achmad; Yang, Bo-Suk
2007-08-01
Recently, machine condition monitoring and fault diagnosis as a part of maintenance systems has become a global issue due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. This paper presents a survey of machine condition monitoring and fault diagnosis using support vector machine (SVM). It attempts to summarize and review the recent research and developments of SVM in machine condition monitoring and diagnosis. Numerous methods have been developed based on intelligent systems such as artificial neural network, fuzzy expert system, condition-based reasoning, random forest, etc. However, the use of SVM for machine condition monitoring and fault diagnosis is still rare. SVM has excellent generalization performance, so it can produce high classification accuracy for machine condition monitoring and diagnosis. Up to 2006, the use of SVM in machine condition monitoring and fault diagnosis has tended to develop towards expertise orientation and problem-oriented domains. Finally, the ability to continually change and obtain novel ideas for machine condition monitoring and fault diagnosis using SVM is left to future work.
NASA Astrophysics Data System (ADS)
Wei, ZHANG; Tongyu, WU; Bowen, ZHENG; Shiping, LI; Yipo, ZHANG; Zejie, YIN
2018-04-01
A new neutron-gamma discriminator based on the support vector machine (SVM) method is proposed to improve the performance of the time-of-flight neutron spectrometer. The neutron detector is an EJ-299-33 plastic scintillator with pulse-shape discrimination (PSD) property. The SVM algorithm is implemented in field programmable gate array (FPGA) to carry out the real-time sifting of neutrons in neutron-gamma mixed radiation fields. This study compares the ability of the pulse gradient analysis method and the SVM method. The results show that this SVM discriminator can provide a better discrimination accuracy of 99.1%. The accuracy and performance of the SVM discriminator based on FPGA have been evaluated in the experiments. It can get a figure of merit of 1.30.
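The figure of merit quoted above is conventionally defined in pulse-shape discrimination as the separation between the neutron and gamma peaks of the discrimination-parameter histogram divided by the sum of their full widths at half maximum. The sketch below encodes that conventional definition; the function name and the peak values are hypothetical and are not measurements from the paper.

```python
def figure_of_merit(peak_gamma, peak_neutron, fwhm_gamma, fwhm_neutron):
    """FOM = |peak separation| / (FWHM_gamma + FWHM_neutron)."""
    return abs(peak_neutron - peak_gamma) / (fwhm_gamma + fwhm_neutron)

# Invented peak positions and widths of a discrimination-parameter histogram.
fom = figure_of_merit(peak_gamma=0.2, peak_neutron=0.5,
                      fwhm_gamma=0.1, fwhm_neutron=0.1)
```

Larger values mean the two particle populations overlap less, so a higher figure of merit indicates cleaner neutron-gamma separation.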
An improved conjugate gradient scheme to the solution of least squares SVM.
Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya
2005-03-01
The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.
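For context, the linear system that the LS-SVM classifier formulation leads to is usually written as below. This is the standard textbook form, with \gamma the regularization parameter and K the kernel as assumed notation; the reduced system proposed in the letter itself is not reproduced here.

```latex
\begin{bmatrix}
0 & y^{\top} \\
y & \Omega + \gamma^{-1} I
\end{bmatrix}
\begin{bmatrix}
b \\ \alpha
\end{bmatrix}
=
\begin{bmatrix}
0 \\ \mathbf{1}_N
\end{bmatrix},
\qquad
\Omega_{ij} = y_i\,y_j\,K(x_i, x_j),
\qquad
f(x) = \operatorname{sign}\!\Bigl(\sum_{i=1}^{N}\alpha_i\,y_i\,K(x, x_i) + b\Bigr).
```

The matrix \Omega + \gamma^{-1} I is symmetric positive definite for a positive semidefinite kernel, which is what makes conjugate gradient methods applicable once the bias term b has been handled.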
NASA Astrophysics Data System (ADS)
Pullanagari, Reddy; Kereszturi, Gábor; Yule, Ian J.; Ghamisi, Pedram
2017-04-01
Accurate and spatially detailed mapping of complex urban environments is essential for land managers. Classifying high spectral and spatial resolution hyperspectral images is a challenging task because of its data abundance and computational complexity. Approaches with a combination of spectral and spatial information in a single classification framework have attracted special attention because of their potential to improve the classification accuracy. We extracted multiple features from spectral and spatial domains of hyperspectral images and evaluated them with two supervised classification algorithms; support vector machines (SVM) and an artificial neural network. The spatial features considered are produced by a gray level co-occurrence matrix and extended multiattribute profiles. All of these features were stacked, and the most informative features were selected using a genetic algorithm-based SVM. After selecting the most informative features, the classification model was integrated with a segmentation map derived using a hidden Markov random field. We tested the proposed method on a real application of a hyperspectral image acquired from AisaFENIX and on widely used hyperspectral images. From the results, it can be concluded that the proposed framework significantly improves the results with different spectral and spatial resolutions over different instrumentation.
Sun, Huiyong; Pan, Peichen; Tian, Sheng; Xu, Lei; Kong, Xiaotian; Li, Youyong; Dan Li; Hou, Tingjun
2016-01-01
The MIEC-SVM approach, which combines molecular interaction energy components (MIEC) derived from free energy decomposition and support vector machine (SVM), has been found effective in capturing the energetic patterns of protein-peptide recognition. However, the performance of this approach in identifying small molecule inhibitors of drug targets has not been well assessed and validated by experiments. Here, by combining different model construction protocols, the issues related to developing the best MIEC-SVM models were first discussed for three kinase targets (ABL, ALK, and BRAF). For the investigated targets, the optimized MIEC-SVM models performed much better than the models based on the default SVM parameters and Autodock on the tested datasets. Then, the proposed strategy was used to screen the Specs database to discover potential inhibitors of the ALK kinase. The experimental results showed that the optimized MIEC-SVM model identified 7 actives with IC50 < 10 μM from 50 purchased compounds (a hit rate of 14%, with 4 at the nM level) and performed much better than Autodock (3 actives with IC50 < 10 μM from 50 purchased compounds, a hit rate of 6%, with 2 at the nM level), suggesting that the proposed strategy is a powerful tool in structure-based virtual screening. PMID:27102549
Fan, X-J; Wan, X-B; Huang, Y; Cai, H-M; Fu, X-H; Yang, Z-L; Chen, D-K; Song, S-X; Wu, P-H; Liu, Q; Wang, L; Wang, J-P
2012-01-01
Background: Current imaging modalities are inadequate in preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal-transition (EMT)-related biomarkers with clinicopathological variables. Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients. Of these, 74 patients were assigned to the training set to select robust variables for designing the SVM model. The predictive value of the SVM model was validated in the testing set (119 patients). Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to build the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1% and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001). Conclusion: Our SVM-based model displayed moderately strong predictive power in defining the RLNM status in RC patients, providing an important approach for selecting the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy. PMID:22538975
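The reported sensitivity, specificity and accuracy can be related to a 2x2 confusion matrix. The abstract does not give the individual cell counts, so the counts in the sketch below are an inference: they are the one set of integers found here that matches the 119 test patients, the 63/56 high-risk/low-risk split and all three percentages. Treat them as a consistency check, not as data reported by the authors.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of true RLNM cases flagged high risk
    specificity = tn / (tn + fp)   # fraction of RLNM-free cases flagged low risk
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# ASSUMED counts, inferred from the reported percentages (not stated in the abstract).
TP, FN, TN, FP = 56, 26, 30, 7

sens, spec, acc = binary_metrics(TP, FN, TN, FP)
print(f"predicted high risk: {TP + FP}, predicted low risk: {FN + TN}")
print(f"sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%, accuracy {100 * acc:.1f}%")
```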
NASA Astrophysics Data System (ADS)
Zhou, Xin; Jun, Sun; Zhang, Bing; Jun, Wu
2017-07-01
In order to improve the reliability of the spectrum feature extracted by wavelet transform, a method combining wavelet transform (WT) with bacterial colony chemotaxis algorithm and support vector machine (BCC-SVM) algorithm (WT-BCC-SVM) was proposed in this paper. Besides, we aimed to identify different kinds of pesticide residues on lettuce leaves in a novel and rapid non-destructive way by using fluorescence spectra technology. The fluorescence spectral data of 150 lettuce leaf samples of five different kinds of pesticide residues on the surface of lettuce were obtained using Cary Eclipse fluorescence spectrometer. Standard normalized variable detrending (SNV detrending), Savitzky-Golay coupled with Standard normalized variable detrending (SG-SNV detrending) were used to preprocess the raw spectra, respectively. Bacterial colony chemotaxis combined with support vector machine (BCC-SVM) and support vector machine (SVM) classification models were established based on full spectra (FS) and wavelet transform characteristics (WTC), respectively. Moreover, WTC were selected by WT. The results showed that the accuracy of training set, calibration set and the prediction set of the best optimal classification model (SG-SNV detrending-WT-BCC-SVM) were 100%, 98% and 93.33%, respectively. In addition, the results indicated that it was feasible to use WT-BCC-SVM to establish diagnostic model of different kinds of pesticide residues on lettuce leaves.
Breaking news dissemination in the media via propagation behavior based on complex network theory
NASA Astrophysics Data System (ADS)
Liu, Nairong; An, Haizhong; Gao, Xiangyun; Li, Huajiao; Hao, Xiaoqing
2016-07-01
The diffusion of breaking news largely relies on propagation behaviors in the media. The tremendous and intricate propagation relationships in the media form a complex network. An improved understanding of breaking news diffusion characteristics can be obtained through complex network research. Drawing on news data on the Bohai Gulf oil spill event from June 2011 to May 2014, we constructed a weighted and directed complex network in which media are set as nodes, the propagation relationships as edges and the propagation times as the weights of the edges. The primary results show that (1) the propagation network presents a small-world feature, which means relations among media are close and breaking news originating from any node can spread rapidly; (2) traditional media and official websites are the typical sources for news propagation, while business portals are news collectors and spreaders; (3) the propagation network is assortative and the group of core media facilitates faster spread of breaking news; (4) for online media, the news originality factor becomes less important to propagation behaviors. This study offers new insight into information dissemination from the perspective of statistical physics and is beneficial for utilizing public opinion in a positive way.
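The network construction described in this abstract (media as nodes, propagation relationships as directed edges, propagation times as edge weights) can be sketched in a few lines of Python. The outlet names and events below are invented for illustration, and reading "propagation times" as the number of times one outlet re-published news from another is an assumption.

```python
from collections import defaultdict


def build_network(events):
    """Aggregate (source, target) propagation events into weighted directed edges."""
    weights = defaultdict(int)
    for source, target in events:
        weights[(source, target)] += 1   # one more propagation along this edge
    return dict(weights)


def node_strengths(weights):
    """Return (out_strength, in_strength): total edge weight leaving / entering each node."""
    out_strength, in_strength = defaultdict(int), defaultdict(int)
    for (source, target), w in weights.items():
        out_strength[source] += w
        in_strength[target] += w
    return dict(out_strength), dict(in_strength)


# Hypothetical events: (outlet that published first, outlet that re-published).
events = [
    ("official_site", "portal_a"),
    ("official_site", "portal_a"),
    ("official_site", "newspaper_b"),
    ("portal_a", "newspaper_b"),
]
edge_weights = build_network(events)
out_s, in_s = node_strengths(edge_weights)
print(edge_weights)
print("out-strength:", out_s)
print("in-strength:", in_s)
```

A node with high out-strength acts as a news source in this picture, while a node with high in-strength acts as a collector.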
Xu, Zhanfeng; Bunker, Christopher E; Harrington, Peter de B
2010-11-01
Monitoring the changes of jet fuel physical properties is important because fuel used in high-performance aircraft must meet rigorous specifications. Near-infrared (NIR) spectroscopy is a fast method to characterize fuels. Because of the complexity of NIR spectral data, chemometric techniques are used to extract relevant information from spectral data to accurately classify physical properties of complex fuel samples. In this work, discrimination of fuel types and classification of flash point, freezing point, boiling point (10%, v/v), boiling point (50%, v/v), and boiling point (90%, v/v) of jet fuels (JP-5, JP-8, Jet A, and Jet A1) were investigated. Each physical property was divided into three classes, low, medium, and high ranges, using two evaluations with different class boundary definitions. The class boundaries function as the threshold to alarm when the fuel properties change. Optimal partial least squares discriminant analysis (oPLS-DA), fuzzy rule-building expert system (FuRES), and support vector machines (SVM) were used to build the calibration models between the NIR spectra and classes of physical property of jet fuels. OPLS-DA, FuRES, and SVM were compared with respect to prediction accuracy. The validation of the calibration model was conducted by applying bootstrap Latin partition (BLP), which gives a measure of precision. Prediction accuracy of 97 ± 2% of the flash point, 94 ± 2% of freezing point, 99 ± 1% of the boiling point (10%, v/v), 98 ± 2% of the boiling point (50%, v/v), and 96 ± 1% of the boiling point (90%, v/v) were obtained by FuRES in one boundaries definition. Both FuRES and SVM obtained statistically better prediction accuracy over those obtained by oPLS-DA. The results indicate that combined with chemometric classifiers NIR spectroscopy could be a fast method to monitor the changes of jet fuel physical properties.
NASA Astrophysics Data System (ADS)
Gatos, I.; Tsantis, S.; Karamesini, M.; Skouroliakou, A.; Kagadis, G.
2015-09-01
Purpose: The design and implementation of a computer-based image analysis system employing the support vector machine (SVM) classifier system for the classification of Focal Liver Lesions (FLLs) on routine non-enhanced, T2-weighted Magnetic Resonance (MR) images. Materials and Methods: The study comprised 92 patients, each of whom underwent MRI performed on a Magnetom Concerto (Siemens). Typical signs on dynamic contrast-enhanced MRI and biopsies were employed towards a three-class categorization of the 92 cases: 40 benign FLLs, 25 Hepatocellular Carcinomas (HCC) within Cirrhotic liver parenchyma and 27 liver metastases from Non-Cirrhotic liver. Prior to FLL classification, an automated lesion segmentation algorithm based on Markov Random Fields was employed in order to acquire each FLL Region of Interest. 42 texture features derived from the gray-level histogram, co-occurrence and run-length matrices and 12 morphological features were obtained from each lesion. Stepwise multi-linear regression analysis was utilized to avoid feature redundancy, leading to a feature subset that fed the multiclass SVM classifier designed for lesion classification. SVM system evaluation was performed by means of the leave-one-out method and ROC analysis. Results: Maximum accuracy for all three classes (90.0%) was obtained by means of the Radial Basis Kernel Function and three textural features (Inverse-Different-Moment, Sum-Variance and Long-Run-Emphasis) that describe the lesion's contrast, variability and shape complexity. Sensitivity values for the three classes were 92.5%, 81.5% and 96.2% respectively, whereas specificity values were 94.2%, 95.3% and 95.5%. The AUC value achieved for the selected subset was 0.89 with a 0.81 - 0.94 confidence interval. Conclusion: The proposed SVM system exhibits promising results and could be utilized as a second-opinion tool for the radiologist in order to decrease the time/cost of diagnosis and the need for patients to undergo invasive examination.
Positive Disintegration as a Process of Symmetry Breaking.
Laycraft, Krystyna
2017-04-01
This article presents an analysis of the positive disintegration as a process of symmetry breaking. Symmetry breaking plays a major role in self-organized patterns formation and correlates directly to increasing complexity and function specialization. According to Dabrowski, a creator of the Theory of Positive Disintegration, the change from lower to higher levels of human development requires a major restructuring of an individual's psychological makeup. Each level of human development is a relatively stable and coherent configuration of emotional-cognitive patterns called developmental dynamisms. Their main function is to restructure a mental structure by breaking the symmetry of a low level and bringing differentiation and then integration to higher levels. The positive disintegration is then the process of transitions from a lower level of high symmetry and low complexity to higher levels of low symmetry and high complexity of mental structure.
Using distances between Top-n-gram and residue pairs for protein remote homology detection.
Liu, Bin; Xu, Jinghao; Zou, Quan; Xu, Ruifeng; Wang, Xiaolong; Chen, Qingcai
2014-01-01
Protein remote homology detection is one of the central problems in bioinformatics, which is important for both basic research and practical application. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve the state-of-the-art performance. Exploring feature vectors incorporating the position information of amino acids or other protein building blocks is a key step to improve the performance of the SVM-based methods. Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method, in which the feature vector representation for protein is based on the distances between residue pairs. SVM-DT is a profile-based method, which considers the distances between Top-n-gram pairs. Top-n-gram can be viewed as a profile-based building block of proteins, which is calculated from the frequency profiles. These two methods are position dependent approaches incorporating the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising. Compared with the position independent methods, the performance improvement is obvious. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. The better performance of the proposed methods demonstrates that the position dependant approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.
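The idea behind a distance-based residue-pair representation can be illustrated with a small Python sketch that counts ordered residue pairs at each sequence separation. This is a simplified stand-in under stated assumptions, not the SVM-DR implementation from the paper: the function name, the absence of normalization and the choice of maximum distance are all hypothetical.

```python
from collections import Counter


def residue_pair_distance_features(sequence: str, max_distance: int) -> Counter:
    """Count ordered residue pairs (a, b) separated by d positions, for d = 1..max_distance.

    Keys are (first_residue, second_residue, d); values are occurrence counts.
    """
    counts = Counter()
    for d in range(1, max_distance + 1):
        for i in range(len(sequence) - d):
            counts[(sequence[i], sequence[i + d], d)] += 1
    return counts


# Toy sequence; a real feature vector would cover all 20 x 20 amino-acid pairs per distance.
features = residue_pair_distance_features("ACA", max_distance=2)
for key, value in sorted(features.items()):
    print(key, value)
```

Because the distance is part of each key, the resulting vector keeps sequence-order information that a plain pair-composition count would discard, which is the position-dependent property the abstract emphasizes.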
Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO
Zhu, Zhichuan; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan
2018-01-01
Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the algorithm of Multiple Kernel Learning Support Vector Machine (MKL-SVM) has achieved good results therein. Based on grid search, however, the MKL-SVM algorithm needs long optimization time in course of parameter optimization; also its identification accuracy depends on the fineness of grid. In the paper, swarm intelligence is introduced and the Particle Swarm Optimization (PSO) is combined with MKL-SVM algorithm to be MKL-SVM-PSO algorithm so as to realize global optimization of parameters rapidly. In order to obtain the global optimal solution, different inertia weights such as constant inertia weight, linear inertia weight, and nonlinear inertia weight are applied to pulmonary nodules recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of the training time of the MKL-SVM grid search algorithm, achieving better recognition effect. Moreover, Euclidean norm of normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Through statistical analysis of the average of 20 times operation results with different inertial weights, it can be seen that the dynamic inertial weight is superior to the constant inertia weight in the MKL-SVM-PSO algorithm. In the dynamic inertial weight algorithm, the parameter optimization time of nonlinear inertia weight is shorter; the average fitness value after convergence is much closer to the optimal fitness value, which is better than the linear inertial weight. Besides, a better nonlinear inertial weight is verified. PMID:29853983
Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO.
Li, Yang; Zhu, Zhichuan; Hou, Alin; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan
2018-01-01
Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the algorithm of Multiple Kernel Learning Support Vector Machine (MKL-SVM) has achieved good results therein. Based on grid search, however, the MKL-SVM algorithm needs long optimization time in course of parameter optimization; also its identification accuracy depends on the fineness of grid. In the paper, swarm intelligence is introduced and the Particle Swarm Optimization (PSO) is combined with MKL-SVM algorithm to be MKL-SVM-PSO algorithm so as to realize global optimization of parameters rapidly. In order to obtain the global optimal solution, different inertia weights such as constant inertia weight, linear inertia weight, and nonlinear inertia weight are applied to pulmonary nodules recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of the training time of the MKL-SVM grid search algorithm, achieving better recognition effect. Moreover, Euclidean norm of normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Through statistical analysis of the average of 20 times operation results with different inertial weights, it can be seen that the dynamic inertial weight is superior to the constant inertia weight in the MKL-SVM-PSO algorithm. In the dynamic inertial weight algorithm, the parameter optimization time of nonlinear inertia weight is shorter; the average fitness value after convergence is much closer to the optimal fitness value, which is better than the linear inertial weight. Besides, a better nonlinear inertial weight is verified.
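The role of the inertia weight in PSO can be seen in a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy sphere function stands in for the MKL-SVM validation error that the paper optimizes, the quadratic decay is just one possible nonlinear schedule, and the constants (0.9 to 0.4, c1 = c2 = 1.5, 20 particles) are common textbook defaults rather than the paper's settings.

```python
import random


def sphere(x):
    """Toy fitness to minimize; stands in for a cross-validated classifier error."""
    return sum(v * v for v in x)


def inertia(schedule, t, n_iter, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t under an (assumed) schedule."""
    frac = t / (n_iter - 1)
    if schedule == "constant":
        return 0.7
    if schedule == "linear":
        return w_start - (w_start - w_end) * frac
    if schedule == "nonlinear":          # hypothetical quadratic decay
        return w_end + (w_start - w_end) * (1.0 - frac) ** 2
    raise ValueError(f"unknown schedule: {schedule}")


def pso(fitness, schedule, dim=2, n_particles=20, n_iter=200,
        bound=5.0, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` inside [-bound, bound]^dim; return (best_position, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for t in range(n_iter):
        w = inertia(schedule, t, n_iter)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clip the position so particles stay inside the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


results = {name: pso(sphere, name)[1] for name in ("constant", "linear", "nonlinear")}
for name, fit in results.items():
    print(f"{name:>9}: best fitness {fit:.3e}")
```

In a setting like the one in the abstract, each particle position would encode kernel and regularization parameters, and each fitness call would train and validate a classifier, which is why reducing the number of evaluations relative to grid search matters.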
Rodenhizer, Kara Anne E; Edwards, Katie M
2017-01-01
Dating violence (DV) and sexual violence (SV) are widespread problems among adolescents and emerging adults. A growing body of literature demonstrates that exposure to sexually explicit media (SEM) and sexually violent media (SVM) may be risk factors for DV and SV. The purpose of this article is to provide a systematic and comprehensive literature review on the impact of exposure to SEM and SVM on DV and SV attitudes and behaviors. A total of 43 studies utilizing adolescent and emerging adult samples were reviewed, and collectively the findings suggest that (1) exposure to SEM and SVM is positively related to DV and SV myths and more accepting attitudes toward DV and SV; (2) exposure to SEM and SVM is positively related to actual and anticipated DV and SV victimization, perpetration, and bystander nonintervention; (3) SEM and SVM more strongly impact men's DV and SV attitudes and behaviors than women's DV and SV attitudes and behaviors; and (4) preexisting attitudes related to DV and SV and media preferences moderate the relationship between SEM and SVM exposure and DV and SV attitudes and behaviors. Future studies should strive to employ longitudinal and experimental designs, more closely examine the mediators and moderators of SEM and SVM exposure on DV and SV outcomes, focus on the impacts of SEM and SVM that extend beyond men's use of violence against women, and examine the extent to which media literacy programs could be used independently or in conjunction with existing DV and SV prevention programs to enhance effectiveness of these programming efforts.
A hybrid SVM-FFA method for prediction of monthly mean global solar radiation
NASA Astrophysics Data System (ADS)
Shamshirband, Shahaboddin; Mohammadi, Kasra; Tong, Chong Wen; Zamani, Mazdak; Motamedi, Shervin; Ch, Sudheer
2016-07-01
In this study, a hybrid support vector machine-firefly optimization algorithm (SVM-FFA) model is proposed to estimate monthly mean horizontal global solar radiation (HGSR). The merit of SVM-FFA is assessed statistically by comparing its performance with three previously used approaches. Using each approach and long-term measured HGSR, three models are calibrated by considering different sets of meteorological parameters measured for Bandar Abbas, Iran. It is found that model (3), which uses the combination of relative sunshine duration, difference between maximum and minimum temperatures, relative humidity, water vapor pressure, average temperature, and extraterrestrial solar radiation, shows superior performance under all approaches. Moreover, extraterrestrial radiation is identified as a significant parameter for accurately estimating global solar radiation. The results reveal that the developed SVM-FFA approach provides favorable predictions with significantly higher precision than the other examined techniques. For SVM-FFA (3), the mean absolute percentage error (MAPE), root mean square error (RMSE), relative root mean square error (RRMSE), and coefficient of determination (R²) are 3.3252%, 0.1859 kWh/m², 3.7350%, and 0.9737, respectively, which according to the RRMSE indicates excellent performance. As a further evaluation of SVM-FFA (3), the ratio of estimated to measured values is computed, and 47 of the 48 months used as testing data fall between 0.90 and 1.10. A further verification also shows that SVM-FFA (3) is clearly superior to empirical models using relatively similar input parameters. In a nutshell, the hybrid SVM-FFA approach can be considered highly efficient for estimating HGSR.
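The four statistical indicators named in this abstract can be computed as in the Python sketch below. The formulas are the commonly used definitions and are an assumption about what the authors applied; in particular, RRMSE is taken as RMSE divided by the mean of the measured values, and R² as one minus the ratio of residual to total sum of squares. The sample numbers are invented and are not data from the study.

```python
import math


def error_metrics(measured, estimated):
    """Return MAPE (%), RMSE, RRMSE (%) and R^2 for paired measured/estimated values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    mape = 100.0 / n * sum(abs((m - e) / m) for m, e in zip(measured, estimated))
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    rmse = math.sqrt(ss_res / n)
    rrmse = 100.0 * rmse / mean_measured
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "RMSE": rmse, "RRMSE": rrmse, "R2": r2}


# Hypothetical monthly values in kWh/m^2, for illustration only.
measured = [4.8, 5.6, 6.3, 7.1]
estimated = [4.9, 5.4, 6.4, 7.0]
for name, value in error_metrics(measured, estimated).items():
    print(f"{name}: {value:.4f}")
```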
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, the author proposes the application of a Support Vector Machine (SVM) to the analysis of satellite imagery. One advantage of SVM is that, with limited training data, it may generate comparable or even better results than other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic form as a binary classifier that separates two classes, namely object and background. The algorithm aims at effectively detecting an object against its background with minimal training data. A synthetic image containing noise is used for algorithm testing. Furthermore, the algorithm is applied to remote sensing image analysis tasks such as the identification of island vegetation, water bodies, and oil spills from satellite imagery. The results indicate that SVM provides fast and accurate analysis with acceptable results.
Jongin Kim; Boreom Lee
2017-07-01
The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals of the neuroscience and clinical fields. In this study, we used an extreme learning machine (ELM) classifier to discriminate AD and mild cognitive impairment (MCI) from normal control (NC). We compared the performance of ELM with that of a linear kernel support vector machine (SVM) on 718 structural MRI images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal control, MCI converter (MCI-C), MCI non-converter (MCI-NC), and AD. We employed an SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. We found that the RFE-SVM feature selection approach in combination with ELM achieves classification accuracy superior to that of the linear kernel SVM for structural T1 MRI data.
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
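A pseudorandom binary sequence of the kind used as a TDR stimulus is typically generated with a linear-feedback shift register. The abstract does not say which PRBS order was used, so the sketch below assumes the standard PRBS7 polynomial x^7 + x^6 + 1 purely as an example; the function name and seed are illustrative.

```python
def prbs7(n_bits: int, seed: int = 0x01):
    """Generate n_bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1) from a 7-bit LFSR."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the two tap positions
        state = ((state << 1) | new_bit) & 0x7F       # shift left, keep 7 bits
        bits.append(new_bit)
    return bits


sequence = prbs7(254)
print("first 16 bits:", "".join(map(str, sequence[:16])))
print("ones in one period:", sum(sequence[:127]))
```

A maximal-length sequence from a 7-bit register repeats every 127 bits and is nearly balanced between ones and zeros, which gives it the flat, noise-like spectrum that makes it useful as a test stimulus.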
A Predictive Model of Anesthesia Depth Based on SVM in the Primary Visual Cortex
Shi, Li; Li, Xiaoyuan; Wan, Hong
2013-01-01
In this paper, a novel model for predicting anesthesia depth is put forward based on local field potentials (LFPs) in the primary visual cortex (V1 area) of rats. The model is constructed using a Support Vector Machine (SVM) to realize anesthesia depth online prediction and classification. The raw LFP signal was first decomposed into some special scaling components. Among these components, those containing higher frequency information were well suited for more precise analysis of the performance of the anesthetic depth by wavelet transform. Secondly, the characteristics of anesthetized states were extracted by complexity analysis. In addition, two frequency domain parameters were selected. The above extracted features were used as the input vector of the predicting model. Finally, we collected the anesthesia samples from the LFP recordings under the visual stimulus experiments of Long Evans rats. Our results indicate that the predictive model is accurate and computationally fast, and that it is also well suited for online predicting. PMID:24044024
Mapping of Coral Reef Environment in the Arabian Gulf Using Multispectral Remote Sensing
NASA Astrophysics Data System (ADS)
Ben-Romdhane, H.; Marpu, P. R.; Ghedira, H.; Ouarda, T. B. M. J.
2016-06-01
Coral reefs of the Arabian Gulf are subject to several pressures, thus requiring conservation actions. Well-designed conservation plans involve efficient mapping and monitoring systems. Satellite remote sensing is a cost-effective tool for seafloor mapping at large scales. Multispectral remote sensing of coastal habitats, like those of the Arabian Gulf, presents a special challenge due to their complexity and heterogeneity. The present study evaluates the potential of multispectral sensor DubaiSat-2 in mapping benthic communities of United Arab Emirates. We propose to use a spectral-spatial method that includes multilevel segmentation, nonlinear feature analysis and ensemble learning methods. Support Vector Machine (SVM) is used for comparison of classification performances. Comparative data were derived from the habitat maps published by the Environment Agency-Abu Dhabi. The spectral-spatial method produced 96.41% mapping accuracy. SVM classification is assessed to be 94.17% accurate. The adaptation of these methods can help achieving well-designed coastal management plans in the region.
Cancer survival classification using integrated data sets and intermediate information.
Kim, Shinuk; Park, Taesung; Kon, Mark
2014-09-01
Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with Cox proportional hazard regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers, first, existing ML methods, support vector machine (SVM) and random forest (RF), second, a new median-based classifier using FSCOX (FSCOX_median), and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from miRNAs and mRNAs profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced 22 short-term and 22 long-term survivor data set. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). 
Similarly in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS was 75.51% (RF), 87.76% (SVM) 85.71% (FSCOX_median), 85.71% (FSCOX_SVM). These results are higher than the results of using miRNA expression and mRNA expression alone. In addition we predict 16 hsa-miR-23b and hsa-miR-27b target genes in ovarian cancer data sets, obtained by SVM-based feature selection through integration of sequence information and gene expression profiles. Among the approaches used, the integrated miRNA and mRNA data set yielded better results than the individual data sets. The best performance was achieved using the FSCOX_SVM method with independent feature selection, which uses intermediate survival information between short-term and long-term survival time and the combination of the 2 different data sets. The results obtained using the combined data set suggest that there are some strong interactions between miRNA and mRNA features that are not detectable in the individual analyses. Copyright © 2014 Elsevier B.V. All rights reserved.
[Identification of Pummelo Cultivars Based on Hyperspectral Imaging Technology].
Li, Xun-lan; Yi, Shi-lai; He, Shao-lan; Lü, Qiang; Xie, Rang-jin; Zheng, Yong-qiang; Deng, Lie
2015-09-01
Existing methods for the identification of pummelo cultivars are usually time-consuming and costly, and are therefore inconvenient when rapid identification is needed. This research aimed to identify different pummelo cultivars by hyperspectral imaging technology, which can achieve rapid and highly sensitive measurement. A total of 240 leaf samples, 60 for each of the four cultivars, were investigated. Samples were divided into a calibration set (48 samples of each cultivar) and a validation set (12 samples of each cultivar) by a Kennard-Stone-based algorithm. Hyperspectral images of both adaxial and abaxial surfaces of each leaf were obtained and segmented into a region of interest (ROI) using a simple threshold. Spectra of leaf samples were extracted from the ROI. To remove the noise in the spectra, only the data in the spectral range 400~1000 nm were used for analysis. Multiplicative scatter correction (MSC) and standard normal variate (SNV) were utilized for data preprocessing. Principal component analysis (PCA) was used to extract the best principal components, and the successive projections algorithm (SPA) was used to extract the effective wavelengths. A least squares support vector machine (LS-SVM) was used to obtain the discrimination model of the four different pummelo cultivars. To find the optimal values of σ2 and γ, which are important parameters in LS-SVM modeling, grid search and cross-validation were applied. The first 10 and 11 principal components were extracted by PCA for the hyperspectral data of the adaxial surface and abaxial surface, respectively. There were 31 and 21 effective wavelengths selected by SPA based on the hyperspectral data of the adaxial surface and abaxial surface, respectively. The best principal components and the effective wavelengths were used as inputs of LS-SVM models, and then the PCA-LS-SVM model and the SPA-LS-SVM model were built. 
The results showed that identification accuracies of 99.46% and 98.44% were achieved in the calibration set for the PCA-LS-SVM model and the SPA-LS-SVM model, respectively, and an identification accuracy of 95.83% was achieved in the validation set for both the PCA-LS-SVM and the SPA-LS-SVM models, which were built on the hyperspectral data of the adaxial surface. By comparison, the PCA-LS-SVM and SPA-LS-SVM models built on the hyperspectral data of the abaxial surface both achieved identification accuracies of 100% for the calibration and validation sets. The overall results demonstrated that hyperspectral data of adaxial and abaxial leaf surfaces, coupled with PCA-LS-SVM and SPA-LS-SVM, could achieve accurate identification of pummelo cultivars. It was feasible to use hyperspectral imaging technology to identify different pummelo cultivars, and the technology provides an alternative way of rapidly identifying pummelo cultivars. Moreover, the results demonstrated that data from the abaxial leaf surface were more sensitive in identifying pummelo cultivars. This study provides a new method for the fast discrimination of pummelo cultivars.
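One of the preprocessing steps named in this abstract, SNV, is simple enough to sketch in Python: each spectrum is centered on its own mean and scaled by its own standard deviation. The use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not specify it, and the reflectance values are invented.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center one spectrum on its mean and scale by its std."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)      # sample standard deviation (assumed convention)
    if std == 0:
        raise ValueError("a constant spectrum cannot be SNV-scaled")
    return [(value - mean) / std for value in spectrum]


# Hypothetical reflectance values at five wavelengths of one leaf spectrum.
raw = [0.12, 0.15, 0.30, 0.45, 0.48]
corrected = snv(raw)
print([round(v, 3) for v in corrected])
```

Because the correction is applied per spectrum, it removes multiplicative and additive offsets between samples without needing a reference spectrum, which is the main practical difference from MSC.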
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification: KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. The Support Vector Machine (SVM) is a well-performing classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data and uses it to classify Chinese text, with the aim of combining academic research with practical application.
Classification of Multiple Chinese Liquors by Means of a QCM-based E-Nose and MDS-SVM Classifier.
Li, Qiang; Gu, Yu; Jia, Jing
2017-01-30
Chinese liquors are internationally well-known fermentative alcoholic beverages. They have unique flavors attributable to the use of various bacteria and fungi, raw materials, and production processes. Developing a novel, rapid, and reliable method to identify multiple Chinese liquors is of positive significance. This paper presents a pattern recognition system for classifying ten brands of Chinese liquors based on multidimensional scaling (MDS) and support vector machine (SVM) algorithms in a quartz crystal microbalance (QCM)-based electronic nose (e-nose) we designed. We evaluated the comprehensive performance of the MDS-SVM classifier that predicted all ten brands of Chinese liquors individually. The prediction accuracy (98.3%) showed superior performance of the MDS-SVM classifier over the back-propagation artificial neural network (BP-ANN) classifier (93.3%) and moving average-linear discriminant analysis (MA-LDA) classifier (87.6%). The MDS-SVM classifier has reasonable reliability, good fitting and prediction (generalization) performance in classification of the Chinese liquors. Taking both application of the e-nose and validation of the MDS-SVM classifier into account, we have thus created a useful method for the classification of multiple Chinese liquors.
Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong
2014-08-01
Drug-induced ototoxicity, a toxic side effect, is an important issue that needs to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, which makes them unsuitable for large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. In this investigation, we therefore established an effective computational prediction model of drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also applied to the same training sets. Among all the prediction models, GA-CG-SVM model II showed the best performance, with prediction accuracies of 85.33% and 83.05% for two independent test sets, respectively. Overall, the good performance of GA-CG-SVM model II indicates that it could be used for the prediction of drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.
Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan
2015-12-01
A novel method was developed for the rapid determination of multiple indicators in Corni Fructus by means of near infrared (NIR) spectroscopy. A particle swarm optimization (PSO) based least squares support vector machine (LS-SVM) was investigated to improve the level of quality control. Calibration models for moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of the PSO-LS-SVM models was compared with partial least squares regression (PLSR) and back propagation artificial neural network (BP-ANN) models. The calibration and validation results of PSO-LS-SVM were superior to those of both PLSR and BP-ANN. For the PSO-LS-SVM models, the correlation coefficients (r) of the calibrations were all above 0.942. The optimal prediction results were also achieved by the PSO-LS-SVM models, with RMSEP (root mean square error of prediction) and RSEP (relative standard error of prediction) less than 1.176 and 15.5%, respectively. The results suggest that the PSO-LS-SVM algorithm has good model performance and high prediction accuracy, and that NIR has potential value for the rapid determination of multiple indicators in Corni Fructus.
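The two error measures named in the abstract above can be illustrated with a short sketch. The abstract does not give formulas, so the definitions below are assumptions: RMSEP is taken as the root of the mean squared prediction error, and RSEP as the root of the squared-error sum divided by the sum of squared observed values, in percent. The sample values are invented for illustration only.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction (assumed standard definition)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def rsep(observed, predicted):
    """Relative standard error of prediction in percent (assumed definition)."""
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum(o ** 2 for o in observed)
    return 100.0 * math.sqrt(num / den)

# Hypothetical reference values and model predictions, not data from the paper.
observed = [10.0, 20.0, 30.0]
predicted = [11.0, 19.0, 32.0]
print(rmsep(observed, predicted))
print(rsep(observed, predicted))
```

RMSEP carries the unit of the measured indicator, while RSEP is dimensionless, which is presumably why the abstract reports one as a plain number and the other as a percentage.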
[Measurement of soil organic matter and available K based on SPA-LS-SVM].
Zhang, Hai-Liang; Liu, Xue-Mei; He, Yong
2014-05-01
Visible and short wave near infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for the measurement of soil organic matter (OM) and available potassium (K). Four types of pretreatment, namely smoothing, SNV, MSC and SG smoothing + first derivative, were adopted to eliminate system noise and external disturbances. Partial least squares regression (PLSR) and least squares-support vector machine (LS-SVM) methods were then used to build calibration models. The LS-SVM model was built using characteristic wavelengths selected by the successive projections algorithm (SPA), and the performance of the LS-SVM models was compared with that of the PLSR models. The results indicated that LS-SVM models using SPA-selected characteristic wavelengths as inputs outperformed the PLSR models. For the optimal SPA-LS-SVM models, the correlation coefficient (r) and RMSEP were 0.8602 and 2.98 for OM and 0.7305 and 15.78 for K, respectively. The results indicated that Vis/SW-NIRS (325 to 1075 nm) combined with SPA-based LS-SVM could be used as a precise method for the determination of soil properties.
Hybrid wavelet-support vector machine approach for modelling rainfall-runoff process.
Komasi, Mehdi; Sharghi, Soroush
2016-01-01
Because of the importance of water resources management, the need for accurate modeling of the rainfall-runoff process has grown rapidly in the past decades. Recently, the support vector machine (SVM) approach has been used by hydrologists for rainfall-runoff modeling and in other fields of hydrology. Like other artificial intelligence models, such as the artificial neural network (ANN) and the adaptive neural fuzzy inference system, the SVM model is based on autoregressive properties. In this paper, wavelet analysis was linked to the SVM model concept for modeling the rainfall-runoff process of the Aghchai and Eel River watersheds. The main time series of two variables, rainfall and runoff, were decomposed into multi-frequency time series by wavelet analysis; these time series were then fed as input data to the SVM model in order to predict the runoff discharge one day ahead. The obtained results show that the wavelet SVM model can predict both short- and long-term runoff discharges by considering seasonality effects. Also, the proposed hybrid model is relatively more appropriate than classical autoregressive ones such as ANN and SVM because it uses the multi-scale time series of rainfall and runoff data in the modeling process.
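The decomposition step described above can be sketched as follows. The abstract does not say which wavelet was used, so the Haar wavelet here is an assumption chosen because it is the simplest to write out; the function names and the sample series are hypothetical, and the SVM stage that would consume the sub-series is left out.

```python
import math

def haar_step(x):
    """One level of an orthonormal Haar transform on an even-length series."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_decompose(x, levels):
    """Split a series into one coarse approximation plus per-level details."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] holds the finest (highest-frequency) scale
    return approx, details

# Hypothetical daily rainfall values, not data from the study.
rainfall = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(rainfall, levels=2)
print(approx)
print(details)
```

Each detail series isolates variation at one time scale, while the approximation keeps the slow trend; in a hybrid model of the kind the abstract describes, these sub-series (rather than the raw series) would form the regression inputs.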
Choi, Chang Won; Park, Moon Sung
2015-10-01
The Korean Neonatal Network (KNN), a nationwide prospective registry of very-low-birth-weight (VLBW, < 1,500 g at birth) infants, was launched in April 2013. Data management (DM) and site-visit monitoring (SVM) were crucial in ensuring the quality of the data collected from 55 participating hospitals across the country on 116 clinical variables. We describe the processes and results of DM and SVM performed during the establishment stage of the registry. The DM procedure included automated proof checks, electronic data validation, query creation, query resolution, and revalidation of the corrected data. SVM included SVM team organization, identification of unregistered cases, source document verification, and post-visit report production. By March 31, 2015, 4,063 VLBW infants were registered and 1,693 queries were produced. Of these, 1,629 queries were resolved and 64 queries remain unresolved. By November 28, 2014, 52 participating hospitals were visited, with 136 site-visits completed since April 2013. Each participating hospital was visited biannually. DM and SVM were performed to ensure the quality of the data collected for the KNN registry. Our experience with DM and SVM can be applied for similar multi-center registries with large numbers of participating centers.
Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo
2018-03-21
Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict BBB permeability. In particular, the support vector machine (SVM), a kernel-based machine learning method, has been widely used in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of an SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy of BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models for BBB permeability prediction.
Availability of MudPIT data for classification of biological samples.
Silvestre, Dario Di; Zoppis, Italo; Brambilla, Francesca; Bellettato, Valeria; Mauri, Giancarlo; Mauri, Pierluigi
2013-01-14
Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used for developing methods which may help to provide unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained by the MudPIT approach. In particular, we compared the performance of SVM using two independent collections of complex samples and different data types, such as mass spectra (m/z), peptides and proteins. Globally, protein and peptide data provided better discriminant informative content than experimental mass spectra (overall accuracy higher than 87% in both collections 1 and 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra, and allows the extraction of more informative features for the effective classification of samples. In addition, the protein and peptide features selected by SVM matched 80% of the differentially expressed proteins identified by the MAProMa software. These findings confirm the availability of most label-free quantitative methods based on processing of spectral count and SEQUEST-based SCORE values. They also stress the usefulness of MudPIT data for a correct grouping of sample phenotypes by applying both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
NASA Astrophysics Data System (ADS)
Othman, Arsalan; Gloaguen, Richard
2015-04-01
Topographic effects and complex vegetation cover hinder lithology classification in mountain regions, based not only on field data but also on reflectance remote sensing data. The area of interest, "Bardi-Zard", is located in the NE of Iraq. It is part of the Zagros orogenic belt, where seven lithological units outcrop, and is known for its chromite deposit. The aim of this study is to compare three machine learning algorithms (MLAs), Maximum Likelihood (ML), Support Vector Machines (SVM), and Random Forest (RF), in the context of a supervised lithology classification task using Advanced Space-borne Thermal Emission and Reflection Radiometer (ASTER) satellite data, its derivatives, spatial information (spatial coordinates) and geomorphic data. We emphasize the enhancement in remote sensing lithological mapping accuracy that arises from the integration of geomorphic features and spatial information (spatial coordinates) in classifications. This study finds that RF performs better than the ML and SVM algorithms in almost all of the sixteen dataset combinations that were tested. The overall accuracy of the best dataset combination with the RF map for all seven classes reaches ~80%, the producer's and user's accuracies are ~73.91% and 76.09%, respectively, and the kappa coefficient is ~0.76. TPI is more effective with the SVM algorithm than with the RF algorithm. This paper demonstrates that adding geomorphic indices such as TPI and spatial information to the dataset increases the lithological classification accuracy.
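The accuracy figures quoted above (overall accuracy, producer's and user's accuracies, kappa coefficient) are all derived from a confusion matrix. A minimal sketch of the usual definitions follows, assuming rows hold the reference classes and columns the mapped classes; the two-class matrix is invented for illustration and is not taken from the study.

```python
def accuracy_metrics(cm):
    """cm[i][j] = number of samples of reference class i mapped to class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(n))
    row_tot = [sum(cm[i]) for i in range(n)]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    overall = diag / total
    # Producer's accuracy: correct / reference total (complement of omission error).
    producers = [cm[i][i] / row_tot[i] for i in range(n)]
    # User's accuracy: correct / mapped total (complement of commission error).
    users = [cm[j][j] / col_tot[j] for j in range(n)]
    # Cohen's kappa: agreement beyond what the marginals predict by chance.
    expected = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa

# Hypothetical two-class confusion matrix.
cm = [[40, 10],
      [5, 45]]
overall, producers, users, kappa = accuracy_metrics(cm)
print(overall, producers, users, kappa)
```

Kappa is lower than overall accuracy whenever some agreement is expected by chance, which is consistent with the abstract reporting a kappa of about 0.76 next to an overall accuracy of about 80%.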
NASA Astrophysics Data System (ADS)
Luna, A. S.; Paredes, M. L. L.; de Oliveira, G. C. G.; Corrêa, S. M.
2014-12-01
It is well known that air quality is a complex function of emissions, meteorology and topography, and statistical tools provide a sound framework for relating these variables. The observed data were the contents of nitrogen dioxide (NO2), nitrogen monoxide (NO), nitrogen oxides (NOx), carbon monoxide (CO) and ozone (O3), together with scalar wind speed (SWS), global solar radiation (GSR), temperature (TEM) and moisture content in the air (HUM), collected by a mobile automatic monitoring station at two places in the metropolitan area of Rio de Janeiro City during 2011 and 2012. The aims of this study were: (1) to analyze the behavior of the variables, using PCA for exploratory data analysis; (2) to propose forecasts of O3 levels from primary pollutants and meteorological factors, using nonlinear regression methods such as ANN and SVM. The PCA technique showed that for the first dataset the variables NO, NOx and SWS have a greater impact on the concentration of O3, while for the other dataset TEM and GSR were the most influential variables. The results obtained from the nonlinear regression techniques ANN and SVM were remarkably close and acceptable for one dataset, with coefficients of determination for validation of 0.9122 and 0.9152 and root mean square errors of 7.66 and 7.85, respectively. For these datasets, PCA, SVM and ANN demonstrated their robustness as useful tools for evaluating and forecasting air quality scenarios.
NASA Astrophysics Data System (ADS)
Zhang, Meijun; Tang, Jian; Zhang, Xiaoming; Zhang, Jiaojiao
2016-03-01
The highly accurate classification ability of an intelligent diagnosis method often requires a large number of training samples with high-dimensional eigenvectors, and the characteristics of the signal need to be extracted accurately. Although the existing EMD (empirical mode decomposition) and EEMD (ensemble empirical mode decomposition) are suitable for processing non-stationary and non-linear signals, their decomposition accuracy becomes very poor for a short signal such as a hydraulic impact signal. An improved EEMD is proposed specifically for short hydraulic impact signals. The improvements of this new EEMD are mainly reflected in four aspects: self-adaptive de-noising based on EEMD, signal extension based on SVM (support vector machine), extreme center fitting based on cubic spline interpolation, and pseudo component exclusion based on cross-correlation analysis. After the energy eigenvector is extracted from the result of the improved EEMD, fault pattern recognition based on SVM with a small number of low-dimensional training samples is studied. Finally, the diagnosis ability of the improved EEMD+SVM method is compared with that of the EEMD+SVM and EMD+SVM methods, and its diagnosis accuracy is distinctly higher than that of the other two methods whether the dimension of the eigenvectors is low or high. The improved EEMD is well suited to the decomposition of short signals, such as hydraulic impact signals, and its combination with SVM shows high ability for the diagnosis of hydraulic impact faults.
NASA Astrophysics Data System (ADS)
Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin
2012-04-01
Most herbal medicines could be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including linear kernel, polynomial kernel and radial basis function kernel (RBF), were checked for optimization of LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building an LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the model's efficiency. The performance of the LS-SVM with RBF kernel (RBF LS-SVM) was better than the other two kernels. The RF, RBF LS-SVM and SPA-LDA successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using LS-SVM with RBF kernel, while RF was fast in the training and making predictions.
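The evaluation protocol above (50 runs of 10-fold cross validation on 80 samples) can be sketched as an index generator. This is an illustration only: the shuffling scheme, the seed, and the function name are assumptions, and the abstract does not say whether the folds were stratified by class, so plain random folds are used here.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_runs=50, seed=0):
    """Yield (run, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for run in range(n_runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every run
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for k in range(n_folds):
            test = folds[k]
            train = [i for j, f in enumerate(folds) if j != k for i in f]
            yield run, k, train, test

# 40 raw + 40 processed herbs, as in the abstract.
splits = list(repeated_kfold(80))
print(len(splits))
```

A mean error rate such as those reported would then be the average, over all runs, of the misclassification rate obtained by training on each `train` list and scoring on the matching `test` list.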
Semi-supervised SVM for individual tree crown species classification
NASA Astrophysics Data System (ADS)
Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik
2015-12-01
In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
Predicting complications of percutaneous coronary intervention using a novel support vector method.
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ² value (seven cases) and the mean cross-entropy error (eight cases). The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.
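Two of the evaluation measures named above, the area under the ROC curve (discrimination) and the mean cross-entropy error (calibration), can be written out in a few lines. The sketch below uses the standard pairwise-ranking form of the AUC and the usual binary cross-entropy; the labels and predicted probabilities are invented, and nothing here reproduces the OP-SVM algorithm itself.

```python
import math

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_cross_entropy(labels, probs, eps=1e-12):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

# Hypothetical outcomes (1 = complication) and predicted probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, probs))
print(mean_cross_entropy(labels, probs))
```

The AUC depends only on how cases are ranked, whereas the cross-entropy also penalizes probabilities that are too confident or too cautious, which is why studies of this kind tend to report both.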
NASA Astrophysics Data System (ADS)
Li, Yun; Zhang, Ji; Li, Tao; Liu, Honggao; Li, Jieqing; Wang, Yuanzhong
2017-04-01
In this work, the data fusion strategy of Fourier transform mid infrared (FT-MIR) spectroscopy and inductively coupled plasma-atomic emission spectrometry (ICP-AES) was used in combination with Support Vector Machine (SVM) to determine the geographic origin of Boletus edulis collected from nine regions of Yunnan Province in China. Firstly, competitive adaptive reweighted sampling (CARS) was used for selecting an optimal combination of key wavenumbers of second derivative FT-MIR spectra, and thirteen elements were sorted with variable importance in projection (VIP) scores. Secondly, thirteen subsets of multi-elements with the best VIP score were generated and each subset was fused with FT-MIR. Finally, the classification models were established by SVM, and the combination of parameters C and γ (gamma) of the SVM models was determined by grid search (GS) and genetic algorithm (GA) approaches. The results showed that both the GS-SVM and GA-SVM models achieved good performance based on the #9 subset, and the prediction accuracies of the two models in the calibration and validation sets were 81.40% and 90.91%, respectively. In conclusion, the data fusion strategy of FT-MIR and ICP-AES coupled with the SVM algorithm can be used as a reliable tool for accurate identification of B. edulis, and it can provide a useful approach for the quality control of edible mushrooms.
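The grid search (GS) step mentioned above, choosing the SVM parameters C and γ, amounts to an exhaustive scan over candidate pairs. The sketch below is generic: `toy_score` is a hypothetical stand-in for a cross-validated accuracy, and the power-of-two grids are a common convention rather than the values used in the study.

```python
import itertools
import math

def grid_search(score_fn, c_values, gamma_values):
    """Return (best_score, best_c, best_gamma) over all (C, gamma) pairs."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best

def toy_score(c, gamma):
    """Hypothetical score surface peaking at C = 2**3, gamma = 2**-5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)

# Logarithmically spaced candidates (an assumed, conventional grid).
c_grid = [2.0 ** k for k in range(-5, 16, 2)]
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]
best_score, best_c, best_gamma = grid_search(toy_score, c_grid, gamma_grid)
print(best_c, best_gamma)
```

In practice `score_fn` would train an SVM with the given pair and return its cross-validated accuracy; a genetic algorithm explores the same parameter space but samples it adaptively instead of exhaustively.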
Predicting complications of percutaneous coronary intervention using a novel support vector method
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229
TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.
Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun
2016-11-01
The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the "intermediate" decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys .
Li, Yun; Zhang, Ji; Li, Tao; Liu, Honggao; Li, Jieqing; Wang, Yuanzhong
2017-04-15
In this work, the data fusion strategy of Fourier transform mid infrared (FT-MIR) spectroscopy and inductively coupled plasma-atomic emission spectrometry (ICP-AES) was used in combination with Support Vector Machine (SVM) to determine the geographic origin of Boletus edulis collected from nine regions of Yunnan Province in China. Firstly, competitive adaptive reweighted sampling (CARS) was used for selecting an optimal combination of key wavenumbers of second derivative FT-MIR spectra, and thirteen elements were sorted with variable importance in projection (VIP) scores. Secondly, thirteen subsets of multi-elements with the best VIP score were generated and each subset was fused with FT-MIR. Finally, the classification models were established by SVM, and the combination of parameters C and γ (gamma) of the SVM models was determined by grid search (GS) and genetic algorithm (GA) approaches. The results showed that both the GS-SVM and GA-SVM models achieved good performance based on the #9 subset, and the prediction accuracies of the two models in the calibration and validation sets were 81.40% and 90.91%, respectively. In conclusion, the data fusion strategy of FT-MIR and ICP-AES coupled with the SVM algorithm can be used as a reliable tool for accurate identification of B. edulis, and it can provide a useful approach for the quality control of edible mushrooms. Copyright © 2017. Published by Elsevier B.V.
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-01-01
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles’ in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which integrates the V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need for image registration or an additional road database, it has great potential for field applications. Future research will focus on extending the current method to detect other transportation modes such as buses, trucks, motorcycles, bicycles, and pedestrians. PMID:27548179
A Hybrid Vehicle Detection Method Based on Viola-Jones and HOG + SVM from UAV Images.
Xu, Yongzheng; Yu, Guizhen; Wang, Yunpeng; Wu, Xinkai; Ma, Yalong
2016-08-19
A new hybrid vehicle detection scheme which integrates the Viola-Jones (V-J) and linear SVM classifier with HOG feature (HOG + SVM) methods is proposed for vehicle detection from low-altitude unmanned aerial vehicle (UAV) images. As both V-J and HOG + SVM are sensitive to on-road vehicles' in-plane rotation, the proposed scheme first adopts a roadway orientation adjustment method, which rotates each UAV image to align the roads with the horizontal direction so the original V-J or HOG + SVM method can be directly applied to achieve fast detection and high accuracy. To address the issue of descending detection speed for V-J and HOG + SVM, the proposed scheme further develops an adaptive switching strategy which integrates the V-J and HOG + SVM methods based on their different descending trends of detection speed to improve detection efficiency. A comprehensive evaluation shows that the switching strategy, combined with the road orientation adjustment method, can significantly improve the efficiency and effectiveness of vehicle detection from UAV images. The results also show that the proposed vehicle detection method is competitive with other existing vehicle detection methods. Furthermore, since the proposed vehicle detection method can be performed on videos captured from moving UAV platforms without the need for image registration or an additional road database, it has great potential for field applications. Future research will focus on extending the current method to detect other transportation modes such as buses, trucks, motorcycles, bicycles, and pedestrians.
Balabin, Roman M; Lomakina, Ekaterina I
2011-06-28
A multilayer feed-forward artificial neural network (MLP-ANN) with a single, hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used, the least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. So, QC + SVM methodology is an alternative to QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for the building of the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. 
The extrapolation and interpolation results show that LS-SVM is superior by almost an order of magnitude over the ANN method in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller number of samples (a much smaller sample set) to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application for the LS-SVM calibration approach. This journal is © the Owner Societies 2011
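For readers unfamiliar with LS-SVM regression, the following is a minimal sketch of the standard textbook formulation, in which training reduces to solving one linear system instead of a quadratic program. It is not the authors' code: the RBF kernel, the hyperparameters `gamma` and `sigma`, the function names, and the toy sine data are all assumptions made for illustration.

```python
import math

def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=1000.0, sigma=1.0):
    """Return (alpha, b) solving [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term from the squared-error loss
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[1:], sol[0]

def lssvm_predict(X_train, alpha, b, x, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * k(x_i, x)."""
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))

# Toy usage (illustrative data only): fit samples of sin(x) on [0, 6].
X = [[0.5 * i] for i in range(13)]
y = [math.sin(p[0]) for p in X]
alpha, b = lssvm_fit(X, y)
```

Because every training sample receives a nonzero coefficient, the model is dense; the regularization parameter `gamma` trades smoothness against training error.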
HYBRID NEURAL NETWORK AND SUPPORT VECTOR MACHINE METHOD FOR OPTIMIZATION
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor)
2005-01-01
System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.
Hybrid Neural Network and Support Vector Machine Method for Optimization
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor)
2007-01-01
System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Ma, Xiaoqi
2015-01-01
A novel method is proposed to establish the pancreatic cancer classifier. Firstly, the concept of quantum and fruit fly optimal algorithm (FOA) are introduced, respectively. Then FOA is improved by quantum coding and quantum operation, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of support vector machine (SVM) and the classifier is established by the optimized SVM. In order to verify the effectiveness of the proposed method, SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method can improve the classifier performance and requires less time. PMID:26543867
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.
NASA Astrophysics Data System (ADS)
Liu, Di; Mishra, Ashok K.; Yu, Zhongbo
2016-07-01
This paper examines the combination of support vector machines (SVM) and the dual ensemble Kalman filter (EnKF) technique to estimate root zone soil moisture at different soil layers up to 100 cm depth. Multiple experiments are conducted in a data rich environment to construct and validate the SVM model and to explore the effectiveness and robustness of the EnKF technique. It was observed that the performance of SVM relies more on the initial length of the training set than on other factors (e.g., cost function, regularization parameter, and kernel parameters). The dual EnKF technique proved efficient at improving the SVM with observed data either at each time step or at flexible time steps. The EnKF technique can reach its maximum efficiency when the updating ensemble size approaches a certain threshold. It was observed that the SVM model performance for the multi-layer soil moisture estimation can be influenced by the rainfall magnitude (e.g., dry and wet spells).
Support Vector Machine algorithm for regression and classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Chenggang; Zavaljevski, Nela
2001-08-01
The software is an implementation of the Support Vector Machine (SVM) algorithm that was invented and developed by Vladimir Vapnik and his co-workers at AT&T Bell Laboratories. The specific implementation reported here is an Active Set method for solving a quadratic optimization problem that forms the major part of any SVM program. The implementation is tuned to specific constraints generated in the SVM learning. Thus, it is more efficient than general-purpose quadratic optimization programs. A decomposition method has been implemented in the software that enables processing large data sets. The size of the learning data is virtually unlimited by the capacity of the computer physical memory. The software is flexible and extensible. Two upper bounds are implemented to regulate the SVM learning for classification, which allow users to adjust the false positive and false negative rates. The software can be used either as a standalone, general-purpose SVM regression or classification program, or be embedded into a larger software system.
Loosli, Gaelle; Canu, Stephane; Ong, Cheng Soon
2016-06-01
This paper presents a theoretical foundation for an SVM solver in Kreĭn spaces. Up to now, all methods have been based either on matrix correction, on non-convex minimization, or on feature-space embedding. Here we justify and evaluate a solution that uses the original (indefinite) similarity measure, in the original Kreĭn space. This solution is the result of a stabilization procedure. We establish the correspondence between the stabilization problem (which has to be solved) and a classical SVM based on minimization (which is easy to solve). We provide simple equations to go from one to the other (in both directions). This link between stabilization and minimization problems is the key to obtaining a solution in the original Kreĭn space. Using KSVM, one can solve SVM with usually troublesome kernels (large negative eigenvalues or large numbers of negative eigenvalues). We present experiments showing that our algorithm KSVM outperforms all previously proposed approaches to deal with indefinite matrices in SVM-like kernel methods.
SVM classifier on chip for melanoma detection.
Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak
2017-07-01
Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.
A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
Gao, Xiang-Ming; Yang, Shi-Feng; Pan, San-Bo
2017-01-01
To predict the output power of photovoltaic systems, which is nonstationary and random, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data sets on the prediction date, the time series data of output power on a similar day with 15-minute intervals are built. Second, the time series data of the output power are decomposed into a series of components, including some intrinsic mode components IMFn and a trend component Res, at different scales using EMD. The corresponding SVM prediction model is established for each IMF component and trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of each model are reconstructed, and the predicted values of the output power of the grid-connected PV system can be obtained. The prediction model is tested with actual data, and the results show that the power prediction model based on the EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than do the single SVM prediction model and the EMD-SVM prediction model without optimization.
Training set extension for SVM ensemble in P300-speller with familiar face paradigm.
Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou
2018-03-27
P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
NASA Astrophysics Data System (ADS)
Wu, Di; He, Yong
2007-11-01
The aim of this study is to investigate the potential of the visible and near infrared spectroscopy (Vis/NIRS) technique for non-destructive measurement of soluble solids content (SSC) in grape juice beverage. A total of 380 samples were studied. Savitzky-Golay smoothing and standard normal variate were applied for the pre-processing of spectral data. Least-squares support vector machines (LS-SVM) with an RBF kernel function were applied to develop the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the LS-SVM model was 0.962 and the root mean square error of prediction (RMSEP) was 0.434137. It is concluded that the Vis/NIRS technique can quantify the SSC of grape juice beverage quickly and non-destructively. The LS-SVM model was also compared with PLS and back propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to the conventional linear and non-linear methods in predicting the SSC of grape juice beverage. The generalization ability of the LS-SVM, PLS and BP-NN models was also investigated. It is concluded that the LS-SVM regression method is a promising technique for chemometrics in quantitative prediction.
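The standard normal variate (SNV) pre-processing step named in this abstract can be sketched as follows. This is an illustrative reading of the usual definition (each spectrum is centred on its own mean and divided by its own standard deviation), not the authors' pipeline; the function name and the choice of the sample standard deviation (n - 1 denominator) are assumptions.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1); some implementations divide by n instead.
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

# Two hypothetical absorbance spectra that differ only by a multiplicative
# scatter factor and an additive offset map to the same SNV-corrected values.
raw = [0.10, 0.20, 0.30, 0.40]
scattered = [2.0 * v + 0.05 for v in raw]
corrected_raw = snv(raw)
corrected_scattered = snv(scattered)
```

SNV is applied per spectrum (row-wise), so it needs no reference spectrum and can be run on each new sample independently.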
Lin, Yi; Cai, Fu-Ying; Zhang, Guang-Ya
2007-01-01
A quantitative structure-property relationship (QSPR) model in terms of amino acid composition and the activity of Bacillus thuringiensis insecticidal crystal proteins was established. Support vector machine (SVM) is a novel general machine-learning tool based on the structural risk minimization principle that exhibits good generalization when fault samples are few; it is especially suitable for classification, forecasting, and estimation in cases where small amounts of samples are involved, such as fault diagnosis. However, some parameters of SVM are selected based on the experience of the operator, which has led to decreased efficiency of SVM in practical application. The uniform design (UD) method was applied to optimize the running parameters of SVM. It was found that the average accuracy rate approached 73% when the penalty factor was 0.01, the epsilon 0.2, the gamma 0.05, and the range 0.5. The results indicated that UD might be used as an effective method to optimize the parameters of SVM, and that SVM could be used as an alternative powerful modeling tool for QSPR studies of the activity of Bacillus thuringiensis (Bt) insecticidal crystal proteins. Therefore, a novel method for predicting the insecticidal activity of Bt insecticidal crystal proteins was proposed by the authors of this study.
Shahlaei, M.; Saghaie, L.
2014-01-01
A quantitative structure–activity relationship (QSAR) study is suggested for the prediction of biological activity (pIC50) of 3,4-dihydropyrido[3,2-d]pyrimidone derivatives as p38 inhibitors. Modeling of the biological activities of compounds of interest as a function of molecular structures was established by means of principal component analysis (PCA) and least square support vector machine (LS-SVM) methods. The results showed that the pIC50 values calculated by LS-SVM are in good agreement with the experimental data, and the performance of the LS-SVM regression model is superior to the PCA-based model. The developed LS-SVM model was applied for the prediction of the biological activities of pyrimidone derivatives which were not used in the modeling procedure. The resulting model showed high prediction ability, with a root mean square error of prediction of 0.460 for LS-SVM. The study provided a novel and effective approach for predicting biological activities of 3,4-dihydropyrido[3,2-d]pyrimidone derivatives as p38 inhibitors and disclosed that LS-SVM can be used as a powerful chemometrics tool for QSAR studies. PMID:26339262
A Mass Spectrometric Analysis Method Based on PPCA and SVM for Early Detection of Ovarian Cancer.
Wu, Jiang; Ji, Yanju; Zhao, Ling; Ji, Mengying; Ye, Zhuang; Li, Suyi
2016-01-01
Background. Surface-enhanced laser desorption-ionization-time of flight mass spectrometry (SELDI-TOF-MS) technology plays an important role in the early diagnosis of ovarian cancer. However, the raw MS data is highly dimensional and redundant. Therefore, it is necessary to study rapid and accurate detection methods from the massive MS data. Methods. The clinical data set used in the experiments for early cancer detection consisted of 216 SELDI-TOF-MS samples. An MS analysis method based on probabilistic principal components analysis (PPCA) and support vector machine (SVM) was proposed and applied to the ovarian cancer early classification in the data set. Additionally, using the same data set, we also established a traditional PCA-SVM model. Finally, we compared the two models in detection accuracy, specificity, and sensitivity. Results. Using independent training and testing experiments 10 times to evaluate the ovarian cancer detection models, the average prediction accuracy, sensitivity, and specificity of the PCA-SVM model were 83.34%, 82.70%, and 83.88%, respectively. In contrast, those of the PPCA-SVM model were 90.80%, 92.98%, and 88.97%, respectively. Conclusions. The PPCA-SVM model had better detection performance, and the model combined with the SELDI-TOF-MS technology shows promise for early clinical detection and diagnosis of ovarian cancer.
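The three figures of merit reported in this abstract (accuracy, sensitivity, specificity) follow directly from the confusion-matrix counts. A minimal sketch is given below; the function name, the label encoding (1 = cancer, 0 = control), and the toy labels are assumptions for illustration and are not data from the study.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # a positive case that was missed
        elif p == positive:
            fp += 1  # a negative case that was flagged
        else:
            tn += 1

    def ratio(num, den):
        # Guard against empty classes; 0.0 is an arbitrary convention here.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    sensitivity = ratio(tp, tp + fn)  # true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity

# Toy labels only: 4 positives (3 detected) and 4 negatives (3 rejected).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, sens, spec = detection_metrics(y_true, y_pred)
```

Averaging these values over repeated train/test splits, as the abstract describes, would simply mean calling the function once per split and taking the mean of each returned quantity.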
Mei, Suyu; Zhu, Hao
2015-01-26
Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.
A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM
Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei
2018-01-01
Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model’s performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM’s parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models’ performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has better prediction precision and smaller prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors. PMID:29342942
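The two evaluation measures named in this abstract, root mean squared error (RMSE) and mean absolute percentage error (MAPE), can be computed as in the sketch below. The function names and the convention of reporting MAPE in percent are assumptions; the abstract does not state which convention the authors used.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so such inputs raise an error.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Illustrative numbers only, not sensor data from the study.
err_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
err_mape = mape([100.0, 200.0], [110.0, 180.0])
```

RMSE is expressed in the units of the measured quantity and penalizes large errors more heavily, while MAPE is scale-free, which is why the two are often reported together.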
2017-01-01
To predict the output power of photovoltaic systems, which is nonstationary and random, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data sets on the prediction date, the time series data of output power on a similar day with 15-minute intervals are built. Second, the time series data of the output power are decomposed into a series of components, including some intrinsic mode components IMFn and a trend component Res, at different scales using EMD. The corresponding SVM prediction model is established for each IMF component and trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of each model are reconstructed, and the predicted values of the output power of the grid-connected PV system can be obtained. The prediction model is tested with actual data, and the results show that the power prediction model based on the EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than do the single SVM prediction model and the EMD-SVM prediction model without optimization. PMID:28912803
Liu, Xue-Mei; Liu, Jian-She
2012-11-01
Visible and short wave-near infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for its accuracy in measuring soil properties, namely available nitrogen (N) and available potassium (K). Three types of pretreatments, including standard normal variate (SNV), multiplicative scattering correction (MSC) and Savitzky-Golay smoothing + first derivative, were adopted to eliminate the system noises and external disturbances. Then partial least squares (PLS) and least squares-support vector machine (LS-SVM) analyses were implemented to build calibration models. The performance of the LS-SVM models was also compared across three kinds of inputs: principal components (PCs) from PCA, latent variables (LVs), and effective wavelengths (EWs). The results indicated that all LS-SVM models outperformed PLS models. The performance of the model was evaluated by the correlation coefficient (r2) and RMSEP. The optimal EWs-LS-SVM models were achieved, and the correlation coefficient (r2) and RMSEP were 0.82 and 17.2 for N and 0.72 and 15.0 for K, respectively. The results indicated that visible and short wave-near infrared spectroscopy (Vis/SW-NIRS) (325-1075 nm) combined with LS-SVM could be utilized as a precision method for the determination of soil properties.
Lu, Xinjiang; Liu, Wenbo; Zhou, Chuang; Huang, Minghui
2017-06-13
The least-squares support vector machine (LS-SVM) is a popular data-driven modeling method and has been successfully applied to a wide range of applications. However, it has some disadvantages, including being ineffective at handling non-Gaussian noise as well as being sensitive to outliers. In this paper, a robust LS-SVM method is proposed and is shown to have more reliable performance when modeling a nonlinear system under conditions where Gaussian or non-Gaussian noise is present. The construction of a new objective function allows for a reduction of the mean of the modeling error as well as the minimization of its variance, and it does not constrain the mean of the modeling error to zero. This differs from the traditional LS-SVM, which uses a worst-case scenario approach in order to minimize the modeling error and constrains the mean of the modeling error to zero. In doing so, the proposed method takes the modeling error distribution information into consideration and is thus less conservative and more robust in regards to random noise. A solving method is then developed in order to determine the optimal parameters for the proposed robust LS-SVM. An additional analysis indicates that the proposed LS-SVM gives a smaller weight to a large-error training sample and a larger weight to a small-error training sample, and is thus more robust than the traditional LS-SVM. The effectiveness of the proposed robust LS-SVM is demonstrated using both artificial and real life cases.
Leong, Max K.; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng
2017-01-01
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928–0.988, = 0.894–0.954, RMSE = 0.002–0.412, s = 0.001–0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery. PMID:28059133
Leong, Max K; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng
2017-01-06
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928-0.988, = 0.894-0.954, RMSE = 0.002-0.412, s = 0.001-0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.
Signal peptide discrimination and cleavage site identification using SVM and NN.
Kazemian, H B; Yusuf, S A; White, K
2014-02-01
About 15% of all proteins in a genome contain a signal peptide (SP) sequence, at the N-terminus, that targets the protein to intracellular secretory pathways. Once the protein is targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains due to similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method utilises a dual phase classification approach using SVM as a primary classifier to discriminate SP sequences from Non-SP. The methodology further employs NNs to predict the most suitable cleavage site candidates. In phase one, a SVM classification utilises hydrophobic propensities as a primary feature vector extraction using symmetric sliding window amino-acid sequence analysis for discrimination of SP and Non-SP. In phase two, a NN classification uses asymmetric sliding window sequence analysis for prediction of cleavage site identification. The proposed SVM-NN method was tested using Uni-Prot non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model. © 2013 Published by Elsevier Ltd.
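The Matthews Correlation Coefficient (MCC) mentioned in this abstract is computed from the four confusion-matrix counts. The sketch below uses the standard definition; the function name and the convention of returning 0.0 when the denominator vanishes are assumptions, and the example counts are illustrative rather than taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal count is empty.
    return numerator / denominator if denominator else 0.0

# Illustrative counts only (SP = positive class, Non-SP = negative class).
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

Unlike plain accuracy, MCC uses all four counts, so it stays informative when the SP and Non-SP classes are imbalanced.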
Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong
2017-08-07
DNase I hypersensitive sites (DHSs) are accessible chromatin regions hypersensitive to cleavages by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. Recognizing DHSs in a genome therefore helps in discovering CREs. To accelerate the investigation, it is an important complement to develop cost-effective computational methods to identify DHSs. However, there is a lack of tools for identifying DHSs in plant genomes. Here we present pDHS-SVM, a computational predictor to identify plant DHSs. To integrate the global sequence-order information and local DNA properties, reverse complement kmer and dinucleotide-based auto covariance of DNA sequences were applied to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used and Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties positive for the DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies up to 87.00% and 85.79%, respectively. The results indicated the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of the novel proposed method pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.
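One of the feature types named in this abstract, the reverse complement kmer, can be illustrated with a short sketch. This is a generic reading of the idea (each k-mer is pooled with its reverse complement before counting), not code from the pDHS-SVM repository; the function names, the default k = 2, the frequency normalization, and the skipping of windows with non-ACGT characters are all assumptions.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def revcomp_kmer_features(seq, k=2):
    """Frequencies of k-mers, pooling each k-mer with its reverse complement."""
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[canonical(kmer)] += 1
            total += 1
    return {key: (counts[key] / total if total else 0.0) for key in keys}

# Toy sequence: the windows AC, CG, GT pool into AC (twice) and CG (once).
features = revcomp_kmer_features("ACGT", k=2)
```

Pooling reduces the 16 possible dinucleotides to 10 features, which makes the representation independent of which DNA strand was sequenced.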
Cheng, Shu-Xi; Xie, Chuan-Qi; Wang, Qiao-Nan; He, Yong; Shao, Yong-Ni
2014-05-01
Identification of early blight on tomato leaves by using hyperspectral imaging technique based on different effective wavelengths selection methods (successive projections algorithm, SPA; x-loading weights, x-LW; Gram-Schmidt orthogonalization, GSO) was studied in the present paper. Hyperspectral images of seventy healthy and seventy infected tomato leaves were obtained by a hyperspectral imaging system across the wavelength range of 380-1023 nm. Reflectance of all pixels in the region of interest (ROI) was extracted by ENVI 4.7 software. A least squares-support vector machine (LS-SVM) model was established based on the full spectral wavelengths. It obtained an excellent result with the highest identification accuracy (100%) in both calibration and prediction sets. Then, EW-LS-SVM and EW-LDA models were established based on the selected wavelengths suggested by SPA, x-LW and GSO, respectively. The results showed that all of the EW-LS-SVM and EW-LDA models performed well, with an identification accuracy of 100% in the EW-LS-SVM models and 100%, 100% and 97.83% in the EW-LDA models, respectively. Moreover, the numbers of input wavelengths of the SPA-LS-SVM, x-LW-LS-SVM and GSO-LS-SVM models were four (492, 550, 633 and 680 nm), three (631, 719 and 747 nm) and two (533 and 657 nm), respectively. Fewer input variables are beneficial for the development of an identification instrument. The study demonstrated that it is feasible to identify early blight on tomato leaves by using hyperspectral imaging, and that SPA, x-LW and GSO are effective wavelength selection methods.
NASA Astrophysics Data System (ADS)
Yang, Dong; Lu, Anxiang; Ren, Dong; Wang, Jihua
2017-11-01
This study explored the feasibility of rapid detection of biogenic amines (BAs) in cooked beef during storage using hyperspectral imaging combined with a sparse representation (SR) algorithm. The hyperspectral images of samples were collected separately in the two spectral ranges of 400-1000 nm and 1000-1800 nm. The dimensionality of the spectral data was reduced by the SR and principal component analysis (PCA) algorithms, and the reduced data were then combined with the least squares support vector machine (LS-SVM) to build the SR-LS-SVM and PC-LS-SVM models for the prediction of BAs values in cooked beef. The results showed that the SR-LS-SVM model exhibited the best predictive ability, with a determination coefficient (RP2) of 0.943 and a root mean square error (RMSEP) of 1.206 in the range of 400-1000 nm for the prediction set. The SR and PCA algorithms were further combined to establish the best SR-PC-LS-SVM model for BAs prediction, which had a high RP2 of 0.969 and a low RMSEP of 1.039 in the region of 400-1000 nm. The visual map of the BAs was generated using the best SR-PC-LS-SVM model with image processing algorithms, which could be used to observe the changes of BAs in cooked beef more intuitively. The study demonstrated that hyperspectral imaging combined with sparse representation was able to detect the BAs values in cooked beef effectively during storage, and that the SR-PC-LS-SVM model has potential for rapid and accurate determination of freshness indexes in other meat and meat products.
NASA Astrophysics Data System (ADS)
Leong, Max K.; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng
2017-01-01
The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928-0.988, = 0.894-0.954, RMSE = 0.002-0.412, s = 0.001-0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.
NASA Astrophysics Data System (ADS)
Sehad, Mounir; Lazri, Mourad; Ameur, Soltane
2017-03-01
In this work, a new rainfall estimation technique based on the high spatial and temporal resolution of the Spinning Enhanced Visible and Infra Red Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) is presented. The work proposes an efficient rainfall estimation scheme based on two multiclass support vector machine (SVM) algorithms: SVM_D for daytime and SVM_N for night-time rainfall estimation. Both SVM models are trained using relevant rainfall parameters based on optical, microphysical and textural cloud properties. The cloud parameters are derived from the spectral channels of the SEVIRI MSG radiometer. The 3-hourly and daily accumulated rainfall are derived from the 15-min rainfall estimates given by the SVM classifiers for each MSG observation image pixel. The SVMs were trained with ground meteorological radar precipitation scenes recorded from November 2006 to March 2007 over the north of Algeria, located in the Mediterranean region. Further, the SVM_D and SVM_N models were used to estimate 3-hourly and daily rainfall using a data set gathered from November 2010 to March 2011 over north Algeria. The results were validated against collocated rainfall observed by a rain gauge network. The statistical scores given by the correlation coefficient, bias, root mean square error and mean absolute error showed good accuracy of the rainfall estimates by the present technique. Moreover, the rainfall estimates of our technique were compared with two high-accuracy rainfall estimation methods based on MSG SEVIRI imagery, namely a random forests (RF) based approach and an artificial neural network (ANN) based technique. The findings of the present technique indicate a higher correlation coefficient (3-hourly: 0.78; daily: 0.94) and lower mean absolute error and root mean square error values. The results show that the new technique assigns 3-hourly and daily rainfall with better accuracy than the ANN technique and the RF model.
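The accumulation and validation steps described in this abstract can be sketched as follows. This is a minimal illustration, assuming that 3-hourly totals are plain sums of twelve consecutive 15-min estimates and that bias, mean absolute error and root mean square error take their usual textbook definitions; the function names and toy numbers are hypothetical, not taken from the paper.

```python
import math


def accumulate(rain_15min, steps_per_period=12):
    """Sum consecutive 15-min values into fixed periods.

    With steps_per_period=12 each output value is a 3-hourly total
    (96 would give a daily total). An incomplete trailing period is dropped.
    """
    last_start = len(rain_15min) - steps_per_period
    return [sum(rain_15min[i:i + steps_per_period])
            for i in range(0, last_start + 1, steps_per_period)]


def validation_scores(estimated, observed):
    """Return (bias, MAE, RMSE) of estimates against gauge observations."""
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    bias = sum(errors) / n
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return bias, mae, rmse


three_hourly = accumulate([1] * 24)            # two full 3-hour periods
bias, mae, rmse = validation_scores([2.0, 4.0], [1.0, 5.0])
```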
An improvement of vehicle detection under shadow regions in satellite imagery
NASA Astrophysics Data System (ADS)
Karim, Shahid; Zhang, Ye; Ali, Saad; Asif, Muhammad Rizwan
2018-04-01
The processing of satellite imagery depends on the quality of the imagery. At low resolution, it is difficult to extract information accurately enough for the requirements of applications. For vehicle detection under shadow regions, we used HOG for feature extraction and SVM for classification; HOG proved a worthwhile tool for complex environments. Shadow images were examined and found to be very complex for detection, with very low observed detection rates; our effort is therefore directed at enhancing the detection rate under shadow regions by applying appropriate preprocessing. Vehicles are detected precisely in non-shadow regions, with a higher detection rate than in shadow regions.
Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood.
Zhang, Fan; Kaufman, Howard L; Deng, Youping; Drabier, Renee
2013-01-01
Breast cancer is the second most common type of cancer worldwide after lung cancer. Traditional mammography and Tissue Microarray have been studied for early cancer detection and cancer prediction. However, there is a need for more reliable diagnostic tools for early detection of breast cancer. This can be a challenge due to a number of factors and logistics. First, obtaining tissue biopsies can be difficult. Second, mammography may not detect small tumors, and is often unsatisfactory for younger women who typically have dense breast tissue. Lastly, breast cancer is not a single homogeneous disease but consists of multiple disease states, each arising from a distinct molecular mechanism and having a distinct clinical progression path, which makes the disease difficult to detect and predict in early stages. In the paper, we present a Support Vector Machine based on Recursive Feature Elimination and Cross Validation (SVM-RFE-CV) algorithm for early detection of breast cancer in peripheral blood and show how to use SVM-RFE-CV to model the classification and prediction problem of early detection of breast cancer in peripheral blood. The training set, which consists of 32 healthy and 33 cancer samples, and the testing set, consisting of 31 healthy and 34 cancer samples, were randomly separated from a dataset of peripheral blood of breast cancer that was downloaded from Gene Expression Omnibus. First, we identified the 42 differentially expressed biomarkers between "normal" and "cancer". Then, with the SVM-RFE-CV we extracted 15 biomarkers that yield zero cross validation score. Lastly, we compared the classification and prediction performance of SVM-RFE-CV with that of SVM and SVM Recursive Feature Elimination (SVM-RFE).
We found that 1) the SVM-RFE-CV is suitable for analyzing noisy high-throughput microarray data, 2) it outperforms SVM-RFE in the robustness to noise and in the ability to recover informative features, and 3) it can improve the prediction performance (Area Under Curve) in the testing data set from 0.5826 to 0.7879. Further pathway analysis showed that the biomarkers are associated with Signaling, Hemostasis, Hormones, and Immune System, which are consistent with previous findings. Our prediction model can serve as a general model for biomarker discovery in early detection of other cancers. In the future, Polymerase Chain Reaction (PCR) is planned for validation of the ability of these potential biomarkers for early detection of breast cancer.
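The Area Under Curve figures quoted above compare classifier scores on the testing set. As a minimal sketch of what that number measures, the block below computes AUC as the fraction of (cancer, healthy) sample pairs in which the cancer sample receives the higher score, counting ties as one half. The scores and labels are made-up toy values, and this pairwise formulation is an illustrative choice rather than the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels use 1 for the positive class and 0 for the negative class.
    The result equals the probability that a randomly chosen positive
    sample is scored above a randomly chosen negative one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In the toy data three of the four positive/negative pairs are ordered correctly, so the value is 0.75; a classifier that ranks at random would sit near 0.5.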
NASA Astrophysics Data System (ADS)
Alagha, Jawad S.; Seyam, Mohammed; Md Said, Md Azlin; Mogheir, Yunes
2017-12-01
Artificial intelligence (AI) techniques have increasingly become efficient alternative modeling tools in the water resources field, particularly when the modeled process is influenced by complex and interrelated variables. In this study, two AI techniques—artificial neural networks (ANNs) and support vector machine (SVM)—were employed to achieve deeper understanding of the salinization process (represented by chloride concentration) in complex coastal aquifers influenced by various salinity sources. Both models were trained using 11 years of groundwater quality data from 22 municipal wells in Khan Younis Governorate, Gaza, Palestine. Both techniques showed satisfactory prediction performance, where the mean absolute percentage error (MAPE) and correlation coefficient ( R) for the test data set were, respectively, about 4.5 and 99.8% for the ANNs model, and 4.6 and 99.7% for SVM model. The performances of the developed models were further noticeably improved through preprocessing the wells data set using a k-means clustering method, then conducting AI techniques separately for each cluster. The developed models with clustered data were associated with higher performance, easiness and simplicity. They can be employed as an analytical tool to investigate the influence of input variables on coastal aquifer salinity, which is of great importance for understanding salinization processes, leading to more effective water-resources-related planning and decision making.
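The two performance measures reported in this abstract, mean absolute percentage error (MAPE) and the correlation coefficient R, can be written out as a short sketch. It assumes the standard definitions (MAPE as the mean of |observed - predicted| / |observed| in percent, R as Pearson's correlation); the chloride values used are invented for illustration.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error in percent (observed values must be non-zero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical chloride concentrations (mg/L): observed vs. model output.
toy_mape = mape([100.0, 200.0], [110.0, 180.0])
toy_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```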
Li, Liwei; Wang, Bo; Meroueh, Samy O
2011-09-26
The community structure-activity resource (CSAR) data sets are used to develop and test a support vector machine-based scoring function in regression mode (SVR). Two scoring functions (SVR-KB and SVR-EP) are derived with the objective of reproducing the trend of the experimental binding affinities provided within the two CSAR data sets. The features used to train SVR-KB are knowledge-based pairwise potentials, while SVR-EP is based on physicochemical properties. SVR-KB and SVR-EP were compared to seven other widely used scoring functions, including Glide, X-score, GoldScore, ChemScore, Vina, Dock, and PMF. Results showed that SVR-KB trained with features obtained from three-dimensional complexes of the PDBbind data set outperformed all other scoring functions, including best performing X-score, by nearly 0.1 using three correlation coefficients, namely Pearson, Spearman, and Kendall. It was interesting that higher performance in rank ordering did not translate into greater enrichment in virtual screening assessed using the 40 targets of the Directory of Useful Decoys (DUD). To remedy this situation, a variant of SVR-KB (SVR-KBD) was developed by following a target-specific tailoring strategy that we had previously employed to derive SVM-SP. SVR-KBD showed a much higher enrichment, outperforming all other scoring functions tested, and was comparable in performance to our previously derived scoring function SVM-SP.
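Of the three correlation coefficients named in this abstract, Kendall's is the one that most directly measures rank ordering. The sketch below implements the simple tau-a variant, which ignores tie corrections; whether the authors used tau-a or a tie-corrected variant is not stated, so this is an assumption, and the input values are toy numbers.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy example: predicted scores that swap the order of the last two complexes.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
```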
Rock breaking methods to replace blasting
NASA Astrophysics Data System (ADS)
Zhou, Huisheng; Xie, Xinghua; Feng, Yuqing
2018-03-01
Breaking rock by blasting is highly efficient and relatively low in cost, but it is associated with vibration, flyrock, and the production of toxic gases. Since the 1970s, developed Western countries have studied safe methods of breaking rock. This paper introduces different methods for safely breaking rock and their progress. Ideally, safe rock breaking would produce little vibration, no flyrock, and no toxic gases, so that it could be widely used in municipal engineering, road excavation, high-risk mining, quarrying and complex environments.
STAR-GALAXY CLASSIFICATION IN MULTI-BAND OPTICAL IMAGING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fadely, Ross; Willman, Beth; Hogg, David W.
2012-11-20
Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r ≳ 24. Star-galaxy separation poses a major challenge to such surveys because galaxies, even very compact galaxies, outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM <0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training data to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVM_best) where the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics and (2) a more realistic scenario where training is done on higher signal-to-noise data (SVM_real) at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering ~80% completeness, with purity of ~60%-90% for both stars and galaxies. We find that no algorithm delivers perfect performance and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the Receiver Operating Characteristic curve, we find a best-to-worst ranking of SVM_best, HB, ML, and SVM_real. We conclude, therefore, that a well-trained SVM will outperform template-fitting methods. However, a normally trained SVM performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.
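Completeness and purity, the two figures of merit quoted in this abstract, can be made concrete with a short sketch. It assumes the usual definitions: completeness is the fraction of true members of a class that are labelled as that class, and purity is the fraction of sources labelled as that class that truly belong to it. The catalog below is a five-object toy example, not survey data.

```python
def completeness_purity(true_labels, predicted_labels, target):
    """Return (completeness, purity) for one class, e.g. "star" or "galaxy"."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == target and p == target)
    n_true = sum(1 for t, _ in pairs if t == target)
    n_predicted = sum(1 for _, p in pairs if p == target)
    completeness = correct / n_true if n_true else float("nan")
    purity = correct / n_predicted if n_predicted else float("nan")
    return completeness, purity


truth = ["star", "star", "galaxy", "galaxy", "galaxy"]
guess = ["star", "galaxy", "galaxy", "galaxy", "star"]
star_scores = completeness_purity(truth, guess, "star")
galaxy_scores = completeness_purity(truth, guess, "galaxy")
```

The two quantities trade off against each other: a classifier that labels everything as a star reaches full completeness for stars at the cost of low purity.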
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-09-02
Locomotion mode identification is essential for the control of robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Using a window of 200 ms (at a sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set and thus the computation cost of the SVM. Since the signals come from two different types of sensors, normalization is conducted to scale the input into the interval [0, 1]. Five-fold cross validation is adopted to train the classifier, which helps prevent the classifier from over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes, and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, a majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.
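Three of the processing steps in this abstract lend themselves to a short sketch: the window length in samples, min-max scaling to [0, 1], and majority-vote post-processing of the classifier output. The vote window of three decisions, the mode names and all function names are hypothetical; the paper does not state how many past decisions its majority vote algorithm uses.

```python
from collections import Counter, deque


def samples_per_window(window_ms, rate_hz):
    """Number of samples in one analysis window."""
    return round(window_ms * rate_hz / 1000)


def min_max_scale(values):
    """Scale a feature column linearly into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def majority_vote(predictions, window=3):
    """Replace each prediction by the most frequent label among the last `window`."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in predictions:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


n_samples = samples_per_window(200, 40)
scaled = min_max_scale([2.0, 4.0, 6.0])
raw = ["walk", "walk", "stairs", "walk", "walk"]
smoothed = majority_vote(raw, window=3)
```

In the toy sequence the isolated "stairs" decision is outvoted by its neighbours, which is the kind of single-window error such post-processing is meant to suppress, at the cost of a short delay when the mode really changes.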
PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons
Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei
2016-01-01
Locomotion mode identification is essential for the control of robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Using a window of 200 ms (at a sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set and thus the computation cost of the SVM. Since the signals come from two different types of sensors, normalization is conducted to scale the input into the interval [0, 1]. Five-fold cross validation is adopted to train the classifier, which helps prevent the classifier from over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes, and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, a majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance. PMID:27598160
Paiva, Joana S; Cardoso, João; Pereira, Tânia
2018-01-01
The main goal of this study was to develop an automatic method, based on supervised learning, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. The Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem, with an average accuracy of 0.9917±0.0024 and an F-Measure of 0.9925±0.0019, in comparison with ANN, which reached values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using different numbers of features from the original set available. The comparison between SVM and ANN reaffirmed the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
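The harvistSVM idea of pairing a quick classifier with a slower, more accurate one can be sketched as a two-stage cascade. The sketch below is a simplification under stated assumptions: it uses a plain confidence threshold in place of the "meta-SVM" described in the abstract, and the two classifiers are hypothetical stand-in functions on a single pixel value, not trained SVMs.

```python
def cascade_classify(pixels, fast, slow, threshold=0.5):
    """Label pixels with `fast`, falling back to `slow` on low confidence.

    fast(pixel) returns (label, confidence in [0, 1]); slow(pixel) returns a label.
    Returns the labels and the number of pixels sent to the slow classifier.
    """
    labels = []
    n_slow = 0
    for pixel in pixels:
        label, confidence = fast(pixel)
        if confidence < threshold:
            label = slow(pixel)      # reclassify only the uncertain pixels
            n_slow += 1
        labels.append(label)
    return labels, n_slow


# Hypothetical stand-ins for a quick SVM and a slower, more accurate SVM.
def quick_model(pixel):
    label = "crop" if pixel >= 0.5 else "other"
    return label, abs(pixel - 0.5) * 2


def accurate_model(pixel):
    return "crop" if pixel >= 0.25 else "other"


labels, n_slow = cascade_classify([0.0, 0.375, 1.0], quick_model, accurate_model)
```

The speedup comes from the fraction of pixels that never reach the slow model, so the threshold trades classification speed against accuracy, which matches the tuning concern raised at the end of the abstract.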
Xiao, Chuncai; Hao, Kuangrong; Ding, Yongsheng
2014-12-30
This paper creates a bi-directional prediction model to predict the performance of carbon fiber and the productive parameters based on a support vector machine (SVM) and improved particle swarm optimization (IPSO) algorithm (SVM-IPSO). In the SVM, it is crucial to select the parameters that have an important impact on the performance of prediction. The IPSO is proposed to optimize them, and then the SVM-IPSO model is applied to the bi-directional prediction of carbon fiber production. The predictive accuracy of SVM is mainly dependent on its parameters, and IPSO is thus exploited to seek the optimal parameters for SVM in order to improve its prediction capability. Inspired by a cell communication mechanism, we propose IPSO by incorporating information of the global best solution into the search strategy to improve exploitation, and we employ IPSO to establish the bi-directional prediction model: in the direction of the forward prediction, we consider productive parameters as input and property indexes as output; in the direction of the backward prediction, we consider property indexes as input and productive parameters as output, and in this case, the model becomes a scheme design for novel style carbon fibers. The results from a set of the experimental data show that the proposed model can outperform the radial basis function neural network (RNN), the basic particle swarm optimization (PSO) method and the hybrid approach of genetic algorithm and improved particle swarm optimization (GA-IPSO) method in most of the experiments. In other words, simulation results demonstrate the effectiveness and advantages of the SVM-IPSO model in dealing with the problem of forecasting.
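The role of particle swarm optimization in selecting SVM parameters can be illustrated with a basic PSO loop. This is the textbook algorithm, not the IPSO variant proposed in the paper, and the cost function is a hypothetical smooth stand-in for a cross-validation error over two SVM parameters on a log scale; inertia and acceleration constants are conventional illustrative values.

```python
import random


def pso(cost, bounds, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `cost` over box `bounds` with a basic particle swarm.

    Returns (best position, best cost, best cost after each iteration).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


def toy_cv_error(params):
    """Hypothetical error surface with its minimum at log C = 1, log gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


search_box = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_cost, history = pso(toy_cv_error, search_box)
```

In practice the cost function would train and validate an SVM for each candidate parameter pair, which is why the number of particles and iterations dominates the running time.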
Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S
2016-01-01
High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu
2017-01-01
This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.
Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael
2011-01-01
Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However, these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition, the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features the OC-SVM classified 52% of patients as outliers. However, among the patients classified as outliers 70% did not respond to treatment and among those classified as non-outliers 89% responded to treatment. In addition, 89% of the healthy controls were classified as non-outliers. PMID:21723950
Lou, Xin Yuan; Sun, Lin Fu
2017-01-01
This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm’s performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem. PMID:28369096
NASA Astrophysics Data System (ADS)
Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas
2016-03-01
Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.
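The rainfall area scores in this abstract can be illustrated from a 2x2 contingency table. The sketch assumes that "bias" here means the frequency bias used in forecast verification, (hits + false alarms) / (hits + misses), so that values above 1 indicate an overestimated rain area; that reading, and the counts below, are assumptions for illustration only.

```python
def rain_area_scores(hits, misses, false_alarms):
    """Return (probability of detection, frequency bias) for rain/no-rain pixels.

    hits: rain predicted and observed; misses: rain observed but not predicted;
    false_alarms: rain predicted but not observed.
    """
    observed_rain = hits + misses
    pod = hits / observed_rain
    frequency_bias = (hits + false_alarms) / observed_rain
    return pod, frequency_bias


# Made-up counts chosen to give a high POD together with an overestimated area.
pod, frequency_bias = rain_area_scores(hits=8, misses=2, false_alarms=10)
```

With these counts most observed rain pixels are detected, yet nearly twice as many pixels are flagged as raining as were observed, which shows how a high probability of detection can coexist with overestimation.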
Van Esbroeck, Alexander; Rubinfeld, Ilan; Hall, Bruce; Syed, Zeeshan
2014-11-01
To investigate the use of machine learning to empirically determine the risk of individual surgical procedures and to improve surgical models with this information. American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) data from 2005 to 2009 were used to train support vector machine (SVM) classifiers to learn the relationship between textual constructs in current procedural terminology (CPT) descriptions and mortality, morbidity, Clavien 4 complications, and surgical-site infections (SSI) within 30 days of surgery. The procedural risk scores produced by the SVM classifiers were validated on data from 2010 in univariate and multivariate analyses. The procedural risk scores produced by the SVM classifiers achieved moderate-to-high levels of discrimination in univariate analyses (area under receiver operating characteristic curve: 0.871 for mortality, 0.789 for morbidity, 0.791 for SSI, 0.845 for Clavien 4 complications). Addition of these scores also substantially improved multivariate models comprising patient factors and previously proposed correlates of procedural risk (net reclassification improvement and integrated discrimination improvement: 0.54 and 0.001 for mortality, 0.46 and 0.011 for morbidity, 0.68 and 0.022 for SSI, 0.44 and 0.001 for Clavien 4 complications; P < .05 for all comparisons). Similar improvements were noted in discrimination and calibration for other statistical measures, and in subcohorts comprising patients with general or vascular surgery. Machine learning provides clinically useful estimates of surgical risk for individual procedures. This information can be measured in an entirely data-driven manner and substantially improves multifactorial models to predict postoperative complications. Copyright © 2014 Elsevier Inc. All rights reserved.
Watson, Robert A
2014-08-01
To test the hypothesis that machine learning algorithms increase the predictive power to classify surgical expertise using surgeons' hand motion patterns. In 2012 at the University of North Carolina at Chapel Hill, 14 surgical attendings and 10 first- and second-year surgical residents each performed two bench model venous anastomoses. During the simulated tasks, the participants wore an inertial measurement unit on the dorsum of their dominant (right) hand to capture their hand motion patterns. The pattern from each bench model task performed was preprocessed into a symbolic time series and labeled as expert (attending) or novice (resident). The labeled hand motion patterns were processed and used to train a Support Vector Machine (SVM) classification algorithm. The trained algorithm was then tested for discriminative/predictive power against unlabeled (blinded) hand motion patterns from tasks not used in the training. The Lempel-Ziv (LZ) complexity metric was also measured from each hand motion pattern, with an optimal threshold calculated to separately classify the patterns. The LZ metric classified unlabeled (blinded) hand motion patterns into expert and novice groups with an accuracy of 70% (sensitivity 64%, specificity 80%). The SVM algorithm had an accuracy of 83% (sensitivity 86%, specificity 80%). The results confirmed the hypothesis. The SVM algorithm increased the predictive power to classify blinded surgical hand motion patterns into expert versus novice groups. With further development, the system used in this study could become a viable tool for low-cost, objective assessment of procedural proficiency in a competency-based curriculum.
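As a rough illustration of the complexity metric mentioned above, the sketch below counts phrases in a symbolic series using the Lempel-Ziv (1976) parsing. The abstract does not say which LZ variant or symbol alphabet was used, so both the variant and the example string are assumptions.

```python
def lz76_complexity(symbols: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) parsing of a symbol string."""
    n = len(symbols)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from the
        # text that precedes its last symbol (overlap is allowed).
        while (start + length <= n
               and symbols[start:start + length] in symbols[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases


# Textbook example string, parsed as 0|001|10|100|1000|101.
example = lz76_complexity("0001101001000101")
```

A threshold on this count (possibly after normalizing by sequence length) would then separate the two groups; how the threshold was chosen in the study is not stated in the abstract.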
Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification
Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.
2013-01-01
Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The MATLAB source code is available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761
Mapping Winter Wheat with Multi-Temporal SAR and Optical Images in an Urban Agricultural Region
Zhou, Tao; Pan, Jianjun; Zhang, Peiyu; Wei, Shanbao; Han, Tao
2017-01-01
Winter wheat is the second largest food crop in China. It is important to obtain reliable winter wheat acreage to guarantee the food security for the most populous country in the world. This paper focuses on assessing the feasibility of in-season winter wheat mapping and investigating potential classification improvement by using SAR (Synthetic Aperture Radar) images, optical images, and the integration of both types of data in urban agricultural regions with complex planting structures in Southern China. Both SAR (Sentinel-1A) and optical (Landsat-8) data were acquired, and classification using different combinations of Sentinel-1A-derived information and optical images was performed using a support vector machine (SVM) and a random forest (RF) method. The interference coherence and texture images were obtained and used to assess the effect of adding them to the backscatter intensity images on the classification accuracy. The results showed that the use of four Sentinel-1A images acquired before the jointing period of winter wheat can provide satisfactory winter wheat classification accuracy, with an F1 measure of 87.89%. The combination of SAR and optical images for winter wheat mapping achieved the best F1 measure–up to 98.06%. The SVM was superior to RF in terms of the overall accuracy and the kappa coefficient, and was faster than RF, while the RF classifier was slightly better than SVM in terms of the F1 measure. In addition, the classification accuracy can be effectively improved by adding the texture and coherence images to the backscatter intensity data. PMID:28587066
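The F1 measure, overall accuracy and kappa coefficient used above can all be derived from one confusion matrix. The sketch below shows the standard binary definitions (wheat versus non-wheat is an assumed framing); the function name and the counts in the usage line are hypothetical.

```python
def f1_accuracy_kappa(tp, fp, fn, tn):
    """Return (F1, overall accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return f1, accuracy, kappa


# Hypothetical counts for illustration only; not data from the study.
f1, accuracy, kappa = f1_accuracy_kappa(tp=40, fp=10, fn=10, tn=40)
```

Because F1 ignores true negatives while accuracy and kappa do not, two classifiers can rank differently on these measures, which is consistent with the SVM and RF comparison reported above.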
Ecker, Christine; Marquand, Andre; Mourão-Miranda, Janaina; Johnston, Patrick; Daly, Eileen M; Brammer, Michael J; Maltezos, Stefanos; Murphy, Clodagh M; Robertson, Dene; Williams, Steven C; Murphy, Declan G M
2010-08-11
Autism spectrum disorder (ASD) is a neurodevelopmental condition with multiple causes, comorbid conditions, and a wide range in the type and severity of symptoms expressed by different individuals. This makes the neuroanatomy of autism inherently difficult to describe. Here, we demonstrate how a multiparameter classification approach can be used to characterize the complex and subtle structural pattern of gray matter anatomy implicated in adults with ASD, and to reveal spatially distributed patterns of discriminating regions for a variety of parameters describing brain anatomy. A set of five morphological parameters including volumetric and geometric features at each spatial location on the cortical surface was used to discriminate between people with ASD and controls using a support vector machine (SVM) analytic approach, and to find a spatially distributed pattern of regions with maximal classification weights. On the basis of these patterns, SVM was able to identify individuals with ASD at a sensitivity and specificity of up to 90% and 80%, respectively. However, the ability of individual cortical features to discriminate between groups was highly variable, and the discriminating patterns of regions varied across parameters. The classification was specific to ASD rather than neurodevelopmental conditions in general (e.g., attention deficit hyperactivity disorder). Our results confirm the hypothesis that the neuroanatomy of autism is truly multidimensional, and affects multiple and most likely independent cortical features. The spatial patterns detected using SVM may help further exploration of the specific genetic and neuropathological underpinnings of ASD, and provide new insights into the most likely multifactorial etiology of the condition.
A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.
Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip
2014-11-01
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve more reliable and robust segmentation performance for humanoid robots. The pixel-wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter and are used as inputs to the MFMK-SVM model, providing multiple features of the samples for easier implementation and efficient computation of the MFMK-SVM model. A new clustering method, called the feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed; it integrates a type-2 fuzzy criterion into the clustering optimization process to improve the robustness and reliability of the clustering results through iterative optimization. Furthermore, the clustering validity is employed to select the training samples for learning the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to take full advantage of the multiple features of the scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
CNN-SVM for Microvascular Morphological Type Recognition with Data Augmentation.
Xue, Di-Xiu; Zhang, Rong; Feng, Hui; Wang, Ya-Lei
2016-01-01
This paper focuses on the problem of feature extraction and the classification of microvascular morphological types to aid esophageal cancer detection. We present a patch-based system with a hybrid SVM model with data augmentation for intraepithelial papillary capillary loop recognition. A greedy patch-generating algorithm and a specialized CNN named NBI-Net are designed to extract hierarchical features from patches. We investigate a series of data augmentation techniques to progressively improve the prediction invariance of image scaling and rotation. For classifier boosting, SVM is used as an alternative to softmax to enhance generalization ability. The effectiveness of CNN feature representation ability is discussed for a set of widely used CNN models, including AlexNet, VGG-16, and GoogLeNet. Experiments are conducted on the NBI-ME dataset. The recognition rate is up to 92.74% on the patch level with data augmentation and classifier boosting. The results show that the combined CNN-SVM model beats models of traditional features with SVM as well as the original CNN with softmax. The synthesis results indicate that our system is able to assist clinical diagnosis to a certain extent.
NASA Astrophysics Data System (ADS)
Hu, Yan-Yan; Li, Dong-Sheng
2016-01-01
Hyperspectral images (HSI) consist of many closely spaced bands that carry most of the object information. However, because of their high dimensionality and large data volume, satisfactory classification performance is hard to achieve. To reduce HSI data dimensionality in preparation for high classification accuracy, a band selection method based on artificial immune systems (AIS) is combined with a hybrid-kernel support vector machine (SVM-HK) algorithm. After comparing different kernels for hyperspectral analysis, the approach mixes the radial basis function kernel (RBF-K) with the sigmoid kernel (Sig-K) and applies the optimized hybrid kernels in SVM classifiers. The SVM-HK algorithm is then used to guide the band selection of an improved version of AIS. The AIS is composed of clonal selection and elite antibody mutation, including an evaluation process with the optional index factor (OIF). Experiments on an HRS dataset of the San Diego Naval Base acquired by AVIRIS show that the method efficiently removes band redundancy while outperforming the traditional SVM classifier.
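One simple way to mix an RBF kernel with a sigmoid kernel is a convex combination, sketched below. The abstract does not state how the two kernels were combined or tuned, so the weighting scheme, the parameter names (`weight`, `gamma`, `alpha`, `coef0`) and their default values are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, alpha, coef0):
    """Sigmoid kernel tanh(alpha * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(alpha * dot + coef0)


def hybrid_kernel(x, y, weight=0.5, gamma=1.0, alpha=0.01, coef0=0.0):
    """Assumed mixing rule: weight * RBF + (1 - weight) * sigmoid."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * sigmoid_kernel(x, y, alpha, coef0))


# Two hypothetical band vectors (reflectance values), for illustration only.
value = hybrid_kernel([0.2, 0.4, 0.1], [0.3, 0.5, 0.1], weight=0.7)
```

The sigmoid kernel is not positive semidefinite for every parameter choice, so a mixture like this would normally be validated on the data before being passed to an SVM solver.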
NASA Astrophysics Data System (ADS)
Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao
2014-09-01
This study aims to present a noninvasive prostate cancer screening method that applies serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques to peripheral blood samples. SERS measurements are performed on serum samples from 93 prostate cancer patients and 68 healthy volunteers using silver nanoparticles. Three types of kernel functions, namely linear, polynomial, and Gaussian radial basis function (RBF), are employed to build SVM diagnostic models for classifying the measured SERS spectra. To evaluate the performance of the SVM classification models comparatively, the standard multivariate statistical analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The results show that the RBF kernel SVM diagnostic model achieves a diagnostic accuracy of 98.1%, which is superior to the 91.3% obtained with the PCA method. The receiver operating characteristic curves of the diagnostic models further confirm these results. This study demonstrates that the label-free serum SERS analysis technique combined with an SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.
Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru
2014-10-15
Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohols and esters. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), and support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) kernel achieved better prediction accuracy for both the calibration set (94.3%) and the validation set (96.2%) than the other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role in model training: the prediction accuracy of SVM with a polynomial kernel function was only 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM
NASA Astrophysics Data System (ADS)
Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan
2018-03-01
To address the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on three DGA ratios and a support vector machine (SVM) optimized by particle swarm optimization (PSO) is proposed. The support vector machine is extended to a nonlinear, multi-class SVM, particle swarm optimization is used to optimize the multi-class SVM model, and transformer fault diagnosis is conducted in combination with cross-validation. The fault diagnosis results show that the average accuracy of the proposed method is better than that of the standard support vector machine and the genetic-algorithm support vector machine, demonstrating that the proposed method can effectively improve the accuracy of transformer fault diagnosis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86 based multi-core and many core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi coprocessor (MIC).
A Comparison of Artificial Intelligence Methods on Determining Coronary Artery Disease
NASA Astrophysics Data System (ADS)
Babaoğlu, Ismail; Baykan, Ömer Kaan; Aygül, Nazif; Özdemir, Kurtuluş; Bayrak, Mehmet
The aim of this study is to compare the multi-layered perceptron neural network (MLPNN) and the support vector machine (SVM) in determining the existence of coronary artery disease from exercise stress testing (EST) data. EST and coronary angiography were performed on 480 patients, with 23 verifying features acquired from each. The robustness of the proposed methods is examined using classification accuracy, the k-fold cross-validation method and Cohen's kappa coefficient. The obtained classification accuracies are approximately 78% and 79% for MLPNN and SVM, respectively. Judging by Cohen's kappa coefficients, both the MLPNN and SVM methods are more satisfactory than the human-based method. In addition, SVM is slightly better than MLPNN in terms of diagnostic accuracy, the average of sensitivity and specificity, and Cohen's kappa coefficient.
Xu, L; Cai, C B; Cui, H F; Ye, Z H; Yu, X P
2012-12-01
Rapid discrimination of pork in Halal and non-Halal Chinese ham sausages was developed by Fourier transform infrared (FTIR) spectrometry combined with chemometrics. Transmittance spectra ranging from 400 to 4000 cm⁻¹ of 73 Halal and 78 non-Halal Chinese ham sausages were measured. Sample preparation involved finely grinding of samples and formation of KBr disks (under 10 MPa for 5 min). The influence of data preprocessing methods including smoothing, taking derivatives and standard normal variate (SNV) on partial least squares discriminant analysis (PLSDA) and least squares support vector machine (LS-SVM) was investigated. The results indicate removal of spectral background and baseline plays an important role in discrimination. Taking derivatives, SNV can improve classification accuracy and reduce the complexity of PLSDA. Possibly due to the loss of detailed high-frequency spectral information, smoothing degrades the model performance. For the best models, the sensitivity and specificity was 0.913 and 0.929 for PLSDA with SNV spectra, 0.957 and 0.929 for LS-SVM with second derivative spectra, respectively. Copyright © 2012 Elsevier Ltd. All rights reserved.
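Standard normal variate (SNV) preprocessing, one of the steps compared above, centers and scales each spectrum by its own mean and standard deviation. The sketch below is a minimal version; using the sample standard deviation (n - 1 denominator) is the common convention and is assumed here, and the example values are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator), assumed convention.
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    if std == 0:
        raise ValueError("flat spectrum cannot be scaled")
    return [(v - mean) / std for v in spectrum]


# Hypothetical transmittance values for illustration only.
corrected = snv([1.0, 2.0, 3.0])
```

Because each spectrum is normalized independently, SNV removes additive offsets and multiplicative scaling that differ from sample to sample, which fits the abstract's remark about background and baseline removal.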
Sikirzhytski, Vitali; Sikirzhytskaya, Aliaksandra; Lednev, Igor K
2012-10-10
Conventional confirmatory biochemical tests used in the forensic analysis of body fluid traces found at a crime scene are destructive and not universal. Recently, we reported on the application of near-infrared (NIR) Raman microspectroscopy for non-destructive confirmatory identification of pure blood, saliva, semen, vaginal fluid and sweat. Here we expand the method to include dry mixtures of semen and blood. A classification algorithm was developed for differentiating pure body fluids and their mixtures. The classification methodology is based on an effective combination of Support Vector Machine (SVM) regression (data selection) and SVM Discriminant Analysis of preprocessed experimental Raman spectra collected using an automatic mapping of the sample. This extensive cross-validation of the obtained results demonstrated that the detection limit of the minor contributor is as low as a few percent. The developed methodology can be further expanded to any binary mixture of complex solutions, including but not limited to mixtures of other body fluids. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-01-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-09-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Thomas, Minta; De Brabanter, Kris; De Moor, Bart
2014-05-10
DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main inputs to these class predictors are high-dimensional data with many variables and few observations. Dimensionality reduction of this feature set significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. Studies show that a well-tuned Kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA was computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well-tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performance (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies. We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider their application in medical diagnostics. 
Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage in its relatively lower time complexity.
Classification of Regional Ionospheric Disturbances Based on Support Vector Machines
NASA Astrophysics Data System (ADS)
Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil
2016-07-01
The ionosphere is an anisotropic, inhomogeneous, time-varying and spatio-temporally dispersive medium whose parameters are almost always estimated using indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of the ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to identify because of the wide range of periods, durations, amplitudes and frequencies of the disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, identification and classification of regional disturbances poses a challenge. The main aim of this study is to identify the regional effects of global geomagnetic storms and classify them according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, which is one of the major parameters of the ionosphere, will be used along with solar and geomagnetic indices to model the regional and local variability that differs from global activity. In this work, for the automated classification of the regional disturbances, a classification technique based on Support Vector Machine (SVM), a robust machine learning technique that has found widespread use, is proposed. SVM is a supervised learning model used for classification, with associated learning algorithms that analyze data and recognize patterns. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher dimensional feature spaces. Performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from the GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011. 
By applying the developed classification technique to the Global Ionospheric Map (GIM) TEC data provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method to detect anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK 114E541 project as a part of the Scientific and Technological Research Projects Funding Program (1001).
Al-Qazzaz, Noor Kamal; Ali, Sawal Hamid Bin Mohd; Ahmad, Siti Anom; Islam, Mohd Shabiul; Escudero, Javier
2018-01-01
Stroke survivors are more prone to developing cognitive impairment and dementia. Dementia detection is a challenge for supporting personalized healthcare. This study analyzes the electroencephalogram (EEG) background activity of 5 vascular dementia (VaD) patients, 15 stroke-related patients with mild cognitive impairment (MCI), and 15 healthy control subjects during a working memory (WM) task. The objective of this study is twofold. First, it aims to enhance the discrimination of VaD patients, stroke-related MCI patients, and control subjects using fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR); second, it aims to extract and investigate the spectral features that characterize the post-stroke dementia patients compared to the control subjects. Nineteen channels were recorded and analyzed using the independent component analysis and wavelet analysis (ICA-WT) denoising technique. Using ANOVA, linear spectral power including relative powers (RP) and power ratio were calculated to test whether the EEG dominant frequencies were slowed down in VaD and stroke-related MCI patients. Non-linear features including permutation entropy (PerEn) and fractal dimension (FD) were used to test the degree of irregularity and complexity, which was significantly lower in patients with VaD and stroke-related MCI than in control subjects (ANOVA; p < 0.05). This study is the first to use the FNPAQR dimensionality reduction technique with EEG background activity of dementia patients. The impairment of post-stroke patients was detected using support vector machine (SVM) and k-nearest neighbors (kNN) classifiers. A comparative study was performed to check the effectiveness of using the FNPAQR dimensionality reduction technique with the SVM and kNN classifiers. 
With FNPAQR, SVM and kNN obtained 91.48 and 89.63% accuracy, respectively, whereas without FNPAQR they obtained 70 and 67.78% accuracy, respectively, in classifying VaD patients, stroke-related MCI patients, and control subjects. Therefore, EEG could be a reliable index for inspecting concise markers that are sensitive to VaD and stroke-related MCI patients compared to healthy control subjects.
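Permutation entropy, one of the non-linear features named above, can be sketched as the Shannon entropy of ordinal patterns in a signal. The embedding order and delay used in the study are not given in the abstract, so the defaults below (`order=3`, `delay=1`) and the short example series are assumptions for illustration.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Shannon entropy (bits) of the ordinal patterns found in a 1-D signal."""
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for this order and delay")
    patterns = Counter()
    for i in range(n_windows):
        window = [signal[i + j * delay] for j in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = sum(-(c / n_windows) * math.log2(c / n_windows)
                  for c in patterns.values())
    if normalize:
        entropy /= math.log2(math.factorial(order))  # scale to [0, 1]
    return entropy


# Short illustrative series (not EEG data from the study).
pe = permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3, normalize=False)
```

Lower values indicate a more regular signal, which is the direction of the group difference reported for the VaD and MCI patients.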
Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.
2010-01-01
Purpose To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation =−0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation =−5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set and areas under the receiver operating characteristic (AUROC) curve for the classification of eyes in the test set were generated. AUROC curve results from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. Also reported was the AUROC curve for the GDx VCC software-generated nerve fiber indicator (NFI). Results The AUROC curves for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced dimensional data sets). AUROC curves for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC curve for the NFI was 0.87. Conclusions Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes. 
RVM may be preferable to SVM, because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898
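The area under the ROC curve reported above equals the probability that a randomly chosen diseased eye receives a higher classifier score than a randomly chosen healthy eye. The sketch below computes it by direct pairwise comparison; the function name and the scores are hypothetical, and the study's own cross-validated procedure is not reproduced here.

```python
def auroc(positive_scores, negative_scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier outputs for illustration only.
area = auroc([0.9, 0.8, 0.4], [0.5, 0.2, 0.1])
```

In a tenfold cross-validation such as the one described, the scores for each eye would come from the fold in which that eye was held out, so that no eye is scored by a model trained on it.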
Prediction of toxic metals concentration using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Gholami, R.; Kamkar-Rouhani, A.; Doulati Ardejani, F.; Maleki, Sh.
2011-12-01
Groundwater and soil pollution are regarded as the worst environmental problems related to the mining industry because pyrite oxidation leads to acid mine drainage generation and to the release and transport of toxic metals. The aim of this paper is to predict the concentrations of Ni and Fe using a robust algorithm, the support vector machine (SVM). Comparison of the SVM results with those of the back-propagation neural network (BPNN) indicates that the SVM can be regarded as a suitable algorithm for predicting toxic metal concentrations because of its relatively high correlation coefficient and its running time. In fact, the SVM method provided better predictions of the toxic metals Fe and Ni and ran faster than the BPNN.
Extraction of prostatic lumina and automated recognition for prostatic calculus image using PCA-SVM.
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computer-assisted diagnosis of prostatic calculi has promising potential but has so far received little study. We studied the extraction of prostatic lumina and the automated recognition of calculus images. Lumina were extracted from prostate histology images using local entropy and the Otsu threshold; recognition used PCA-SVM based on the texture features of prostatic calculi. The SVM classifier showed an average time of 0.1432 seconds, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can easily recognize the concentric structure and visualized features. Therefore, this method is effective for the automated recognition of prostatic calculi.
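The Otsu threshold mentioned in this abstract picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The sketch below is a plain-Python illustration of that idea on a flat list of integer gray levels; the function name and the toy pixel values are assumptions, and the authors' actual pipeline (local entropy, PCA, SVM) is not reproduced here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the
    foreground class. `pixels` is a flat list of ints in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0          # number of background pixels so far
    sum_bg = 0        # sum of gray levels of background pixels so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy image: a dark region (value 10) and a bright region (value 200).
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
print(threshold)
```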
A RLS-SVM Aided Fusion Methodology for INS during GPS Outages
Yao, Yiqing; Xu, Xiaosu
2017-01-01
In order to maintain a relatively high accuracy of navigation performance during global positioning system (GPS) outages, a novel robust least squares support vector machine (LS-SVM)-aided fusion methodology is explored to provide the pseudo-GPS position information for the inertial navigation system (INS). The relationship between the yaw, specific force, velocity, and the position increment is modeled. Rather than share the same weight in the traditional LS-SVM, the proposed algorithm allocates various weights for different data, which makes the system immune to the outliers. Field test data was collected to evaluate the proposed algorithm. The comparison results indicate that the proposed algorithm can effectively provide position corrections for standalone INS during the 300 s GPS outage, which outperforms the traditional LS-SVM method. Historical information is also involved to better represent the vehicle dynamics. PMID:28245549
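The abstract does not say how the per-sample weights are chosen. Purely as an illustration, the sketch below uses the residual-based weighting commonly associated with the weighted LS-SVM of Suykens et al., which may differ from the authors' scheme: residuals from a first, unweighted fit are scaled by a robust estimate of spread, and large residuals receive small weights. The function name, the constants `c1`, `c2`, `floor`, and the example residuals are all assumptions.

```python
import statistics


def robust_weights(residuals, c1=2.5, c2=3.0, floor=1e-4):
    """Map residuals of an initial fit to weights in (0, 1].

    Residuals are standardized by 1.483 * MAD (median absolute deviation);
    small ones keep weight 1, large ones are pushed towards `floor`.
    """
    med = statistics.median(residuals)
    mad = statistics.median([abs(e - med) for e in residuals])
    scale = 1.483 * mad
    if scale == 0.0:
        return [1.0] * len(residuals)  # no spread: treat all samples alike
    weights = []
    for e in residuals:
        z = abs(e) / scale
        if z <= c1:
            w = 1.0
        elif z <= c2:
            w = (c2 - z) / (c2 - c1)   # linear taper between c1 and c2
        else:
            w = floor                  # treated as an outlier
        weights.append(w)
    return weights


# Hypothetical residuals; the last sample is an obvious outlier.
example = [0.1, -0.1, 0.05, -0.05, 0.0, 10.0]
print(robust_weights(example))
```

The weighted model would then be refitted with each sample's error term multiplied by its weight, so that outliers barely influence the solution.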
A Support Vector Machine-Based Gender Identification Using Speech Signal
NASA Astrophysics Data System (ADS)
Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk
We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding an arbitrary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a feature fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed feature fusion technique is applied.
Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil
2014-09-07
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
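The five-fold cross-validation used above partitions the samples into five disjoint folds and holds each fold out once for testing. A minimal sketch of the index bookkeeping is given below, assuming a simple shuffled (non-stratified) split; the function name and the seed are illustrative, and the study's actual splitting procedure is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracies uses every sample for testing once.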
NASA Astrophysics Data System (ADS)
Kumar, Deepak; Thakur, Manoj; Dubey, Chandra S.; Shukla, Dericks P.
2017-10-01
In recent years, various machine learning techniques have been applied to landslide susceptibility mapping. In this study, three variants of the support vector machine, viz. SVM, Proximal Support Vector Machine (PSVM) and L2-Support Vector Machine - Modified Finite Newton (L2-SVM-MFN), were applied to the Mandakini River Basin in Uttarakhand, India to carry out landslide susceptibility mapping. Eight thematic layers (elevation, slope, aspect, drainages, geology/lithology, buffer of thrusts/faults, buffer of streams and soil), along with the past landslide data, were mapped in a GIS environment and used for landslide susceptibility mapping in MATLAB. The study area covers 1625 km², of which merely 0.11% is under landslides. Past landslides occupy 2009 pixels, of which 50% (1000) were used as the training set and the remaining 50% as the testing set. The performance of these techniques was evaluated, and the computational results show that L2-SVM-MFN obtains a higher area under the receiver operating characteristic curve (AUC) of 0.829, compared with 0.807 for the PSVM model and 0.79 for SVM. The results obtained from the L2-SVM-MFN model are superior to those of the other SVM prediction models and suggest the usefulness of this technique for landslide susceptibility mapping where training data are scarce. However, all of these techniques can be used for satisfactory determination of susceptible zones with these inputs.
NASA Technical Reports Server (NTRS)
Forman, Barton A.; Reichle, Rolf Helmut
2014-01-01
A support vector machine (SVM), a machine learning technique developed from statistical learning theory, is employed for the purpose of estimating passive microwave (PMW) brightness temperatures over snow-covered land in North America as observed by the Advanced Microwave Scanning Radiometer (AMSR-E) satellite sensor. The capability of the trained SVM is compared relative to the artificial neural network (ANN) estimates originally presented in [14]. The results suggest the SVM outperforms the ANN at 10.65 GHz, 18.7 GHz, and 36.5 GHz for both vertically and horizontally-polarized PMW radiation. When compared against daily AMSR-E measurements not used during the training procedure and subsequently averaged across the North American domain over the 9-year study period, the root mean squared error in the SVM output is 8 K or less while the anomaly correlation coefficient is 0.7 or greater. When compared relative to the results from the ANN at any of the six frequency and polarization combinations tested, the root mean squared error was reduced by more than 18 percent while the anomaly correlation coefficient was increased by more than 52 percent. Further, the temporal and spatial variability in the modeled brightness temperatures via the SVM more closely agrees with that found in the original AMSR-E measurements. These findings suggest the SVM is a superior alternative to the ANN for eventual use as a measurement operator within a data assimilation framework.
Liu, Kang; Chi, Shuyan; Liu, Hongyu; Dong, Xiaohui; Yang, Qihui; Zhang, Shuang; Tan, Beiping
2015-08-01
In the present study, juvenile cobia, Rachycentron canadum L., were fed diets contaminated by two different sources of cadmium: squid viscera meal (SVM-Cd, organic form) and cadmium chloride (CdCl2-Cd, inorganic form). The Cd concentrations in the fish diets were approximately 3.0, 5.0 and 10.0 mg Cd kg⁻¹ for both the inorganic and organic forms. In the control diet (0.312 mg Cd kg⁻¹ diet, with Cd coming mainly from fish meal), no cadmium was added. The experiment lasted for 16 weeks, and a statistically significant inverse relationship was observed between specific growth rate (SGR) and the concentration of dietary Cd. The SGR of cobia fed a diet with SVM-Cd increased at the lowest doses and decreased with increasing levels of dietary SVM. Among the high-Cd diet treatments, fish fed diets contaminated with SVM-Cd had significantly higher SGR than those fed diets contaminated with CdCl2-Cd. The dietary Cd levels also significantly affected the survival rate of the fish. Among the hematological characteristics and plasma constituents, glutamic-pyruvic transaminase and alkaline phosphatase (ALP) activities in serum and liver increased, and hepatic superoxide dismutase activity decreased, with increasing dietary Cd levels. Cobia fed diets contaminated with a high level of CdCl2-Cd had significantly higher ALP activity than cobia fed diets contaminated with a high level of SVM-Cd. The results indicate no differences in toxicity response to dietborne SVM-Cd and CdCl2-Cd at a low level of Cd. However, at a higher level, cobia were more sensitive to dietborne CdCl2-Cd than to SVM-Cd. Based on quadratic regression of SGR, the optimal dietary Cd concentration was 3.617 mg kg⁻¹ when the Cd source was SVM (126 mg Cd kg⁻¹ in SVM), which stimulated the growth of cobia, and the added level of SVM was determined to be 26.7 g kg⁻¹ diet in the present study.
Cd accumulation in the kidney of cobia fed either type of Cd was higher than in other tissues, and the order of Cd accumulation in tissues was kidney>liver>intestine>gill>muscle. Iron accumulation in liver and kidney and calcium accumulation in vertebra and scale were also significantly affected by dietary Cd levels. Copyright © 2015 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matsui, T; Ohki, M; Nakamura, T
Purpose: Sjoegren's syndrome (SS) is an autoimmune disease that mainly affects the salivary and lacrimal glands. Ultrasonography is used for an initial and non-invasive examination of this disease. However, ultrasonographic diagnosis tends to lack objectivity and depends on the operator's skill. The purpose of this study is to propose a computer-aided diagnosis (CAD) system for SS based on a dual-tree complex wavelet transform (DT-CWT) and machine learning. Methods: The subjects of this study were 174 patients suspected of having SS at Nagasaki University Hospital and examined with ultrasonography of the parotid glands. Of these patients, 77 were diagnosed with SS by sialography. A region of interest (ROI) of 128 × 128 pixels was set within the parotid gland as indicated by a dental radiologist. The DT-CWT was applied to the images in the ROI, and every image was decomposed into 72 sub-images of the real and imaginary components at six resolution levels and six orientations. The statistical features of the sub-images were calculated and used as input to the support vector machine (SVM) classifier for the detection of SS. Ten-fold cross-validation was employed to verify the results of the SVM. The diagnostic accuracy of the CAD system was compared with the performance of a human observer. Results: The sensitivity, specificity, and accuracy in the detection of SS were 95%, 86%, and 91%, respectively, with our CAD system, while those of a human observer were 84%, 81%, and 83%, respectively. Conclusion: The proposed computer-aided diagnosis system for Sjoegren's syndrome in ultrasonography, based on the dual-tree complex wavelet transform, performed better than a human observer.
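Sensitivity, specificity, and accuracy as reported above all derive from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying confusion matrix.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp/fn: diseased cases detected / missed.
    tn/fp: healthy cases correctly cleared / falsely flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, not taken from the study.
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
print(sens, spec, acc)
```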
Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang
2018-06-14
Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
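Weighted permutation entropy, used above as the fault feature, extends permutation entropy by weighting each ordinal pattern by the variance of the window that produced it, so that large-amplitude fluctuations count for more. The sketch below follows that commonly cited definition; the embedding dimension `m`, the delay `tau`, the normalization by log(m!), and the test signals are assumptions, as the abstract does not give the parameter values used.

```python
import math


def weighted_permutation_entropy(x, m=3, tau=1, normalize=True):
    """Weighted permutation entropy of a 1-D sequence x."""
    n_windows = len(x) - (m - 1) * tau
    if n_windows <= 0:
        raise ValueError("series too short for the chosen m and tau")
    weight_by_pattern = {}
    total_weight = 0.0
    for j in range(n_windows):
        window = [x[j + k * tau] for k in range(m)]
        mean = sum(window) / m
        weight = sum((v - mean) ** 2 for v in window) / m  # window variance
        # Ordinal pattern: indices of the window sorted by value.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        weight_by_pattern[pattern] = weight_by_pattern.get(pattern, 0.0) + weight
        total_weight += weight
    if total_weight == 0.0:
        return 0.0  # constant signal carries no ordinal information
    entropy = 0.0
    for w in weight_by_pattern.values():
        if w > 0.0:
            p = w / total_weight
            entropy -= p * math.log(p)
    if normalize:
        entropy /= math.log(math.factorial(m))
    return entropy


ramp = list(range(20))                      # a single ordinal pattern
irregular = [4, 7, 9, 10, 6, 11, 3, 8, 2, 5]
print(weighted_permutation_entropy(ramp))
print(weighted_permutation_entropy(irregular))
```

In the scheme described, one such value would be computed per intrinsic mode function and the values stacked into the feature vector passed to the SVM ensemble.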
Hyperspectral recognition of processing tomato early blight based on GA and SVM
NASA Astrophysics Data System (ADS)
Yin, Xiaojun; Zhao, SiFeng
2013-03-01
Early blight of processing tomato seriously affects yield and quality. We measured the leaf spectra of processing tomato at different severity levels of early blight and took the bands sensitive to the disease as the input vector of a support vector machine (SVM). By optimizing the SVM parameters with a genetic algorithm (GA), the different severity levels of early blight could be recognized. The results show that the sensitive bands for the different severity levels are 628-643 nm and 689-692 nm. With these sensitive bands as the input vector of the GA and SVM, the best penalty parameter was 0.129 and the best kernel function parameter was 3.479. Classification training and testing were carried out with polynomial, radial basis function, and sigmoid kernels. The best classification model was the SVM with the radial basis function kernel, with a training accuracy of 84.615% and a testing accuracy of 80.681%. Combining GA and SVM achieves multi-class classification of processing tomato early blight and provides technical support for predicting the occurrence, development, and spread of the disease over large areas.
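The GA-based tuning of the penalty and kernel parameters can be pictured as a search loop in which each individual is a candidate parameter pair and its fitness is the classifier's validation accuracy. The sketch below shows only the search loop with elitist selection, averaging crossover, and Gaussian mutation; `toy_fitness` is a hypothetical stand-in for cross-validated SVM accuracy, and all names, bounds, and GA settings are assumptions rather than the authors' configuration.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Maximize `fitness` over box-bounded real parameters with a simple GA.

    Returns (best_params, best_fitness_per_generation).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]          # elitism: top half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for ga, gb, (lo, hi) in zip(a, b, bounds):
                gene = (ga + gb) / 2 + rng.gauss(0.0, mutation_scale * (hi - lo))
                child.append(max(lo, min(hi, gene)))  # clip to bounds
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(params):
    # Stand-in for validation accuracy: a smooth surface peaking at an
    # arbitrary point (C=1.0, gamma=0.5), used for illustration only.
    c, gamma = params
    return -((c - 1.0) ** 2 + (gamma - 0.5) ** 2)


search_bounds = [(0.01, 10.0), (0.01, 10.0)]   # hypothetical (C, gamma) ranges
best_params, fitness_history = genetic_search(toy_fitness, search_bounds)
print(best_params, fitness_history[-1])
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.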
An intelligent framework for medical image retrieval using MDCT and multi SVM.
Balan, J A Alex Rajju; Rajan, S Edward
2014-01-01
Large volumes of medical images are generated rapidly in the medical field, and managing them effectively has become a great challenge. This paper studies the development of innovative medical image retrieval based on texture features and accuracy. The objective of the paper is to analyze image retrieval based on diagnosis in healthcare management systems. This paper traces the development of innovative medical image retrieval to estimate both the image texture features and accuracy. The texture features of medical images are extracted using MDCT and multi SVM. Both the theoretical approach and the simulation results revealed interesting observations, and they were corroborated using MDCT coefficients and SVM methodology. All attempts to extract data about the image in response to the query were computed successfully, and perfect image retrieval performance was obtained. Experimental results on a database of 100 trademark medical images show that an integrated texture feature representation results in 98% of the images being retrieved using MDCT and multi SVM. Thus we have studied a multiclassification technique based on SVM which is particularly suitable for medical images. The results show retrieval accuracies of 98% and 99% for different sets of medical images with respect to the class of image.
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Automatically detecting seizure activity from electroencephalogram (EEG) signals is of great importance for the treatment of epileptic seizures. To this end, a newly developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the present study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). First, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracies of the presented approach are equal to or higher than 98.10% in all five cases, which indicates the effectiveness of the proposed approach for automated seizure detection.
Using evolutionary computation to optimize an SVM used in detecting buried objects in FLIR imagery
NASA Astrophysics Data System (ADS)
Paino, Alex; Popescu, Mihail; Keller, James M.; Stone, Kevin
2013-06-01
In this paper we describe an approach for optimizing the parameters of a Support Vector Machine (SVM) as part of an algorithm used to detect buried objects in forward looking infrared (FLIR) imagery captured by a camera installed on a moving vehicle. The overall algorithm consists of a spot-finding procedure (to look for potential targets) followed by the extraction of several features from the neighborhood of each spot. The features include local binary pattern (LBP) and histogram of oriented gradients (HOG) as these are good at detecting texture classes. Finally, we project and sum each hit into UTM space along with its confidence value (obtained from the SVM), producing a confidence map for ROC analysis. In this work, we use an Evolutionary Computation Algorithm (ECA) to optimize various parameters involved in the system, such as the combination of features used, parameters on the Canny edge detector, the SVM kernel, and various HOG and LBP parameters. To validate our approach, we compare results obtained from an SVM using parameters obtained through our ECA technique with those previously selected by hand through several iterations of "guess and check".
Illias, Hazlee Azil; Zhao Liang, Wee
2018-01-01
Early detection of power transformer fault is important because it can reduce the maintenance cost of the transformer and it can ensure continuous electricity supply in power systems. Dissolved Gas Analysis (DGA) technique is commonly used to identify oil-filled power transformer fault type but utilisation of artificial intelligence method with optimisation methods has shown convincing results. In this work, a hybrid support vector machine (SVM) with modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, unoptimised SVM and previous reported works. Data reduction was also applied using stepwise regression prior to the training process of SVM to reduce the training time. It was found that the proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique results in the highest correct identification percentage of faults in a power transformer compared to other PSO algorithms. Thus, the proposed technique can be one of the potential solutions to identify the transformer fault type based on DGA data on site. PMID:29370230
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
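The second- to fifth-order cumulants used as features above can be obtained from the central moments of a signal segment via the standard relations κ2 = μ2, κ3 = μ3, κ4 = μ4 − 3μ2², and κ5 = μ5 − 10μ3μ2. The sketch below computes these zero-lag sample cumulants for one segment; whether the Letter uses exactly this estimator is an assumption, and the function name and test segment are illustrative.

```python
def cumulants(signal):
    """Return sample cumulants of order 2..5 as a dict {order: value}."""
    n = len(signal)
    mean = sum(signal) / n

    def central_moment(order):
        return sum((x - mean) ** order for x in signal) / n

    mu2, mu3, mu4, mu5 = (central_moment(k) for k in (2, 3, 4, 5))
    return {
        2: mu2,                      # variance
        3: mu3,                      # related to skewness
        4: mu4 - 3 * mu2 ** 2,       # related to excess kurtosis
        5: mu5 - 10 * mu3 * mu2,
    }


# Hypothetical, symmetric segment: odd-order cumulants vanish.
segment = [-1.0, 0.0, 1.0]
print(cumulants(segment))
```

For a triaxial accelerometer, one such set would be computed per axis (or on the magnitude signal) and concatenated into the feature vector given to the SVM.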
Liu, X H; Song, H Y; Zhang, J X; Han, B C; Wei, X N; Ma, X H; Cui, W K; Chen, Y Z
2010-05-17
Histone deacetylase inhibitors (HDACi) have been successfully used for the treatment of cancers and other diseases. Search for novel type ZBGs and development of non-hydroxamate HDACi has become a focus in current research. To complement this, it is desirable to explore a virtual screening (VS) tool capable of identifying different types of potential inhibitors from large compound libraries with high yields and low false-hit rates similar to HTS. This work explored the use of support vector machines (SVM) combined with our newly developed putative non-inhibitor generation method as such a tool. SVM trained by 702 pre-2008 hydroxamate HDACi and 64334 putative non-HDACi showed good yields and low false-hit rates in cross-validation test and independent test using 220 diverse types of HDACi reported since 2008. The SVM hit rates in scanning 13.56 M PubChem and 168K MDDR compounds are comparable to HTS rates. Further structural analysis of SVM virtual hits suggests its potential for identification of non-hydroxamate HDACi. From this analysis, a series of novel ZBG and cap groups were proposed for HDACi design. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bhattacharyya, Pranab Jyoti; Agrawal, Shweta; Barkataky, Jogesh Chandra; Bhattacharyya, Anjan Kumar
2015-01-01
Insulation break in a permanent pacemaker lead is a rare long-term complication. We describe an elderly male with a VVIR pacemaker, who presented with an episode of presyncope more than 3 years after the initial implantation procedure, attributed to insulation break possibly caused by lead entrapment in components of the medial subclavicular musculotendinous complex (MSMC) and repeated compressive damage over time during ipsilateral arm movement requiring lead replacement. The differential diagnosis of a clinical presentation when pacing stimuli are present with failure to capture and the role of the MSMC in causing lead damage late after implantation are discussed. PMID:26995445
Fluoroquinolone-Gyrase-DNA Complexes
Mustaev, Arkady; Malik, Muhammad; Zhao, Xilin; Kurepina, Natalia; Luan, Gan; Oppegard, Lisa M.; Hiasa, Hiroshi; Marks, Kevin R.; Kerns, Robert J.; Berger, James M.; Drlica, Karl
2014-01-01
DNA gyrase and topoisomerase IV control bacterial DNA topology by breaking DNA, passing duplex DNA through the break, and then resealing the break. This process is subject to reversible corruption by fluoroquinolones, antibacterials that form drug-enzyme-DNA complexes in which the DNA is broken. The complexes, called cleaved complexes because of the presence of DNA breaks, have been crystallized and found to have the fluoroquinolone C-7 ring system facing the GyrB/ParE subunits. As expected from x-ray crystallography, a thiol-reactive, C-7-modified chloroacetyl derivative of ciprofloxacin (Cip-AcCl) formed cross-linked cleaved complexes with mutant GyrB-Cys466 gyrase as evidenced by resistance to reversal by both EDTA and thermal treatments. Surprisingly, cross-linking was also readily seen with complexes formed by mutant GyrA-G81C gyrase, thereby revealing a novel drug-gyrase interaction not observed in crystal structures. The cross-link between fluoroquinolone and GyrA-G81C gyrase correlated with exceptional bacteriostatic activity for Cip-AcCl with a quinolone-resistant GyrA-G81C variant of Escherichia coli and its Mycobacterium smegmatis equivalent (GyrA-G89C). Cip-AcCl-mediated, irreversible inhibition of DNA replication provided further evidence for a GyrA-drug cross-link. Collectively these data establish the existence of interactions between the fluoroquinolone C-7 ring and both GyrA and GyrB. Because the GyrA-Gly81 and GyrB-Glu466 residues are far apart (17 Å) in the crystal structure of cleaved complexes, two modes of quinolone binding must exist. The presence of two binding modes raises the possibility that multiple quinolone-enzyme-DNA complexes can form, a discovery that opens new avenues for exploring and exploiting relationships between drug structure and activity with type II DNA topoisomerases. PMID:24497635
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machine (SVM). A hybrid model that combines genetic algorithms and support vector machines is therefore proposed in which, by using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
A Wavelet Support Vector Machine Combination Model for Singapore Tourist Arrival to Malaysia
NASA Astrophysics Data System (ADS)
Rafidah, A.; Shabri, Ani; Nurulhuda, A.; Suhaila, Y.
2017-08-01
In this study, a wavelet support vector machine (WSVM) model is proposed and applied to predicting the monthly time series of Singapore tourist arrivals to Malaysia. The WSVM model combines wavelet analysis with the support vector machine (SVM). The study has two parts: the first compares kernel functions, and the second compares the developed model with the single SVM model. The results show that the linear kernel performs better than the RBF kernel, and that WSVM outperforms the single SVM model in forecasting monthly Singapore tourist arrivals to Malaysia.
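In a wavelet-SVM combination of this kind, the time series is first split into a smooth approximation and a detail component, and the sub-series are then used as SVM inputs. The abstract does not name the wavelet family, so the sketch below assumes a single-level Haar transform purely for illustration; the function names and the sample series are hypothetical.

```python
import math


def haar_step(series):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail), each half the length of `series`.
    """
    if len(series) % 2:
        raise ValueError("series length must be even")
    s = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original series from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


monthly = [4.0, 6.0, 10.0, 12.0]   # hypothetical arrival counts
approx, detail = haar_step(monthly)
print(approx, detail)
print(haar_inverse(approx, detail))
```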
Ahmadi, Hamed; Rodehutscord, Markus
2017-01-01
In the nutrition literature, there are several reports on the use of artificial neural network (ANN) and multiple linear regression (MLR) approaches for predicting feed composition and nutritive value, while the use of the support vector machine (SVM) method as a new alternative to MLR and ANN models has not yet been fully investigated. MLR, ANN, and SVM models were developed to predict the metabolizable energy (ME) content of compound feeds for pigs, based on the German energy evaluation system, from analyzed contents of crude protein (CP), ether extract (EE), crude fiber (CF), and starch. A total of 290 datasets from standardized digestibility studies with compound feeds were provided by several institutions and published papers, and ME was calculated from them. The accuracy and precision of the developed models were evaluated from their predicted values. The results revealed that the developed ANN [R² = 0.95; root mean square error (RMSE) = 0.19 MJ/kg of dry matter] and SVM (R² = 0.95; RMSE = 0.21 MJ/kg of dry matter) models produced better predictions of ME in compound feed than conventional MLR (R² = 0.89; RMSE = 0.27 MJ/kg of dry matter); however, there were no obvious differences between the performance of the ANN and SVM models. Thus, the SVM model may also be considered a promising tool for modeling the relationship between chemical composition and ME of compound feeds for pigs. To provide readers and nutritionists with an easy and rapid tool, an Excel® calculator, namely SVM_ME_pig, was created to predict the metabolizable energy values of compound feeds for pigs using the developed support vector machine model.
NASA Astrophysics Data System (ADS)
He, Zhibin; Wen, Xiaohu; Liu, Hu; Du, Jun
2014-02-01
Data-driven models are very useful for river flow forecasting when the underlying physical relationships are not fully understood, but it is not clear whether these models still perform well in small river basins of semiarid mountain regions with complicated topography. In this study, the potential of three different data-driven methods, artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS) and support vector machine (SVM), was investigated for forecasting river flow in a semiarid mountain region of northwestern China. The models analyzed different combinations of antecedent river flow values, and the appropriate input vector was selected based on the analysis of residuals. The performance of the ANN, ANFIS and SVM models on the training and validation sets was compared with the observed data. The model consisting of three antecedent flow values was selected as the best-fit model for river flow forecasting. To evaluate the results of the ANN, ANFIS and SVM models more accurately, four standard quantitative statistical performance measures, the coefficient of correlation (R), root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NS) and mean absolute relative error (MARE), were employed. The results indicate that the performance obtained by ANN, ANFIS and SVM in terms of the different evaluation criteria during the training and validation periods does not vary substantially; the performance of all three models in river flow forecasting was satisfactory. A detailed comparison of overall performance indicated that the SVM model performed better than ANN and ANFIS in river flow forecasting for the validation data sets. The results also suggest that the ANN, ANFIS and SVM methods can be successfully applied to establish river flow forecasting models for semiarid mountain regions with complicated topography.
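The four performance measures named in this abstract can be sketched as follows. This is a minimal illustration using the commonly quoted textbook definitions, which are assumed here because the abstract does not spell them out; the function name `evaluate` and the sample values are hypothetical.

```python
import math

def evaluate(observed, predicted):
    """Return R, RMSE, NS and MARE for paired flow values.

    Definitions are the usual textbook ones (an assumption, not taken
    from the paper). Observed values must be non-zero for MARE.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares about the means
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Sum of squared forecast errors
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(var_o * var_p),   # coefficient of correlation
        "RMSE": math.sqrt(sq_err / n),         # root mean squared error
        "NS": 1.0 - sq_err / var_o,            # Nash-Sutcliffe efficiency
        "MARE": sum(abs(o - p) / abs(o)        # mean absolute relative error
                    for o, p in zip(observed, predicted)) / n,
    }

# Hypothetical flows: perfectly correlated but biased forecasts
scores = evaluate([2.0, 4.0], [1.0, 5.0])
```

In this toy case R equals 1 while NS equals 0, which illustrates why correlation alone can overstate forecast quality and why several criteria are usually reported together.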
Novel Hybrid of LS-SVM and Kalman Filter for GPS/INS Integration
NASA Astrophysics Data System (ADS)
Xu, Zhenkai; Li, Yong; Rizos, Chris; Xu, Xiaosu
Integration of Global Positioning System (GPS) and Inertial Navigation System (INS) technologies can overcome the drawbacks of the individual systems. One of the advantages is that the integrated solution can provide continuous navigation capability even during GPS outages. However, bridging the GPS outages is still a challenge when Micro-Electro-Mechanical System (MEMS) inertial sensors are used. Methods being currently explored by the research community include applying vehicle motion constraints, optimal smoother, and artificial intelligence (AI) techniques. In the research area of AI, the neural network (NN) approach has been extensively utilised up to the present. In an NN-based integrated system, a Kalman filter (KF) estimates position, velocity and attitude errors, as well as the inertial sensor errors, to output navigation solutions while GPS signals are available. At the same time, an NN is trained to map the vehicle dynamics with corresponding KF states, and to correct INS measurements when GPS measurements are unavailable. To achieve good performance it is critical to select suitable quality and an optimal number of samples for the NN. This is sometimes too rigorous a requirement which limits real world application of NN-based methods. The support vector machine (SVM) approach is based on the structural risk minimisation principle, instead of the minimised empirical error principle that is commonly implemented in an NN. The SVM can avoid local minimisation and over-fitting problems in an NN, and therefore potentially can achieve a higher level of global performance. This paper focuses on the least squares support vector machine (LS-SVM), which can solve highly nonlinear and noisy black-box modelling problems. This paper explores the application of the LS-SVM to aid the GPS/INS integrated system, especially during GPS outages. The paper describes the principles of the LS-SVM and of the KF hybrid method, and introduces the LS-SVM regression algorithm. Field test data is processed to evaluate the performance of the proposed approach.
Ma, X H; Wang, R; Tan, C Y; Jiang, Y Y; Lu, T; Rao, H B; Li, X Y; Go, M L; Low, B C; Chen, Y Z
2010-10-04
Multitarget agents have been increasingly explored for enhancing efficacy and reducing countertarget activities and toxicities. Efficient virtual screening (VS) tools for searching selective multitarget agents are desired. Combinatorial support vector machines (C-SVM) were tested as VS tools for searching dual-inhibitors of 11 combinations of 9 anticancer kinase targets (EGFR, VEGFR, PDGFR, Src, FGFR, Lck, CDK1, CDK2, GSK3). C-SVM trained on 233-1,316 non-dual-inhibitors correctly identified 26.8%-57.3% (majority >36%) of the 56-230 intra-kinase-group dual-inhibitors (equivalent to the 50-70% yields of two independent individual target VS tools), and 12.2% of the 41 inter-kinase-group dual-inhibitors. C-SVM were fairly selective in misidentifying as dual-inhibitors 3.7%-48.1% (majority <20%) of the 233-1,316 non-dual-inhibitors of the same kinase pairs and 0.98%-4.77% of the 3,971-5,180 inhibitors of other kinases. C-SVM produced low false-hit rates in misidentifying as dual-inhibitors 1,746-4,817 (0.013%-0.036%) of the 13.56 M PubChem compounds, 12-175 (0.007%-0.104%) of the 168 K MDDR compounds, and 0-84 (0.0%-2.9%) of the 19,495-38,483 MDDR compounds similar to the known dual-inhibitors. C-SVM was compared to other VS methods Surflex-Dock, DOCK Blaster, kNN and PNN against the same sets of kinase inhibitors and the full set or subset of the 1.02 M Zinc clean-leads data set. C-SVM produced comparable dual-inhibitor yields, slightly better false-hit rates for kinase inhibitors, and significantly lower false-hit rates for the Zinc clean-leads data set. Combinatorial SVM showed promising potential for searching selective multitarget agents against intra-kinase-group kinases without explicit knowledge of multitarget agents.
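The false-hit percentages quoted above follow directly from the hit counts and library sizes. The short check below reproduces that arithmetic; the helper name `false_hit_rate_percent` is hypothetical, and the library sizes are taken as exactly 13.56 million and 168 thousand compounds, which is an assumption given that the abstract rounds them.

```python
def false_hit_rate_percent(hits, library_size):
    """Percentage of a screened library flagged as dual-inhibitors."""
    return 100.0 * hits / library_size

PUBCHEM = 13_560_000   # "13.56 M PubChem compounds" (rounded in the abstract)
MDDR = 168_000         # "168 K MDDR compounds" (rounded in the abstract)

pubchem_low = round(false_hit_rate_percent(1_746, PUBCHEM), 3)   # 0.013
pubchem_high = round(false_hit_rate_percent(4_817, PUBCHEM), 3)  # 0.036
mddr_low = round(false_hit_rate_percent(12, MDDR), 3)            # 0.007
mddr_high = round(false_hit_rate_percent(175, MDDR), 3)          # 0.104
```

The rounded results agree with the ranges 0.013%-0.036% and 0.007%-0.104% reported in the abstract.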
Kumar, Pankaj; Ma, Xiaohua; Liu, Xianghui; Jia, Jia; Bucong, Han; Xue, Ying; Li, Ze Rong; Yang, Sheng Yong; Wei, Yu Quan; Chen, Yu Zong
2011-05-01
Various in vitro and in-silico methods have been used for drug genotoxicity tests, which show limited genotoxicity (GT+) and non-genotoxicity (GT-) identification rates. New methods and combinatorial approaches have been explored for enhanced collective identification capability. The rates of in-silico methods may be further improved by significantly diversified training data enriched by the large number of recently reported GT+ and GT- compounds, but a major concern is the increased noise levels arising from high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machines (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs of different diversity/noise levels were developed and tested. H-SVM trained by higher diversity higher noise data (GT+ in any in vivo or in vitro test) outperforms L-SVM trained by lower noise lower diversity data (GT+ in in vivo or Ames test only). H-SVM trained by 4,763 GT+ compounds reported before 2008 and 8,232 GT- compounds excluding clinical trial drugs correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT-, and 23.96% of 168 K MDDR and 27.23% of 17.86 M PubChem compounds as GT+. These are comparable to the 43.1-51.9% GT+ and 75-93% GT- rates of existing in-silico methods, 58.8% GT+ and 79% GT- rates of Ames method, and the estimated percentages of 23% in vivo and 31-33% in vitro GT+ compounds in the "universe of chemicals". There is a substantial level of agreement between H-SVM and L-SVM predicted GT+ and GT- MDDR compounds and the prediction from TOPKAT. SVM showed good potential in identifying GT+ compounds from large compound libraries based on higher diversity and higher noise training data.
NASA Astrophysics Data System (ADS)
Kalantar, B.; Mansor, S.; Khuzaimah, Z.; Sameen, M. Ibrahim; Pradhan, B.
2017-09-01
Knowledge of surface albedo at individual roof scale is important for mitigating urban heat islands and understanding urban climate change. This study presents a method for quantifying surface albedo of individual roofs in a complex urban area using the integration of Landsat 8 and airborne LiDAR data. First, individual roofs were extracted from airborne LiDAR data and orthophotos using optimized segmentation and supervised object based image analysis (OBIA). Support vector machine (SVM) was used as a classifier in the OBIA process for extracting individual roofs. The user-defined parameters required by the SVM classifier were selected using the v-fold cross validation method. After that, surface albedo was calculated for each individual roof from Landsat images. Finally, thematic maps of mean surface albedo of individual roofs were generated in GIS and the results were discussed. Results showed that buildings, varying in roofing material type and condition, cover 35% of the study area. The calculated surface albedo of buildings ranged from 0.16 to 0.65 in the study area. More importantly, the results indicated that the types and conditions of roofing materials significantly affect the mean value of surface albedo. Mean albedo values of new concrete, old concrete, new steel, and old steel were found to be 0.38, 0.26, 0.51, and 0.44, respectively. Replacing old roofing materials with new ones should be highly prioritized.
Machine learning modelling for predicting soil liquefaction susceptibility
NASA Astrophysics Data System (ADS)
Samui, P.; Sitharam, T. G.
2011-01-01
This study describes two machine learning techniques applied to predict liquefaction susceptibility of soil based on standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second technique uses the Support Vector Machine (SVM), which is firmly based on statistical learning theory, as a classification technique. ANN and SVM models have been developed to predict liquefaction susceptibility using corrected SPT [(N1)60] and cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models so that they require only two parameters, (N1)60 and peak ground acceleration (amax/g), for the prediction of liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
Extraction of Prostatic Lumina and Automated Recognition for Prostatic Calculus Image Using PCA-SVM
Wang, Zhuocai; Xu, Xiangmin; Ding, Xiaojun; Xiao, Hui; Huang, Yusheng; Liu, Jian; Xing, Xiaofen; Wang, Hua; Liao, D. Joshua
2011-01-01
Identification of prostatic calculi is an important basis for determining the tissue origin. Computer-assisted diagnosis of prostatic calculi may have promising potential but is currently still little studied. We studied the extraction of prostatic lumina and automated recognition of calculus images. Extraction of lumina from prostate histology images was based on local entropy and Otsu thresholding, and recognition used PCA-SVM based on the texture features of prostatic calculi. The SVM classifier showed an average time of 0.1432 seconds, an average training accuracy of 100%, an average test accuracy of 93.12%, a sensitivity of 87.74%, and a specificity of 94.82%. We concluded that the algorithm, based on texture features and PCA-SVM, can recognize the concentric structure and visualized features easily. Therefore, this method is effective for the automated recognition of prostatic calculi. PMID:21461364
NASA Astrophysics Data System (ADS)
Xu, Lili; Luo, Shuqian
2010-11-01
Microaneurysms (MAs) are the first manifestations of the diabetic retinopathy (DR) as well as an indicator for its progression. Their automatic detection plays a key role for both mass screening and monitoring and is therefore in the core of any system for computer-assisted diagnosis of DR. The algorithm basically comprises the following stages: candidate detection aiming at extracting the patterns possibly corresponding to MAs based on mathematical morphological black top hat, feature extraction to characterize these candidates, and classification based on support vector machine (SVM), to validate MAs. Feature vector and kernel function of SVM selection is very important to the algorithm. We use the receiver operating characteristic (ROC) curve to evaluate the distinguishing performance of different feature vectors and different kernel functions of SVM. The ROC analysis indicates the quadratic polynomial SVM with a combination of features as the input shows the best discriminating performance.
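The ROC-based comparison of feature vectors and kernels described above is often summarised by the area under the ROC curve (AUC). The sketch below computes AUC with the pairwise rank formulation; it is an illustration only, the abstract does not state that AUC was the summary statistic used, and the labels and scores are hypothetical classifier outputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a true microaneurysm candidate, 0 otherwise.
    scores: classifier decision values (higher means more MA-like).
    Equals the probability that a random positive outranks a random
    negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical decision values from two candidate kernels on the same samples
labels = [1, 0, 1, 0]
auc_kernel_a = roc_auc(labels, [0.9, 0.8, 0.4, 0.1])  # 0.75
auc_kernel_b = roc_auc(labels, [0.9, 0.2, 0.8, 0.1])  # 1.0
```

Under this summary, the kernel or feature set with the larger AUC would be preferred, which is one way to make the comparison in the abstract quantitative.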
Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models
NASA Astrophysics Data System (ADS)
Mandal, Sukomal; Rao, Subba; N., Harish; Lokesha
2012-06-01
The damage analysis of coastal structures is very important, as it involves many design parameters to be considered for a better and safer design of the structure. In the present study, experimental data for a non-reshaped berm breakwater were collected from the Marine Structures Laboratory, Department of Applied Mechanics and Hydraulics, NITK, Surathkal, India. Soft computing models, namely Artificial Neural Network (ANN), Support Vector Machine (SVM) and Adaptive Neuro Fuzzy Inference System (ANFIS), are constructed using the experimental data sets to predict the damage level of the non-reshaped berm breakwater. The experimental data are used to train the ANN, SVM and ANFIS models, and the results are assessed in terms of statistical measures such as mean square error, root mean square error, correlation coefficient and scatter index. The results show that soft computing techniques, i.e., ANN, SVM and ANFIS, can be efficient tools for predicting damage levels of non-reshaped berm breakwaters.
Shahlaei, Mohsen; Sabet, Razieh; Ziari, Maryam Bahman; Moeinifard, Behzad; Fassihi, Afshin; Karbakhsh, Reza
2010-10-01
Quantitative relationships between molecular structure and methionine aminopeptidase-2 inhibitory activity of a series of cytotoxic anthranilic acid sulfonamide derivatives were discovered. We have demonstrated the detailed application of two efficient nonlinear methods for evaluation of quantitative structure-activity relationships of the studied compounds. Components produced by principal component analysis were used as inputs to the developed nonlinear models. The performance of the developed models, namely PC-GRNN and PC-LS-SVM, was tested by several validation methods. The resulting PC-LS-SVM model had a high statistical quality (R² = 0.91 and R²(CV) = 0.81) for predicting the cytotoxic activity of the compounds. Comparison between the predictability of PC-GRNN and PC-LS-SVM indicates that the latter method has a higher ability to predict the activity of the studied molecules. Copyright (c) 2010 Elsevier Masson SAS. All rights reserved.
A support vector machine based control application to the experimental three-tank system.
Iplikci, Serdar
2010-07-01
This paper presents a support vector machine (SVM) approach to generalized predictive control (GPC) of multiple-input multiple-output (MIMO) nonlinear systems. The possession of higher generalization potential and at the same time avoidance of getting stuck into the local minima have motivated us to employ SVM algorithms for modeling MIMO systems. Based on the SVM model, detailed and compact formulations for calculating predictions and gradient information, which are used in the computation of the optimal control action, are given in the paper. The proposed MIMO SVM-based GPC method has been verified on an experimental three-tank liquid level control system. Experimental results have shown that the proposed method can handle the control task successfully for different reference trajectories. Moreover, a detailed discussion on data gathering, model selection and effects of the control parameters have been given in this paper. 2010 ISA. Published by Elsevier Ltd. All rights reserved.
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
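The 'one to others' scheme described above can be read as a chain of two-class decisions ordered by fault priority, where each node separates one fault class from all remaining ones. The sketch below illustrates only that control flow; this reading of the algorithm is an interpretation of the abstract, the fault names are hypothetical, and simple threshold rules stand in for the trained two-class SVMs.

```python
def build_one_to_others(ordered_nodes, last_label):
    """Chain binary deciders into a multi-class classifier.

    ordered_nodes: list of (label, is_label) pairs in fault-priority order,
                   where is_label(sample) is a two-class decision function
                   (a trained binary SVM in the paper; a stand-in here).
    last_label:    class assigned when every node answers "others".
    For k classes, k - 1 binary deciders are needed.
    """
    def classify(sample):
        for label, is_label in ordered_nodes:
            if is_label(sample):
                return label      # this node claims the sample
        return last_label         # fell through every "others" branch
    return classify

# Hypothetical stand-ins for trained two-class SVM decision functions
nodes = [
    ("fault_A", lambda x: x[0] > 0.8),   # highest-priority fault
    ("fault_B", lambda x: x[1] > 0.5),
]
classify = build_one_to_others(nodes, last_label="fault_C")
```

Because each sample leaves the chain at the first node that claims it, a high-priority fault is recognised after a single binary decision, which is consistent with the reduced training and recognition effort the abstract mentions.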
Reid, Dylan A; Conlin, Michael P; Yin, Yandong; Chang, Howard H; Watanabe, Go; Lieber, Michael R; Ramsden, Dale A; Rothenberg, Eli
2017-02-28
The nonhomologous end-joining (NHEJ) pathway is the primary repair pathway for DNA double strand breaks (DSBs) in humans. Repair is mediated by a core complex of NHEJ factors that includes a ligase (DNA Ligase IV; L4) that relies on juxtaposition of 3΄ hydroxyl and 5΄ phosphate termini of the strand breaks for catalysis. However, chromosome breaks arising from biological sources often have different end chemistries, and how these different end chemistries impact the way in which the core complex directs the necessary transitions from end pairing to ligation is not known. Here, using single-molecule FRET (smFRET), we show that prior to ligation, differences in end chemistry strongly modulate the bridging of broken ends by the NHEJ core complex. In particular, the 5΄ phosphate group is a recognition element for L4 and is critical for the ability of NHEJ factors to promote stable pairing of ends. Moreover, other chemical incompatibilities, including products of aborted ligation, are sufficient to disrupt end pairing. Based on these observations, we propose a mechanism for iterative repair of DSBs by NHEJ. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
An Improved Memetic Algorithm for Break Scheduling
NASA Astrophysics Data System (ADS)
Widl, Magdalena; Musliu, Nysret
In this paper we consider solving a complex real life break scheduling problem. This problem of high practical relevance arises in many working areas, e.g. in air traffic control and other fields where supervision personnel is working. The objective is to assign breaks to employees such that various constraints reflecting legal demands or ergonomic criteria are satisfied and staffing requirement violations are minimised.
ERIC Educational Resources Information Center
Sweeney, Catherine; O'Sullivan, Eleanor; McCarthy, Marian
2015-01-01
Palliative care is a complex area of healthcare best delivered by an interdisciplinary team approach. Breaking bad news is an inherent part of caring for people with life-limiting conditions. This study aims to explore an interdisciplinary breaking bad news role-play in a palliative care module. Participants were undergraduate medical and nursing…
Classification of EEG Signals Based on Pattern Recognition Approach.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
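The sub-band limits quoted in this abstract and the relative wavelet energy feature can be illustrated with a short sketch. A sampling rate of 250 Hz is assumed here only because it reproduces the quoted band edges (0 to 3.90 Hz for A5 and 3.90 to 7.81 Hz for D5); the abstract does not state the sampling rate, and the function names and toy coefficients are hypothetical.

```python
def dwt_band_edges(fs, levels):
    """Nominal frequency range (Hz) of each dyadic wavelet sub-band."""
    bands = {}
    for j in range(1, levels + 1):
        # Detail level j nominally covers fs / 2^(j+1) up to fs / 2^j
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    # The final approximation covers everything below the last detail band
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands

def relative_wavelet_energy(coeffs_by_band):
    """Energy of each sub-band divided by the total energy."""
    energy = {name: sum(c * c for c in coeffs)
              for name, coeffs in coeffs_by_band.items()}
    total = sum(energy.values())
    return {name: e / total for name, e in energy.items()}

bands = dwt_band_edges(fs=250.0, levels=5)   # fs = 250 Hz is an assumption
rwe = relative_wavelet_energy({"A5": [3.0, 4.0], "D5": [5.0]})  # toy values
```

With these assumptions `bands["A5"]` is (0.0, 3.90625) and `bands["D5"]` is (3.90625, 7.8125), matching the ranges in the abstract after truncation to two decimals.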
Classification of EEG Signals Based on Pattern Recognition Approach
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190
Improving precision of glomerular filtration rate estimating model by ensemble learning.
Liu, Xun; Li, Ningshan; Lv, Linsheng; Fu, Yongmei; Cheng, Cailian; Wang, Caixia; Ye, Yuqiu; Li, Shaomin; Lou, Tanqi
2017-11-09
Accurate assessment of kidney function is clinically important, but estimates of glomerular filtration rate (GFR) by regression are imprecise. We hypothesized that ensemble learning could improve precision. A total of 1419 participants were enrolled, with 1002 in the development dataset and 417 in the external validation dataset. GFR was independently estimated from age, sex and serum creatinine using an artificial neural network (ANN), support vector machine (SVM), regression, and ensemble learning. GFR was measured by 99mTc-DTPA renal dynamic imaging calibrated with dual plasma sample 99mTc-DTPA GFR. Mean measured GFRs were 70.0 ml/min/1.73 m² in the developmental and 53.4 ml/min/1.73 m² in the external validation cohorts. In the external validation cohort, precision was better in the ensemble model of the ANN, SVM and regression equation (IQR = 13.5 ml/min/1.73 m²) than in the new regression model (IQR = 14.0 ml/min/1.73 m², P < 0.001). The precision of ensemble learning was the best of the three models, but the models had similar bias and accuracy. The median difference ranged from 2.3 to 3.7 ml/min/1.73 m², 30% accuracy ranged from 73.1 to 76.0%, and P was > 0.05 for all comparisons of the new regression equation and the other new models. An ensemble learning model including three variables, the average ANN, SVM, and regression equation values, was more precise than the new regression model. A more complex ensemble learning strategy may further improve GFR estimates.
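The three performance figures used in this abstract (bias as the median difference, precision as the interquartile range of the difference, and 30% accuracy) can be sketched as below. The sign convention (estimated minus measured), the quartile method, and the use of a plain average as the ensemble are assumptions made for illustration; the abstract does not specify them, and the numbers are hypothetical.

```python
import statistics

def ensemble_average(*model_estimates):
    """Average the per-patient estimates of several models (assumed scheme)."""
    return [sum(values) / len(values) for values in zip(*model_estimates)]

def gfr_performance(measured, estimated):
    """Bias, precision and 30% accuracy of GFR estimates."""
    diffs = [e - m for m, e in zip(measured, estimated)]
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    # Share of estimates that fall within 30% of the measured value
    within_30 = sum(abs(e - m) <= 0.3 * m for m, e in zip(measured, estimated))
    return {
        "bias": statistics.median(diffs),            # median difference
        "precision_iqr": q3 - q1,                    # IQR of the difference
        "p30": 100.0 * within_30 / len(measured),    # percent within 30%
    }

# Hypothetical values in ml/min/1.73 m^2
measured = [100.0, 100.0, 100.0, 100.0, 100.0]
estimated = [90.0, 95.0, 100.0, 105.0, 140.0]
perf = gfr_performance(measured, estimated)
combined = ensemble_average([60.0, 90.0], [66.0, 84.0], [63.0, 96.0])
```

A smaller IQR means the estimates scatter less around the measured values, which is the sense in which the ensemble is reported as more precise even though bias and 30% accuracy were similar.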
Fiot, Jean-Baptiste; Cohen, Laurent D; Raniga, Parnesh; Fripp, Jurgen
2013-09-01
Support vector machines (SVM) are machine learning techniques that have been used for segmentation and classification of medical images, including segmentation of white matter hyper-intensities (WMH). Current approaches using SVM for WMH segmentation extract features from the brain and classify these followed by complex post-processing steps to remove false positives. The method presented in this paper combines advanced pre-processing, tissue-based feature selection and SVM classification to obtain efficient and accurate WMH segmentation. Features from 125 patients, generated from up to four MR modalities [T1-w, T2-w, proton-density and fluid attenuated inversion recovery(FLAIR)], differing neighbourhood sizes and the use of multi-scale features were compared. We found that although using all four modalities gave the best overall classification (average Dice scores of 0.54 ± 0.12, 0.72 ± 0.06 and 0.82 ± 0.06 respectively for small, moderate and severe lesion loads); this was not significantly different (p = 0.50) from using just T1-w and FLAIR sequences (Dice scores of 0.52 ± 0.13, 0.71 ± 0.08 and 0.81 ± 0.07). Furthermore, there was a negligible difference between using 5 × 5 × 5 and 3 × 3 × 3 features (p = 0.93). Finally, we show that careful consideration of features and pre-processing techniques not only saves storage space and computation time but also leads to more efficient classification, which outperforms the one based on all features with post-processing. Copyright © 2013 John Wiley & Sons, Ltd.
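The Dice scores reported above measure the overlap between an automatic segmentation and a reference segmentation. A minimal sketch follows, with each segmentation represented as a set of voxel coordinates; the representation and the toy voxels are illustrative assumptions, not the authors' implementation.

```python
def dice_score(predicted, reference):
    """Dice similarity: 2 * |P intersect R| / (|P| + |R|).

    predicted, reference: iterables of voxel coordinates labelled as lesion.
    Returns 1.0 when both are empty (no lesion and none predicted).
    """
    p, r = set(predicted), set(reference)
    if not p and not r:
        return 1.0
    return 2.0 * len(p & r) / (len(p) + len(r))

# Hypothetical lesion voxels (x, y, z)
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
manual = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
score = dice_score(auto, manual)   # two shared voxels out of three each
```

Because the denominator is the total lesion size, a fixed number of misclassified voxels lowers the score more for small lesions, which is consistent with the lower Dice values reported for small lesion loads.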
Zhang, Yu-xin; Cheng, Zhi-feng; Xu, Zheng-ping; Bai, Jing
2015-01-01
To overcome the complex operation, carrier gas consumption and long test period of the traditional power transformer fault diagnosis approach based on dissolved gas analysis (DGA), this paper proposes a new method that uses photoacoustic spectroscopy to detect the content of five characteristic gases in transformer oil (CH4, C2H2, C2H4, C2H6 and H2), from which the three ratios C2H2/C2H4, CH4/H2 and C2H4/C2H6 are calculated. The support vector machine model was constructed by cross validation over five support vector machine functions and four kernel functions, and heuristic algorithms were used to optimize the penalty factor c and the parameter g, in order to establish the SVM model with the highest fault diagnosis accuracy and fast computing speed. Two heuristic algorithms, particle swarm optimization and the genetic algorithm, were compared in this paper for optimization accuracy and speed. The simulation results show that the SVM model combining C-SVC, the RBF kernel function and the genetic algorithm obtained 97.5% accuracy on the test sample set and 98.3333% accuracy on the training sample set, and that the genetic algorithm was about twice as fast as particle swarm optimization. The method described in this paper has many advantages, such as simple operation, non-contact measurement, no carrier gas consumption, a short test period, and high stability and sensitivity; the results show that it can replace traditional transformer fault diagnosis by gas chromatography and meets practical engineering needs in transformer fault diagnosis.
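The three-ratio features named in this abstract are simple quotients of the measured gas concentrations. The sketch below shows how such a feature vector could be formed before it is passed to a classifier; the function name, the handling of a zero denominator, and the concentrations are hypothetical.

```python
import math

def three_ratios(ppm):
    """Return the C2H2/C2H4, CH4/H2 and C2H4/C2H6 ratios.

    ppm: dict of gas concentrations keyed by formula.
    A zero denominator yields math.inf (an assumed convention; a real
    system would need a documented rule for undetectable gases).
    """
    def ratio(num, den):
        return ppm[num] / ppm[den] if ppm[den] != 0 else math.inf

    return {
        "C2H2/C2H4": ratio("C2H2", "C2H4"),
        "CH4/H2": ratio("CH4", "H2"),
        "C2H4/C2H6": ratio("C2H4", "C2H6"),
    }

# Hypothetical dissolved-gas readings
sample = {"CH4": 50.0, "C2H2": 2.0, "C2H4": 20.0, "C2H6": 10.0, "H2": 100.0}
features = three_ratios(sample)
```

Using ratios rather than raw concentrations makes the features less sensitive to the overall gas level in the oil sample, which is the usual motivation for ratio-based DGA schemes.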
Spectral Reconstruction Based on SVM for Cross Calibration
NASA Astrophysics Data System (ADS)
Gao, H.; Ma, Y.; Liu, W.; He, H.
2017-05-01
The Chinese HY-1C/1D satellites will use a visible-near infrared (VNIR) hyperspectral sensor with 5 nm/10 nm resolution, together with a solar calibrator, to cross-calibrate with other sensors. Because the hyperspectral radiance data consist of the average radiance in the sensor's passbands and therefore carry a spectral smoothing effect, a transform from the hyperspectral radiance data to the 1-nm-resolution apparent spectral radiance by spectral reconstruction needs to be implemented. To solve the problem of noise accumulation and deterioration after several iterations of the iterative algorithm, a novel regression method based on SVM is proposed, which can closely approximate arbitrarily complex non-linear relationships and provides better generalization capability through learning. From a system point of view, the relationship between the apparent radiance and the equivalent radiance is a nonlinear mapping introduced by the spectral response function (SRF); SVM transforms the low-dimensional non-linear problem into a high-dimensional linear problem through the kernel function and obtains the globally optimal solution by means of a quadratic optimization problem. The experiment is performed using 6S-simulated spectra that account for the SRF and SNR of the hyperspectral sensor, measured reflectance spectra of water bodies, and different atmospheric conditions. The comparison shows that, first, the proposed method reconstructs with higher accuracy, especially for high-frequency signals; second, as the spectral resolution of the hyperspectral sensor decreases, the proposed method performs better than the iterative method; finally, the root mean square relative error (RMSRE), which is used to evaluate the difference between the reconstructed spectrum and the real spectrum over the whole spectral range, is calculated, and it decreases by at least one-fold with the proposed method.
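The RMSRE figure of merit mentioned above can be sketched as follows. The definition used here (relative error taken per wavelength against the reference spectrum, then root mean square over all wavelengths) is the usual one and is assumed, since the abstract does not give the formula; the radiance values are hypothetical.

```python
import math

def rmsre(reconstructed, reference):
    """Root mean square relative error between two spectra.

    Both inputs are radiances sampled at the same wavelengths;
    reference values must be non-zero.
    """
    rel_sq = [((r - t) / t) ** 2 for r, t in zip(reconstructed, reference)]
    return math.sqrt(sum(rel_sq) / len(rel_sq))

# Hypothetical radiances at two wavelengths: +10% and -10% errors
error = rmsre([110.0, 180.0], [100.0, 200.0])
```

Normalising by the reference radiance keeps low-radiance parts of the spectrum from being ignored, which matters when the error is evaluated over the whole spectral range.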
In-Vivo Imaging of Cell Migration Using Contrast Enhanced MRI and SVM Based Post-Processing.
Weis, Christian; Hess, Andreas; Budinsky, Lubos; Fabry, Ben
2015-01-01
The migration of cells within a living organism can be observed with magnetic resonance imaging (MRI) in combination with iron oxide nanoparticles as an intracellular contrast agent. This method, however, suffers from low sensitivity and specificity. Here, we developed a quantitative non-invasive in-vivo cell localization method using contrast enhanced multiparametric MRI and support vector machines (SVM) based post-processing. Imaging phantoms consisting of agarose with compartments containing different concentrations of cancer cells labeled with iron oxide nanoparticles were used to train and evaluate the SVM for cell localization. From the magnitude and phase data acquired with a series of T2*-weighted gradient-echo scans at different echo-times, we extracted features that are characteristic for the presence of superparamagnetic nanoparticles, in particular hyper- and hypointensities, relaxation rates, short-range phase perturbations, and perturbation dynamics. High detection quality was achieved by SVM analysis of the multiparametric feature-space. The in-vivo applicability was validated in animal studies. The SVM detected the presence of iron oxide nanoparticles in the imaging phantoms with high specificity and sensitivity with a detection limit of 30 labeled cells per mm3, corresponding to 19 μM of iron oxide. As proof-of-concept, we applied the method to follow the migration of labeled cancer cells injected in rats. The combination of iron oxide labeled cells, multiparametric MRI and an SVM-based post-processing provides high spatial resolution, specificity, and sensitivity, and is therefore suitable for non-invasive in-vivo cell detection and cell migration studies over prolonged time periods.
Mapping membrane activity in undiscovered peptide sequence space using machine learning
Fulan, Benjamin M.; Wong, Gerard C. L.
2016-01-01
There are some ∼1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate α-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximize its “antimicrobialness”) and its α-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide’s minimum inhibitory concentration (MIC), but rather its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances are increased from known AMPs, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find an unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences. PMID:27849600
Liu, Xue-Mei; Zhang, Hai-Liang
2014-10-01
Ultraviolet/visible (UV/Vis) spectroscopy was studied for the rapid determination of chemical oxygen demand (COD), an indicator of the concentration of organic matter in aquaculture water. To reduce the influence of noise in the spectra, the 135 extracted absorbance spectra were preprocessed by Savitzky-Golay smoothing (SG), empirical mode decomposition (EMD), and wavelet transform (WT) methods. The preprocessed spectra were then used to select latent variables (LVs) by the partial least squares (PLS) method. PLS was used to build models with the full spectra, and back-propagation neural network (BPNN) and least squares support vector machine (LS-SVM) were applied to build models with the selected LVs. The overall results showed that the BPNN and LS-SVM models performed better than the PLS models, and the LS-SVM models with LVs based on WT-preprocessed spectra obtained the best results, with a determination coefficient (r2) and RMSE of 0.83 and 14.78 mg·L⁻¹ for the calibration set, and 0.82 and 14.82 mg·L⁻¹ for the prediction set, respectively. The results indicated that UV/Vis spectroscopy with LVs obtained by the PLS method, combined with LS-SVM calibration, is feasible for the rapid and accurate determination of COD in aquaculture water. Moreover, this study laid the foundation for further implementation of online analysis of aquaculture water and rapid determination of other water quality parameters.
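As a rough illustration of the Savitzky-Golay preprocessing step, the sketch below applies the standard 5-point quadratic filter, whose coefficients are (-3, 12, 17, 12, -3)/35. The abstract does not state the window length or polynomial order used, so both are assumptions here, as is leaving the two samples at each edge unsmoothed.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(values):
    """Smooth a sequence with a 5-point Savitzky-Golay filter.

    The first and last two samples are copied unchanged (an illustrative
    edge-handling choice, not taken from the abstract).
    """
    values = list(values)
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


# A quadratic signal passes through unchanged; an isolated spike is damped.
quadratic = [x * x for x in range(7)]
smoothed_quadratic = savgol5(quadratic)
smoothed_spike = savgol5([0, 0, 7, 0, 0])
```

The filter fits a low-order polynomial in each window, so it suppresses noise while preserving the shape of broad absorbance bands better than a plain moving average.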
NASA Astrophysics Data System (ADS)
Guo, Yiqing; Jia, Xiuping; Paull, David
2018-06-01
The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
Using oceanic-atmospheric oscillations for long lead time streamflow forecasting
NASA Astrophysics Data System (ADS)
Kalra, Ajay; Ahmad, Sajjad
2009-03-01
We present a data-driven model, Support Vector Machine (SVM), for long lead time streamflow forecasting using oceanic-atmospheric oscillations. The SVM is based on statistical learning theory, uses a hypothesis space of linear functions based on the kernel approach, and has been used to predict a quantity forward in time on the basis of training from past data. The strength of SVM lies in minimizing the empirical classification error and maximizing the geometric margin by solving an inverse problem. The SVM model is applied to three gages, i.e., Cisco, Green River, and Lees Ferry in the Upper Colorado River Basin in the western United States. Annual oceanic-atmospheric indices, comprising Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), Atlantic Multidecadal Oscillation (AMO), and El Nino-Southern Oscillations (ENSO) for a period of 1906-2001 are used to generate annual streamflow volumes with 3 years lead time. The SVM model is trained with 86 years of data (1906-1991) and tested with 10 years of data (1992-2001). On the basis of correlation coefficient, root mean square error, and Nash-Sutcliffe Efficiency Coefficient, the model shows satisfactory results, and the predictions are in good agreement with measured streamflow volumes. Sensitivity analysis, performed to evaluate the effect of individual and coupled oscillations, reveals a strong signal for ENSO and NAO indices as compared to PDO and AMO indices for the long lead time streamflow forecast. Streamflow predictions from the SVM model are found to be better when compared with the predictions obtained from a feedforward back propagation artificial neural network model and linear regression.
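The chronological split and the Nash-Sutcliffe efficiency mentioned in this abstract can be sketched as below. The year ranges are taken from the abstract; the function name and the toy flow values are illustrative assumptions, and the efficiency uses its standard definition, 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of observations."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Chronological split as described: train on 1906-1991, test on 1992-2001.
years = list(range(1906, 2002))
train_years = [y for y in years if y <= 1991]
test_years = [y for y in years if y >= 1992]

# Toy values, invented for illustration only.
perfect = nash_sutcliffe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nash_sutcliffe([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Splitting by year rather than at random keeps the test period strictly after the training period, which matches how a forecast would be used in practice.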
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
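The problem of non-comparable one-vs-the-rest scores can be illustrated with a small sketch. This is not the SVM-Fold code-weight learning algorithm: the class names, scores, and weights below are hypothetical, and the weights are fixed by hand rather than learned. It only shows how per-class weights can change the outcome of a plain argmax.

```python
def predict_class(scores, weights):
    """Pick the class with the largest weighted one-vs-rest score.

    `scores` and `weights` map a class label to a float; with all weights
    equal to 1.0 this reduces to standard one-vs-all prediction.
    """
    weighted = {label: weights[label] * score for label, score in scores.items()}
    return max(weighted, key=weighted.get)


# Hypothetical raw classifier outputs for one query sequence.
scores = {"fold_a": 0.9, "fold_b": 0.4}

# Standard one-vs-all: raw scores are compared directly.
plain = predict_class(scores, {"fold_a": 1.0, "fold_b": 1.0})

# Hand-set weights that down-scale an over-confident classifier.
reweighted = predict_class(scores, {"fold_a": 0.3, "fold_b": 1.0})
```

In the approach described by the abstract, such weights are learned from data and also encode the structural hierarchy, which this sketch does not attempt.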
A Features Selection for Crops Classification
NASA Astrophysics Data System (ADS)
Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen
2016-08-01
The components of a polarimetric target decomposition reflect differences between targets, since they are linked to the scattering properties of the target, and they can be imported into an SVM as classification features. The result of a decomposition is usually concentrated in a subset of the components. Selecting a combination of components can reduce the number of features imported into the SVM. This feature reduction can lead to less computation and to targeted classification of a single target when classifying a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification using AGRISAR data.
NASA Astrophysics Data System (ADS)
Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari
2018-03-01
Support Vector Machine, commonly called SVM, is one method that can be used to classify data. SVM separates data from 2 different classes with a hyperplane. In this study, a system was built using SVM to develop Arabic speech recognition. In the development of the system, 2 kinds of speakers were tested: dependent speakers and independent speakers. The system achieved an accuracy of 85.32% for dependent speakers and 61.16% for independent speakers.
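Separating two classes with a hyperplane amounts to checking the sign of w·x + b. The sketch below shows only that decision rule for a linear SVM; the weight vector, bias, and inputs are hypothetical values, not parameters from the study, and training is not shown.

```python
def linear_svm_predict(weights, bias, features):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# Hypothetical trained parameters: the hyperplane x1 - x2 = 0.
w = [1.0, -1.0]
b = 0.0

label_a = linear_svm_predict(w, b, [2.0, 1.0])
label_b = linear_svm_predict(w, b, [1.0, 2.0])
```

Multi-class tasks such as word recognition are typically built from several such binary decisions.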
A Comparison of Empirical and Intelligent Methods for Dust Detection Using MODIS Satellite Data
NASA Astrophysics Data System (ADS)
Shahrisvand, M.; Akhoondzadeh, M.
2013-09-01
Nowadays, dust storms are one of the most important natural hazards and are considered a national concern in scientific communities. This paper considers the capabilities of some classical and intelligent methods for dust detection from satellite imagery around the Middle East region. In the study of dust detection, MODIS images have been a good candidate due to their suitable spectral and temporal resolution. In this study, physical-based and intelligent methods including decision tree, ANN (Artificial Neural Network) and SVM (Support Vector Machine) have been applied to detect dust storms. Among the mentioned approaches, the SVM method has been implemented in this paper for the first time in the domain of dust detection studies. Finally, AOD (Aerosol Optical Depth) images, which are one of the reference standard products of the OMI (Ozone Monitoring Instrument) sensor, have been used to assess the accuracy of all the implemented methods. Since the SVM method can distinguish dust storms over land and ocean simultaneously, its accuracy is better than that of the other applied approaches. In conclusion, this paper shows that SVM can be a powerful tool for producing dust images with remarkable accuracy in comparison with the AOT (Aerosol Optical Thickness) product of NASA.
Automatic system for radar echoes filtering based on textural features and artificial intelligence
NASA Astrophysics Data System (ADS)
Hedir, Mehdia; Haddad, Boualem
2017-10-01
Among the most popular Artificial Intelligence (AI) techniques, Artificial Neural Network (ANN) and Support Vector Machine (SVM) have been retained to process Ground Echoes (GE) on meteorological radar images taken from Setif (Algeria) and Bordeaux (France), which have different climates and topologies. To achieve this task, the AI techniques were combined with textural approaches. We used the Gray Level Co-occurrence Matrix (GLCM) and the Completed Local Binary Pattern (CLBP); both methods are widely used in image analysis. The obtained results show the efficiency of texture in preserving the precipitation forecast on both sites, with an accuracy of 98% on Bordeaux and 95% on Setif regardless of the AI technique used. With SVM, 98% of GE are suppressed, a rate that outperforms ANN. The CLBP approach combined with SVM eliminates 98% of GE and preserves the precipitation forecast better on the Bordeaux site than on Setif's, while it exhibits lower accuracy with ANN. The SVM classifier is well adapted to the proposed application, since the average filtering rate is 95-98% with texture and 92-93% with CLBP. These approaches also allow removing Anomalous Propagations (APs), with a best accuracy of 97.15% with texture and SVM. In fact, textural features combined with AI techniques are an efficient tool for incoherent radars to overcome spurious echoes.
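A Gray Level Co-occurrence Matrix counts how often a pixel of gray level i has a neighbor of gray level j at a fixed offset. The sketch below builds a non-symmetric GLCM for a horizontal offset and derives the contrast feature from it. The offset, the number of gray levels, and the tiny image are assumptions for illustration; the abstract does not say which offsets or GLCM features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels for the pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in range(levels).
    The result is not symmetrized or normalized.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum of (i - j)^2 weighted by the normalized co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * n
                   for i, row in enumerate(matrix)
                   for j, n in enumerate(row))
    return weighted / total


# Tiny two-level image, invented for illustration.
tiny = [[0, 0, 1],
        [1, 1, 0]]
tiny_glcm = glcm(tiny, levels=2)
tiny_contrast = glcm_contrast(tiny_glcm)
```

Features such as contrast, computed per image window, would then serve as inputs to the ANN or SVM classifier.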
Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G
2017-06-30
Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n =25) and healthy controls (n =25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy values (FA). The study revealed three main findings: 1) SVM applied to DTI derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Chen, Hongming; Carlsson, Lars; Eriksson, Mats; Varkonyi, Peter; Norinder, Ulf; Nilsson, Ingemar
2013-06-24
A novel methodology was developed to build Free-Wilson like local QSAR models by combining R-group signatures and the SVM algorithm. Unlike Free-Wilson analysis this method is able to make predictions for compounds with R-groups not present in a training set. Eleven public data sets were chosen as test cases for comparing the performance of our new method with several other traditional modeling strategies, including Free-Wilson analysis. Our results show that the R-group signature SVM models achieve better prediction accuracy compared with Free-Wilson analysis in general. Moreover, the predictions of R-group signature models are also comparable to the models using ECFP6 fingerprints and signatures for the whole compound. Most importantly, R-group contributions to the SVM model can be obtained by calculating the gradient for R-group signatures. For most of the studied data sets, a significant correlation with that of a corresponding Free-Wilson analysis is shown. These results suggest that the R-group contribution can be used to interpret bioactivity data and highlight that the R-group signature based SVM modeling method is as interpretable as Free-Wilson analysis. Hence the signature SVM model can be a useful modeling tool for any drug discovery project.
Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong
2016-01-01
Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performances due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
NASA Astrophysics Data System (ADS)
Ali, Salah M.; Hui, K. H.; Hee, L. M.; Salman Leong, M.; Al-Obaidi, M. A.; Ali, Y. H.; Abdelrhman, Ahmed M.
2018-03-01
Acoustic emission (AE) analysis has become a vital tool for initiating maintenance tasks in many industries. However, the analysis process and interpretation have been found to be highly dependent on experts. Therefore, an automated monitoring method is required to reduce the cost and time consumed in the interpretation of AE signals. This paper investigates the application of two of the most common machine learning approaches, namely artificial neural network (ANN) and support vector machine (SVM), to automate the diagnosis of valve faults in a reciprocating compressor based on AE signal parameters. Since accuracy is an essential factor in any automated diagnostic system, this paper also provides a comparative study based on the predictive performance of ANN and SVM. AE parameter data were acquired from a single-stage reciprocating air compressor with different operational and valve conditions. ANN and SVM diagnosis models were subsequently devised by combining AE parameters of different conditions. Results demonstrate that the ANN and SVM models have the same results in terms of prediction accuracy. However, the SVM model is recommended for automated diagnosis of the valve condition due to its ability to handle a high number of input features with low-sample data sets.
Density-Dependent Quantized Least Squares Support Vector Machine for Large Data Sets.
Nan, Shengyu; Sun, Lei; Chen, Badong; Lin, Zhiping; Toh, Kar-Ann
2017-01-01
Based on the knowledge that input data distribution is important for learning, a data density-dependent quantization scheme (DQS) is proposed for sparse input data representation. The usefulness of the representation scheme is demonstrated by using it as a data preprocessing unit attached to the well-known least squares support vector machine (LS-SVM) for application on big data sets. Essentially, the proposed DQS adopts a single shrinkage threshold to obtain a simple quantization scheme, which adapts its outputs to input data density. With this quantization scheme, a large data set is quantized to a small subset where considerable sample size reduction is generally obtained. In particular, the sample size reduction can save significant computational cost when using the quantized subset for feature approximation via the Nyström method. Based on the quantized subset, the approximated features are incorporated into LS-SVM to develop a data density-dependent quantized LS-SVM (DQLS-SVM), where an analytic solution is obtained in the primal solution space. The developed DQLS-SVM is evaluated on synthetic and benchmark data with particular emphasis on large data sets. Extensive experimental results show that the learning machine incorporating DQS attains not only high computational efficiency but also good generalization performance.
Determination of the carmine content based on spectrum fluorescence spectral and PSO-SVM
NASA Astrophysics Data System (ADS)
Wang, Shu-tao; Peng, Tao; Cheng, Qi; Wang, Gui-chuan; Kong, De-ming; Wang, Yu-tian
2018-03-01
Carmine is a food pigment widely used in various foods and beverages. Excessive consumption of synthetic pigments can seriously harm the body. Food generally contains a variety of colorants. Simulating the coexistence of various food pigments, we used fluorescence spectroscopy together with the PSO-SVM algorithm to establish a method for determining the carmine content in a mixed solution. The prediction results of PSO-SVM were as follows: the average recovery rate of carmine was 100.84%, the root mean square error of prediction (RMSEP) was 1.03e-04, and the correlation coefficient between the model output and the real value was 0.999. Compared with the prediction results of back propagation (BP), the correlation coefficient of PSO-SVM was 2.7% higher, the average recovery rate differed by 0.6%, and the root mean square error was nearly one order of magnitude lower. According to the analysis results, the combination of fluorescence spectroscopy and PSO-SVM can effectively avoid the interference caused by other pigments and accurately determine the content of carmine in a mixed solution, with better results than BP.
Assessing the effect of a fuel break network to reduce burnt area and wildfire risk transmission
Tiago M. Oliveira; Ana M. G. Barros; Alan A. Ager; Paulo M. Fernandes
2016-01-01
Wildfires pose complex challenges to policymakers and fire agencies. Fuel break networks and area-wide fuel treatments are risk-management options to reduce losses from large fires. Two fuel management scenarios covering 3% of the fire-prone Algarve region of Portugal and differing in the intensity of treatment in 120-m wide fuel breaks were examined and compared with...
To Break it Down or Not Break it Down: That is the Question!
ERIC Educational Resources Information Center
Coker, Cheryl A.
2006-01-01
Learning a new skill, even a seemingly simple one, can be an overwhelming task for a beginner. A question often faced by the practitioner as a result is whether or not to break the skill into parts for initial practice. Skill complexity and skill organization interact to provide direction as to whether whole or part practice should be employed in…
Multicategory Composite Least Squares Classifiers
Park, Seo Young; Liu, Yufeng; Liu, Dacheng; Scholl, Paul
2010-01-01
Classification is a very useful statistical tool for information extraction. In particular, multicategory classification is commonly seen in various applications. Although binary classification problems are heavily studied, extensions to the multicategory case are much less so. In view of the increased complexity and volume of modern statistical problems, it is desirable to have multicategory classifiers that are able to handle problems with high dimensions and with a large number of classes. Moreover, it is necessary to have sound theoretical properties for the multicategory classifiers. In the literature, there exist several different versions of simultaneous multicategory Support Vector Machines (SVMs). However, the computation of the SVM can be difficult for large scale problems, especially for problems with large number of classes. Furthermore, the SVM cannot produce class probability estimation directly. In this article, we propose a novel efficient multicategory composite least squares classifier (CLS classifier), which utilizes a new composite squared loss function. The proposed CLS classifier has several important merits: efficient computation for problems with large number of classes, asymptotic consistency, ability to handle high dimensional data, and simple conditional class probability estimation. Our simulated and real examples demonstrate competitive performance of the proposed approach. PMID:21218128
NASA Astrophysics Data System (ADS)
Samsudin, Sarah Hanim; Shafri, Helmi Z. M.; Hamedianfar, Alireza
2016-04-01
Status observations of roofing material degradation are constantly evolving due to urban feature heterogeneities. Although advanced classification techniques have been introduced to improve within-class impervious surface classifications, these techniques involve complex processing and high computation times. This study integrates field spectroscopy and satellite multispectral remote sensing data to generate degradation status maps of concrete and metal roofing materials. Field spectroscopy data were used as bases for selecting suitable bands for spectral index development because of the limited number of multispectral bands. Mapping methods for roof degradation status were established for metal and concrete roofing materials by developing the normalized difference concrete condition index (NDCCI) and the normalized difference metal condition index (NDMCI). Results indicate that the accuracies achieved using the spectral indices are higher than those obtained using supervised pixel-based classification. The NDCCI generated an accuracy of 84.44%, whereas the support vector machine (SVM) approach yielded an accuracy of 73.06%. The NDMCI obtained an accuracy of 94.17% compared with 62.5% for the SVM approach. These findings support the suitability of the developed spectral index methods for determining roof degradation statuses from satellite observations in heterogeneous urban environments.
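The abstract names two normalized difference indices but does not say which bands enter each one. The sketch below shows only the generic normalized difference form, (a - b) / (a + b), that such indices share; the band values are hypothetical reflectances, and returning 0.0 for a zero denominator is an illustrative choice.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference index in the range [-1, 1] for non-negative inputs."""
    denominator = band_a + band_b
    if denominator == 0:
        # Illustrative convention for pixels with no signal in either band.
        return 0.0
    return (band_a - band_b) / denominator


# Hypothetical reflectance values for one pixel.
index_value = normalized_difference(0.6, 0.2)
```

A degradation status map would then follow from thresholding the index per pixel, with thresholds chosen from field spectroscopy data.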
Multitemporal spatial pattern analysis of Tulum's tropical coastal landscape
NASA Astrophysics Data System (ADS)
Ramírez-Forero, Sandra Carolina; López-Caloca, Alejandra; Silván-Cárdenas, José Luis
2011-11-01
The tropical coastal landscape of Tulum in Quintana Roo, Mexico has high ecological, economic, social and cultural value and provides environmental and tourism services at global, national, regional and local levels. The landscape of the area is heterogeneous and presents random fragmentation patterns. In recent years, tourist services in the region have increased, promoting an accelerated expansion of hotels, transportation and recreation infrastructure that alters the complex landscape. It is important to understand the environmental dynamics through temporal changes in the spatial patterns and to propose better management of this ecological area to the authorities. This paper presents a multi-temporal analysis of land cover changes from 1993 to 2000 in Tulum using Thematic Mapper data acquired by Landsat-5. Two independent methodologies were applied for the analysis of changes in the landscape and for the definition of fragmentation patterns. First, an Iteratively Reweighted Multivariate Alteration Detection (IR-MAD) algorithm was used to detect and localize land cover change/no-change areas. Second, post-classification change detection was evaluated using the Support Vector Machine (SVM) algorithm. Landscape metrics were calculated from the results of IR-MAD and SVM. The analysis of the metrics indicated, among other things, a higher fragmentation pattern along roadways.
Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.
Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng
2017-01-01
Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since the electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, analyzing EEG to detect neurological diseases is often difficult because the brain's electrical signals are random, non-stationary and nonlinear. To overcome this difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). In the first stage, the new scheme extracts features from the EEG by MF-DFA. Then, the scheme applies a genetic algorithm (GA) to calculate the parameters used in SVM and classifies the training data according to the selected features using SVM. Finally, the trained SVM classifier is used to detect neurological diseases. The algorithm utilizes the MLlib library of Spark and runs on a cloud platform. In experiments on a public dataset, the results show that the new feature extraction method and scheme can detect signals with fewer features, and the classification accuracy reached up to 99%. MF-DFA is a promising approach to extracting features for analyzing EEG because of its simple algorithmic procedure and fewer parameters. The features obtained by MF-DFA can represent samples as well as the traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM given enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.
Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori
2018-01-01
We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification focussing on (i) usefulness of the conventional CADx system (hand-crafted imaging feature + machine learning algorithm), (ii) comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were included from public databases of CT images. A variant of the local binary pattern was used for calculating a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. Tree Parzen Estimator (TPE) was used as Bayesian optimization for parameters of SVM and XGBoost. Random search was done for comparison with TPE. Leave-one-out cross-validation was used for optimizing and evaluating the performance of our CADx system. Performance was evaluated using area under the curve (AUC) of receiver operating characteristic analysis. AUC was calculated 10 times, and its average was obtained. The best averaged AUC of SVM and XGBoost was 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. Optimal parameters for achieving high AUC were obtained with fewer numbers of trials when using TPE, compared with random search. Bayesian optimization of SVM and XGBoost parameters was more efficient than random search. Based on observer study, AUC values of two board-certified radiologists were 0.898 and 0.822. The results show that diagnostic accuracy of our CADx system was comparable to that of radiologists with respect to classifying lung nodules.
Ji, Xiaoliang; Shang, Xu; Dahlgren, Randy A; Zhang, Minghua
2017-07-01
Accurate quantification of dissolved oxygen (DO) is critically important for managing water resources and controlling pollution. Artificial intelligence (AI) models have been successfully applied for modeling DO content in aquatic ecosystems with limited data. However, the efficacy of these AI models in predicting DO levels in hypoxic river systems with multiple pollution sources and complicated pollutant behaviors is unclear. Given this dilemma, we developed a promising AI model, the support vector machine (SVM), to predict the DO concentration in a hypoxic river in southeastern China. Four different calibration models, specifically multiple linear regression, back propagation neural network, general regression neural network, and SVM, were established, and their prediction accuracy was systematically investigated and compared. A total of 11 hydro-chemical variables were used as model inputs. These variables were measured bimonthly at eight sampling sites along the rural-suburban-urban portion of Wen-Rui Tang River from 2004 to 2008. The performance of the established models was assessed through the mean square error (MSE), determination coefficient (R²), and Nash-Sutcliffe (NS) model efficiency. The results indicated that the SVM model was superior to the other models in predicting DO concentration in Wen-Rui Tang River. For SVM, the MSE, R², and NS values for the testing subset were 0.9416 mg/L, 0.8646, and 0.8763, respectively. Sensitivity analysis showed that ammonium-nitrogen was the most significant input variable of the proposed SVM model. Overall, these results demonstrated that the proposed SVM model can efficiently predict water quality, especially for highly impaired and hypoxic river systems.
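The three performance measures named above can be computed as in the following minimal sketch. It assumes R² is taken as the squared Pearson correlation, which is one common convention; the abstract does not say which definition was used. The function names and the sample values are hypothetical and are not data from the study.

```python
def mse(observed, predicted):
    """Mean square error between paired observations and predictions."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def pearson_r2(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def nash_sutcliffe(observed, predicted):
    """NS efficiency: 1 minus squared error over variance about the observed mean."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical DO values in mg/L, invented purely for illustration.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(mse(observed, predicted))
print(pearson_r2(observed, predicted))
print(nash_sutcliffe(observed, predicted))
```

An NS value of 1 indicates a perfect fit, while a value of 0 means the model is no better than predicting the observed mean.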
Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.
Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel
2017-01-01
Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of cerebral glucose hypometabolism involving the occipito-temporal cortex and cerebellum has been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and its severity varies with the cognitive profile of patients. The aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients as having healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole population was divided into two groups: a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM), and an SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with the following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including an SVM to classify patients as having healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.
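The five measures reported above follow directly from a 2x2 confusion matrix, as the sketch below shows. The counts used in the example (17 true positives, 2 false negatives, 17 true negatives, 3 false positives) are an assumption inferred here from the testing-set sizes (19 MMF, 20 healthy) and the rounded percentages; the abstract does not state them. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return Se, Sp, PPV, NPV and Acc from confusion-matrix counts."""
    return {
        "Se": tp / (tp + fn),    # sensitivity: detected fraction of true cases
        "Sp": tn / (tn + fp),    # specificity: cleared fraction of healthy subjects
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "Acc": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts, inferred from the reported test-set sizes and percentages.
metrics = diagnostic_metrics(tp=17, fn=2, tn=17, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts the rounded values agree with the reported 89%, 85%, 85%, 89% and 87%.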
Liu, Mei; Lu, Jun
2014-09-01
Water quality forecasting in agricultural drainage river basins is difficult because of the complicated nonpoint source (NPS) pollution transport processes and river self-purification processes involved in highly nonlinear problems. Artificial neural network (ANN) and support vector machine (SVM) models were developed to predict total nitrogen (TN) and total phosphorus (TP) concentrations for any location of a river polluted by agricultural NPS pollution in eastern China. River flow, water temperature, flow travel time, rainfall, dissolved oxygen, and upstream TN or TP concentrations were selected as initial inputs of the two models. Monthly, bimonthly, and trimonthly datasets were selected to train the two models, respectively, and the same monthly dataset, which had not been used for training, was chosen to test the models in order to compare their generalization performance. Trial and error analysis and genetic algorithms (GA) were employed to optimize the parameters of the ANN and SVM models, respectively. The results indicated that the proposed SVM models showed better generalization ability because they avoid overtraining and optimize fewer parameters, based on the structural risk minimization (SRM) principle. Furthermore, both TN and TP SVM models trained on trimonthly datasets achieved greater forecasting accuracy than the corresponding ANN models. Thus, SVM models are a powerful alternative method because they are an efficient and economical tool for accurately predicting water quality with low risk. The sensitivity analyses of the two models indicated that decreasing upstream input concentrations during the dry season and NPS emission along the reach during the average or flood season should be an effective way to improve Changle River water quality. If the necessary water quality and hydrology data, even trimonthly data, are available, the SVM methodology developed here can easily be applied to other NPS-polluted rivers.
Sun, Tong; Xu, Wen-Li; Hu, Tian; Liu, Mu-Hua
2013-12-01
The objective of the present research was to assess the soluble solids content (SSC) of Nanfeng mandarin by visible/near infrared (Vis/NIR) spectroscopy combined with a new variable selection method, to simplify the prediction model, and to improve its performance. A total of 300 Nanfeng mandarin samples were used; the numbers of samples in the calibration, validation and prediction sets were 150, 75 and 75, respectively. Vis/NIR spectra of the samples were acquired by a QualitySpec spectrometer in the wavelength range of 350-1000 nm. Uninformative variables elimination (UVE) was used to eliminate wavelength variables that carried little information about SSC, then independent component analysis (ICA) was used to extract independent components (ICs) from the spectra with the uninformative wavelength variables removed. Finally, least squares support vector machine (LS-SVM) was used to develop calibration models for SSC of Nanfeng mandarin using the extracted ICs, and the 75 prediction samples that had not been used for model development were used to evaluate the performance of the SSC model. The results indicate that Vis/NIR spectroscopy combined with UVE-ICA-LS-SVM is suitable for assessing SSC of Nanfeng mandarin, and the precision of prediction is high. UVE-ICA is an effective method to eliminate uninformative wavelength variables, extract important spectral information, simplify the prediction model and improve its performance. The SSC model developed by UVE-ICA-LS-SVM is superior to that developed by PLS, PCA-LS-SVM or ICA-LS-SVM, and the coefficient of determination and root mean square error in calibration, validation and prediction sets were 0.978, 0.230%, 0.965, 0.301% and 0.967, 0.292%, respectively.
Anam, Khairul; Al-Jumaily, Adel
2017-01-01
The success of myoelectric pattern recognition (M-PR) mostly relies on the features extracted and classifier employed. This paper proposes and evaluates a fast classifier, extreme learning machine (ELM), to classify individual and combined finger movements on amputees and non-amputees. ELM is a single hidden layer feed-forward network (SLFN) that avoids iterative learning by determining input weights randomly and output weights analytically. Therefore, it can accelerate the training time of SLFNs. In addition to the classifier evaluation, this paper evaluates various feature combinations to improve the performance of M-PR and investigate some feature projections to improve the class separability of the features. Different from other studies on the implementation of ELM in the myoelectric controller, this paper presents a complete and thorough investigation of various types of ELMs including the node-based and kernel-based ELM. Furthermore, this paper provides comparisons of ELMs and other well-known classifiers such as linear discriminant analysis (LDA), k-nearest neighbour (kNN), support vector machine (SVM) and least-square SVM (LS-SVM). The experimental results show the most accurate ELM classifier is radial basis function ELM (RBF-ELM). The comparison of RBF-ELM and other well-known classifiers shows that RBF-ELM is as accurate as SVM and LS-SVM but faster than the SVM family; it is superior to LDA and kNN. The experimental results also indicate that the accuracy gap of the M-PR on the amputees and non-amputees is not too much with the accuracy of 98.55% on amputees and 99.5% on the non-amputees using six electromyography (EMG) channels. Copyright © 2016 Elsevier Ltd. All rights reserved.
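The statement that ELM determines "input weights randomly and output weights analytically" can be made concrete with the standard ELM equations below. This is the generic formulation from the ELM literature, not a reconstruction of this paper's exact implementation; the symbols (N training pairs (x_j, t_j), L hidden nodes, activation g, regularisation constant C, kernel K) are introduced here for illustration.

```latex
% Node-based ELM (assumed generic form): hidden weights w_i and biases b_i
% are drawn at random and never trained.
\[ H_{ji} = g\!\left( w_i \cdot x_j + b_i \right), \qquad j = 1, \dots, N, \quad i = 1, \dots, L . \]
% The output weights \beta solve the linear system H \beta = T in the
% least-squares sense, using the Moore-Penrose pseudoinverse H^{\dagger}:
\[ \hat{\beta} = H^{\dagger} T . \]
% Kernel-based ELM (e.g. with an RBF kernel K): no explicit hidden layer,
% with \Omega_{jk} = K(x_j, x_k) the kernel matrix on the training set:
\[ f(x) = \begin{bmatrix} K(x, x_1) & \cdots & K(x, x_N) \end{bmatrix}
          \left( \frac{I}{C} + \Omega \right)^{-1} T . \]
```

Because both variants reduce training to a single linear solve, no iterative weight updates are needed, which is consistent with the speed advantage the abstract reports.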
Ansari, Mozafar; Othman, Faridah; Abunama, Taher; El-Shafie, Ahmed
2018-04-01
The function of a sewage treatment plant is to treat the sewage to acceptable standards before it is discharged into the receiving waters. To design and operate such plants, it is necessary to measure and predict the influent flow rate. In this research, the influent flow rate of a sewage treatment plant (STP) was modelled and predicted by autoregressive integrated moving average (ARIMA), nonlinear autoregressive network (NAR) and support vector machine (SVM) regression time series algorithms. To evaluate the models' accuracy, the root mean square error (RMSE) and coefficient of determination (R²) were calculated as initial assessment measures, while relative error (RE), peak flow criterion (PFC) and low flow criterion (LFC) were calculated as final evaluation measures to demonstrate the detailed accuracy of the selected models. An integrated model was developed based on the individual models' prediction ability for low, average and peak flow. An initial assessment of the results showed that the ARIMA model was the least accurate and the NAR model was the most accurate. The RE results also show that the SVM model's frequency of errors above 10% or below -10% was greater than the NAR model's. The influent was also forecast up to 44 weeks ahead by both models. The graphical results indicate that the NAR model made better predictions than the SVM model. The final evaluation of NAR and SVM demonstrated that SVM made better predictions at peak flow and NAR fit well for low and average inflow ranges. The integrated model developed includes the NAR model for low and average influent and the SVM model for peak inflow.
2012-01-01
Background Members of the phylum Proteobacteria are most prominent among bacteria causing plant diseases that result in a diminution of the quantity and quality of food produced by agriculture. To ameliorate these losses, there is a need to identify infections in early stages. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem but none have emerged as the best protocol. Here we attempt a systematic way to determine organismal origins of peptides by using a machine learning algorithm. The algorithm that we implement is a Support Vector Machine (SVM). Result The amino acid compositions of proteobacterial proteins were found to be different from those of plant proteins. We developed an SVM model based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with 0.85 Matthews correlation coefficient (MCC) while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and 0.89 MCC. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy 94.86% and a 0.90 MCC. The models were tested on unseen or untrained datasets to assess their validity. Conclusion The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences. PMID:23046503
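The amino acid composition (AAC) and dipeptide composition (DC) features described above, together with the Matthews correlation coefficient used for evaluation, can be sketched as follows. This is a minimal illustration under stated assumptions: only the 20 standard residues are counted, dipeptides are overlapping adjacent pairs, and the function names and toy sequence are hypothetical rather than taken from the study.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions over overlapping adjacent residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {}
    for pair in pairs:
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(a + b, 0) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


def hybrid_features(seq):
    """Concatenate AAC and DC into one 420-dimensional feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy sequence, invented for illustration only.
features = hybrid_features("ACAC")
print(len(features))
```

Vectors of this kind could then be fed to any SVM implementation. Residues outside the 20-letter alphabet are simply not counted in this sketch, so the fractions may sum to less than 1 for such sequences.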
Zhang, Ming-Huan; Ma, Jun-Shan; Shen, Ying; Chen, Ying
2016-09-01
This study aimed to investigate the optimal support vector machine (SVM)-based classifier of Duchenne muscular dystrophy (DMD) magnetic resonance imaging (MRI) images. T1-weighted (T1W) and T2-weighted (T2W) images of 15 boys with DMD and 15 normal controls were obtained. Textural features of the images were extracted and wavelet decomposed, and then principal features were selected. Scale transform was then performed for MRI images. Afterward, SVM-based classifiers of MRI images were analyzed based on the radial basis function and decomposition levels. The cost (C) parameter and kernel parameter [Formula: see text] were used for classification. Then, the optimal SVM-based classifier, expressed as [Formula: see text]), was identified by performance evaluation (sensitivity, specificity and accuracy). Eight of 12 textural features were selected as principal features (eigenvalues [Formula: see text]). The 16 SVM-based classifiers were obtained using combinations of (C, [Formula: see text]), and those with lower C and [Formula: see text] values showed higher performance, especially the classifier of [Formula: see text]). The SVM-based classifiers of T1W images showed higher performance than those of T2W images at the same decomposition level. The T1W images in the classifier of [Formula: see text]) at level 2 decomposition showed the highest performance of all, and its overall sensitivity, specificity, and accuracy reached 96.9, 97.3, and 97.1 %, respectively. The T1W images in SVM-based classifier [Formula: see text] at level 2 decomposition showed the highest performance of all, demonstrating that it was the optimal classifier for the diagnosis of DMD.
NASA Astrophysics Data System (ADS)
Zhu, Guoping; Zhang, Haiting; Yang, Yang; Wang, Shaoqin; Wei, Lian; Yang, Qingyuan
2017-09-01
The Patagonian Shelf is a very productive region with different ecosystem structures. A long history of fishing in the Southwestern Atlantic Ocean, combined with a complex hydrographic structure that includes a permanent front over the shelf break, different coastal frontal regions, and a wide non-frontal area in between, has made the food web in this area more complex and has resulted in changes at the spatial-temporal scale. Stable isotopes of carbon and nitrogen were used to determine the trophic structure of the Patagonian shelf break, which was previously poorly understood. The results indicated that the average δ15N value of the pelagic guild (Illex argentinus) was remarkably lower than those of the other guilds. The δ13C values of almost all species ranged from -17‰ to -18‰, but Stromateus brasiliensis had a significantly lower δ13C value. Compared with the southern Patagonian shelf, a shorter food chain length was also observed. The impact of complex oceanographic structures has resulted in food web structure changes at the temporal-spatial scale on the Patagonian shelf. The Patagonian shelf break can be considered a separate ecosystem structure with lower δ15N values.
NASA Astrophysics Data System (ADS)
Mahvash Mohammadi, Neda; Hezarkhani, Ardeshir
2018-07-01
Classification of mineralised zones is an important factor for the analysis of economic deposits. In this paper, the support vector machine (SVM), a supervised learning algorithm, is applied to subsurface data to classify mineralised zones in the Takht-e-Gonbad porphyry Cu deposit (SE Iran). The effects of the input features on SVM performance are evaluated by calculating accuracy rates. Ultimately, the SVM model is developed using lithology, alteration, mineralisation and level as input features, with the radial basis function (RBF) as the kernel function. Moreover, the optimal values of the parameters λ and C, calculated using the n-fold cross-validation method, are 0.001 and 0.01, respectively. The accuracy of this model is 0.931 for classification of mineralised zones in the Takht-e-Gonbad porphyry deposit. The results of the study confirm the efficiency of the SVM method for classifying the mineralised zones.
Alejo, Luz; Atkinson, John; Guzmán-Fierro, Víctor; Roeckel, Marlene
2018-05-16
Computational self-adapting methods (Support Vector Machines, SVM) are compared with an analytical method in effluent composition prediction of a two-stage anaerobic digestion (AD) process. Experimental data for the AD of poultry manure were used. The analytical method considers the protein as the only source of ammonia production in AD after degradation. Total ammonia nitrogen (TAN), total solids (TS), chemical oxygen demand (COD), and total volatile solids (TVS) were measured in the influent and effluent of the process. The TAN concentration in the effluent was predicted, this being the most inhibiting and polluting compound in AD. Despite the limited data available, the SVM-based model outperformed the analytical method for the TAN prediction, achieving a relative average error of 15.2% against 43% for the analytical method. Moreover, SVM showed higher prediction accuracy in comparison with Artificial Neural Networks. This result reveals the future promise of SVM for prediction in non-linear and dynamic AD processes.
Distributed support vector machine in master-slave mode.
Chen, Qingguo; Cao, Feilong
2018-05-01
It is well known that the support vector machine (SVM) is an effective learning algorithm. The alternating direction method of multipliers (ADMM) algorithm has emerged as a powerful technique for solving distributed optimisation models. This paper proposes a distributed SVM algorithm in a master-slave mode (MS-DSVM), which integrates a distributed SVM and ADMM acting in a master-slave configuration where the master node and slave nodes are connected, meaning the results can be broadcasted. The distributed SVM is regarded as a regularised optimisation problem and modelled as a series of convex optimisation sub-problems that are solved by ADMM. Additionally, the over-relaxation technique is utilised to accelerate the convergence rate of the proposed MS-DSVM. Our theoretical analysis demonstrates that the proposed MS-DSVM has linear convergence, meaning it possesses the fastest convergence rate among existing standard distributed ADMM algorithms. Numerical examples demonstrate that the convergence and accuracy of the proposed MS-DSVM are superior to those of existing methods under the ADMM framework. Copyright © 2018 Elsevier Ltd. All rights reserved.
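The interplay of master node, slave nodes and over-relaxation described above can be illustrated with the generic global-consensus ADMM iteration below. This is the standard textbook form, offered as a sketch; it is an assumption that MS-DSVM follows this pattern, and the symbols (local objectives f_i on N slave nodes, regulariser g, penalty ρ, relaxation parameter α, scaled duals u_i) are introduced here and do not appear in the abstract.

```latex
% Generic consensus ADMM with over-relaxation (assumed form, not the paper's).
% Slave node i updates its local variable from its own data:
\[ x_i^{k+1} = \arg\min_{x_i} \; f_i(x_i) + \frac{\rho}{2} \left\| x_i - z^k + u_i^k \right\|_2^2 . \]
% Over-relaxation with parameter \alpha \in (0, 2) blends the new local
% iterate with the previous consensus variable:
\[ \hat{x}_i^{k+1} = \alpha \, x_i^{k+1} + (1 - \alpha) \, z^k . \]
% The master node averages the relaxed iterates and duals, then updates and
% broadcasts the consensus variable z:
\[ z^{k+1} = \arg\min_{z} \; g(z) + \frac{N \rho}{2}
   \left\| z - \frac{1}{N} \sum_{i=1}^{N} \left( \hat{x}_i^{k+1} + u_i^k \right) \right\|_2^2 . \]
% Each slave node then updates its scaled dual variable:
\[ u_i^{k+1} = u_i^k + \hat{x}_i^{k+1} - z^{k+1} . \]
```

Setting α = 1 recovers plain consensus ADMM; values of α above 1 correspond to over-relaxation.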
Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method
NASA Astrophysics Data System (ADS)
Khandelwal, Manoj; Monjezi, M.
2013-03-01
Backbreak is an undesirable phenomenon in blasting operations. It can cause instability of mine walls, falling down of machinery, improper fragmentation, reduced efficiency of drilling, etc. The existence of various effective parameters and their unknown relationships are the main reasons for inaccuracy of the empirical models. Presently, the application of new approaches such as artificial intelligence is highly recommended. In this paper, an attempt has been made to predict backbreak in blasting operations of Soungun iron mine, Iran, incorporating rock properties and blast design parameters using the support vector machine (SVM) method. To investigate the suitability of this approach, the predictions by SVM have been compared with multivariate regression analysis (MVRA). The coefficient of determination (CoD) and the mean absolute error (MAE) were taken as performance measures. It was found that the CoD between measured and predicted backbreak was 0.987 and 0.89 by SVM and MVRA, respectively, whereas the MAE was 0.29 and 1.07 by SVM and MVRA, respectively.
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
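As background for the predictors above, NDVI is computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows only this basic index; how the study derived its Single, Incremental and Targeted NDVI predictors is not described in the abstract. The function name, the zero-denominator convention and the sample reflectances are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    denom = nir + red
    if denom == 0:
        # Assumption: report 0.0 for pixels with no signal in either band.
        return 0.0
    return (nir - red) / denom


# Hypothetical (NIR, red) reflectance pairs over a growing season.
season = [(0.30, 0.20), (0.45, 0.12), (0.50, 0.10), (0.35, 0.18)]
ndvi_series = [ndvi(nir, red) for nir, red in season]
print(ndvi_series)
```

Values near 1 indicate dense green vegetation, and values near 0 or below indicate bare soil, water or cloud.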
Zhan, Xiaobin; Jiang, Shulan; Yang, Yili; Liang, Jian; Shi, Tielin; Li, Xiwen
2015-09-18
This paper proposes an ultrasonic measurement system based on least squares support vector machines (LS-SVM) for inline measurement of particle concentrations in multicomponent suspensions. Firstly, the ultrasonic signals are analyzed and processed, and the optimal feature subset that contributes to the best model performance is selected based on the importance of features. Secondly, the LS-SVM model is tuned, trained and tested with different feature subsets to obtain the optimal model. In addition, a comparison is made between the partial least square (PLS) model and the LS-SVM model. Finally, the optimal LS-SVM model with the optimal feature subset is applied to inline measurement of particle concentrations in the mixing process. The results show that the proposed method is reliable and accurate for inline measuring the particle concentrations in multicomponent suspensions and the measurement accuracy is sufficiently high for industrial application. Furthermore, the proposed method is applicable to the modeling of the nonlinear system dynamically and provides a feasible way to monitor industrial processes.
VLSI Design of SVM-Based Seizure Detection System With On-Chip Learning Capability.
Feng, Lichen; Li, Zunchao; Wang, Yuanfa
2018-02-01
A portable automatic seizure detection system is convenient for epilepsy patients to carry. To make the system on-chip trainable with high efficiency and to attain high detection accuracy, this paper presents a very large scale integration (VLSI) design based on the nonlinear support vector machine (SVM). The proposed design mainly consists of a feature extraction (FE) module and an SVM module. The FE module performs the three-level Daubechies discrete wavelet transform to fit the physiological bands of the electroencephalogram (EEG) signal and extracts the time-frequency domain features reflecting the nonstationary signal properties. The SVM module integrates the modified sequential minimal optimization algorithm with the table-driven Gaussian kernel to enable efficient on-chip learning. The presented design is verified on an Altera Cyclone II field-programmable gate array and tested using two publicly available EEG datasets. Experimental results show that the designed VLSI system improves the detection accuracy and training efficiency.
Multiclass Reduced-Set Support Vector Machines
NASA Technical Reports Server (NTRS)
Tang, Benyang; Mazzoni, Dominic
2006-01-01
There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.
SVM-based tree-type neural networks as a critic in adaptive critic designs for control.
Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh
2007-07-01
In this paper, we use the approach of adaptive critic design (ACD) for control, specifically, the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor has been used for generating the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action are retrained in tandem using the failure data. Failure data is binary classification data in which failure states are far fewer than no-failure states. The difficulty that conventional multilayer feedforward NNs have in learning this type of classification data has been overcome by using the SVM-based tree-type NN, which, owing to its ability to add neurons to learn misclassified data, can learn any binary classification data without an a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.
NASA Astrophysics Data System (ADS)
Rokni Deilmai, B.; Ahmad, B. Bin; Zabihi, H.
2014-06-01
Mapping is essential for the analysis of land use and land cover, which influence many environmental processes and properties. When creating land cover maps, it is important to minimize errors, because these errors propagate into later analyses based on the maps. The reliability of land cover maps derived from remotely sensed data depends on an accurate classification. In this study, we analyzed multispectral data using two different classifiers: the Maximum Likelihood Classifier (MLC) and the Support Vector Machine (SVM). To this end, Landsat Thematic Mapper data and identical field-based training sample datasets from Johor, Malaysia were used for each classification method, yielding five land cover classes: forest, oil palm, urban area, water and rubber. Classification results indicate that SVM was more accurate than MLC. With its demonstrated capability to produce reliable land cover results, the SVM method should be especially useful for land cover classification.
Huang, Tao; Li, Xiao-yu; Xu, Meng-ling; Jin, Rui; Ku, Jing; Xu, Sen-miao; Wu, Zhen-zhong
2015-01-01
The quality of potatoes is directly related to their edible value and industrial value. Hollow heart of potato, a physiological disease that occurs inside the tuber, is difficult to detect. This paper puts forward a non-destructive detection method using semi-transmission hyperspectral imaging with support vector machine (SVM) to detect hollow heart of potato. Compared to reflection and transmission hyperspectral images, semi-transmission hyperspectral images are clearer and contain the internal quality information of agricultural products. In this study, 224 potato samples (149 normal samples and 75 hollow samples) were selected as the research object, and a semi-transmission hyperspectral image acquisition system was constructed to acquire the hyperspectral images (390-1040 nm) of the potato samples; the average spectrum of the region of interest was then extracted for spectral characteristics analysis. Normalization was used to preprocess the original spectrum, and a prediction model was developed based on SVM using all wave bands; the recognition accuracy on the test set was only 87.5%. In order to simplify the model, the competitive adaptive reweighted sampling algorithm (CARS) and successive projection algorithm (SPA) were used to select important variables from all 520 spectral variables, and 8 variables were selected (454, 601, 639, 664, 748, 827, 874 and 936 nm). A recognition accuracy of 94.64% on the test set was obtained by using the 8 variables to develop the SVM model. Parameter optimization algorithms, including the artificial fish swarm algorithm (AFSA), genetic algorithm (GA) and grid search algorithm, were used to optimize the SVM model parameters: penalty parameter c and kernel parameter g. After comparative analysis, AFSA, a new bionic optimization algorithm based on the foraging behavior of fish swarms, was shown to give the optimal model parameters (c=10.6591, g=0.3497), and a recognition accuracy of 10% was obtained for the AFSA-SVM model. The results indicate that combining semi-transmission hyperspectral imaging technology with CARS-SPA and AFSA-SVM can accurately detect hollow heart of potato, and also provide technical support for rapid non-destructive detection of hollow heart of potato.
Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.
Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria
2018-04-18
Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SVM-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to other reported FS methods. Copyright © 2018 Elsevier B.V. All rights reserved.
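As a rough illustration of the absolute cosine (AC) idea mentioned in the abstract above, the sketch below measures redundancy between two feature columns as the absolute value of their cosine similarity and greedily skips features that are too similar to ones already kept. This is not the SVM-RFE(AC) procedure of the paper; the function names, the greedy walk and the 0.95 threshold are assumptions made for the example.

```python
import math


def abs_cosine(u, v):
    """Absolute cosine similarity |u.v| / (||u|| ||v||); 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return abs(dot) / (norm_u * norm_v)


def drop_redundant(features, ranked_names, threshold=0.95):
    """Walk features in ranked order and keep one only if its absolute cosine
    with every already-kept feature stays below the (assumed) threshold."""
    kept = []
    for name in ranked_names:
        if all(abs_cosine(features[name], features[k]) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    columns = {"a": [1, 2, 3], "b": [-2, -4, -6], "c": [1, 0, -1]}
    # "b" is a scaled, sign-flipped copy of "a", so it is treated as redundant.
    print(drop_redundant(columns, ["a", "b", "c"]))
```

Taking the absolute value means that strongly anti-correlated directions count as redundant too, which is the property that distinguishes this filter from a plain cosine similarity.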
Vidić, Igor; Egnell, Liv; Jerome, Neil P; Teruel, Jose R; Sjøbakk, Torill E; Østlie, Agnes; Fjøsne, Hans E; Bathen, Tone F; Goa, Pål Erik
2018-05-01
Diffusion-weighted MRI (DWI) is currently one of the fastest developing MRI-based techniques in oncology. Histogram properties from model fitting of DWI are useful features for differentiation of lesions, and classification can potentially be improved by machine learning. To evaluate classification of malignant and benign tumors and breast cancer subtypes using support vector machine (SVM). Prospective. Fifty-one patients with benign (n = 23) and malignant (n = 28) breast tumors (26 ER+, whereof six were HER2+). Patients were imaged with DW-MRI (3T) using twice refocused spin-echo echo-planar imaging with repetition time / echo time (TR/TE) = 9000/86 msec, 90 × 90 matrix size, 2 × 2 mm in-plane resolution, 2.5 mm slice thickness, and 13 b-values. Apparent diffusion coefficient (ADC), relative enhanced diffusivity (RED), and the intravoxel incoherent motion (IVIM) parameters diffusivity (D), pseudo-diffusivity (D*), and perfusion fraction (f) were calculated. The histogram properties (median, mean, standard deviation, skewness, kurtosis) were used as features in SVM (10-fold cross-validation) for differentiation of lesions and subtyping. Accuracies of the SVM classifications were calculated to find the combination of features with highest prediction accuracy. Mann-Whitney tests were performed for univariate comparisons. For benign versus malignant tumors, univariate analysis found 11 histogram properties to be significant differentiators. Using SVM, the highest accuracy (0.96) was achieved from a single feature (mean of RED), or from three feature combinations of IVIM or ADC. Combining features from all models gave perfect classification. No single feature predicted HER2 status of ER+ tumors (univariate or SVM), although high accuracy (0.90) was achieved with SVM combining several features. Importantly, these features had to include higher-order statistics (kurtosis and skewness), indicating the importance of accounting for heterogeneity. 
Our findings suggest that SVM, using features from a combination of diffusion models, improves prediction accuracy for differentiation of benign versus malignant breast tumors, and may further assist in subtyping of breast cancer. Level of Evidence: 3. Technical Efficacy: Stage 3. J. Magn. Reson. Imaging 2018;47:1205-1216. © 2017 International Society for Magnetic Resonance in Medicine.
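The five histogram properties named in the abstract above (median, mean, standard deviation, skewness, kurtosis) can be computed from a list of voxel values as in the sketch below. The abstract does not say which estimator variants were used, so population (biased) moments and non-excess kurtosis are assumptions here, as is the function name.

```python
import statistics


def histogram_properties(values):
    """Return median, mean, standard deviation, skewness and kurtosis.

    Population (biased) moments are used; kurtosis is the non-excess form,
    so a normal distribution would give a value near 3.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "median": statistics.median(values),
        "mean": mean,
        "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


if __name__ == "__main__":
    print(histogram_properties([1, 2, 3, 4, 5]))
```

Skewness and kurtosis describe the shape of the distribution rather than its location, which is why they can capture lesion heterogeneity that the mean and median miss.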
Identification of handwriting by using the genetic algorithm (GA) and support vector machine (SVM)
NASA Astrophysics Data System (ADS)
Zhang, Qigui; Deng, Kai
2016-12-01
As portable digital cameras and camera phones become more and more popular, it is equally pressing to meet people's need to shoot at any time and to identify and store handwritten characters. In this paper, a genetic algorithm (GA) and a support vector machine (SVM) are used for identification of handwriting. Compared with conventional parameter-optimization methods, this technique overcomes two of their defects: first, they are easily trapped in a local optimum; second, searching for the best parameters over a larger range reduces the efficiency of classification and prediction. As the experimental results suggest, GA-SVM has a higher recognition rate.
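To make the GA-based parameter search described above more concrete, here is a minimal elitist genetic algorithm over a box of real-valued parameters. It is a generic sketch, not the authors' algorithm: the operators (uniform crossover, Gaussian mutation, truncation selection), all default settings and the function name are assumptions. In a GA-SVM setting the fitness would be the cross-validated accuracy of an SVM trained with the candidate parameters; the demo uses a toy function instead.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Maximise fitness(params) over a box with a small elitist GA.

    bounds: list of (low, high) per parameter.
    Returns (best_params, best_fitness_per_generation).
    """
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    def random_individual():
        return [rng.uniform(low, high) for low, high in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from one of the two parents.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(ind):
        # Gaussian mutation scaled to each parameter's range, then clipped.
        return [clip(x + rng.gauss(0.0, mutation_scale * (high - low)), low, high)
                for x, (low, high) in zip(ind, bounds)]

    population = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    best = max(population, key=fitness)
    return best, history


if __name__ == "__main__":
    # Toy fitness with its maximum at (1, -2); stands in for CV accuracy.
    toy = lambda p: -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)
    best, history = genetic_search(toy, [(-5.0, 5.0), (-5.0, 5.0)])
    print(best, history[-1])
```

Because the best individual is copied unchanged into the next generation, the recorded best fitness never decreases, while mutation keeps some ability to leave a local optimum.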
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
NASA Astrophysics Data System (ADS)
Cui, Ying; Dy, Jennifer G.; Alexander, Brian; Jiang, Steve B.
2008-08-01
Various problems with the current state-of-the-art techniques for gated radiotherapy have prevented this new treatment modality from being widely implemented in clinical routine. These problems are caused mainly by applying various external respiratory surrogates. There might be large uncertainties in deriving the tumor position from external respiratory surrogates. While tracking implanted fiducial markers has sufficient accuracy, this procedure may not be widely accepted due to the risk of pneumothorax. Previously, we have developed a technique to generate gating signals from fluoroscopic images without implanted fiducial markers using template matching methods (Berbeco et al 2005 Phys. Med. Biol. 50 4481-90, Cui et al 2007b Phys. Med. Biol. 52 741-55). In this note, our main contribution is to provide a totally different new view of the gating problem by recasting it as a classification problem. Then, we solve this classification problem by a well-studied powerful classification method called a support vector machine (SVM). Note that the goal of an automated gating tool is to decide when to turn the beam ON or OFF. We treat ON and OFF as the two classes in our classification problem. We create our labeled training data during the patient setup session by utilizing the reference gating signal, manually determined by a radiation oncologist. We then pre-process these labeled training images and build our SVM prediction model. During treatment delivery, fluoroscopic images are continuously acquired, pre-processed and sent as an input to the SVM. Finally, our SVM model will output the predicted labels as gating signals. We test the proposed technique on five sequences of fluoroscopic images from five lung cancer patients against the reference gating signal as ground truth. We compare the performance of the SVM to our previous template matching method (Cui et al 2007b Phys. Med. Biol. 52 741-55). 
We find that the SVM is slightly more accurate on average (1-3%) than the template matching method, when delivering the target dose. And the average duty cycle is 4-6% longer. Given the very limited patient dataset, we cannot conclude that the SVM is more accurate and efficient than the template matching method. However, our preliminary results show that the SVM is a potentially precise and efficient algorithm for generating gating signals for radiotherapy. This work demonstrates that the gating problem can be considered as a classification problem and solved accordingly.
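Once gating is treated as a two-class problem, as in the abstract above, a predicted gating signal is just a sequence of ON/OFF labels that can be compared with the reference signal. The sketch below computes the duty cycle (fraction of frames with the beam ON) and, as one possible accuracy measure, the fraction of predicted beam-ON frames in which the reference is also ON. The exact metric definitions used in the paper are not given in the abstract, so these definitions and the function names are assumptions.

```python
def duty_cycle(labels):
    """Fraction of frames in which the beam is ON (label 1)."""
    return sum(labels) / len(labels)


def beam_on_agreement(predicted, reference):
    """Among frames predicted ON, the fraction where the reference is also ON."""
    on_frames = [r for p, r in zip(predicted, reference) if p == 1]
    if not on_frames:
        return 0.0
    return sum(on_frames) / len(on_frames)


if __name__ == "__main__":
    reference = [0, 1, 1, 1, 0, 0, 1, 1]  # hypothetical reference gating signal
    predicted = [0, 1, 1, 0, 0, 1, 1, 1]  # hypothetical classifier output
    print(duty_cycle(predicted), beam_on_agreement(predicted, reference))
```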
Fluoroquinolone-gyrase-DNA complexes: two modes of drug binding.
Mustaev, Arkady; Malik, Muhammad; Zhao, Xilin; Kurepina, Natalia; Luan, Gan; Oppegard, Lisa M; Hiasa, Hiroshi; Marks, Kevin R; Kerns, Robert J; Berger, James M; Drlica, Karl
2014-05-02
DNA gyrase and topoisomerase IV control bacterial DNA topology by breaking DNA, passing duplex DNA through the break, and then resealing the break. This process is subject to reversible corruption by fluoroquinolones, antibacterials that form drug-enzyme-DNA complexes in which the DNA is broken. The complexes, called cleaved complexes because of the presence of DNA breaks, have been crystallized and found to have the fluoroquinolone C-7 ring system facing the GyrB/ParE subunits. As expected from x-ray crystallography, a thiol-reactive, C-7-modified chloroacetyl derivative of ciprofloxacin (Cip-AcCl) formed cross-linked cleaved complexes with mutant GyrB-Cys(466) gyrase as evidenced by resistance to reversal by both EDTA and thermal treatments. Surprisingly, cross-linking was also readily seen with complexes formed by mutant GyrA-G81C gyrase, thereby revealing a novel drug-gyrase interaction not observed in crystal structures. The cross-link between fluoroquinolone and GyrA-G81C gyrase correlated with exceptional bacteriostatic activity for Cip-AcCl with a quinolone-resistant GyrA-G81C variant of Escherichia coli and its Mycobacterium smegmatis equivalent (GyrA-G89C). Cip-AcCl-mediated, irreversible inhibition of DNA replication provided further evidence for a GyrA-drug cross-link. Collectively these data establish the existence of interactions between the fluoroquinolone C-7 ring and both GyrA and GyrB. Because the GyrA-Gly(81) and GyrB-Glu(466) residues are far apart (17 Å) in the crystal structure of cleaved complexes, two modes of quinolone binding must exist. The presence of two binding modes raises the possibility that multiple quinolone-enzyme-DNA complexes can form, a discovery that opens new avenues for exploring and exploiting relationships between drug structure and activity with type II DNA topoisomerases.
Estimation of hydraulic jump characteristics of channels with sudden diverging side walls via SVM.
Roushangar, Kiyoumars; Valizadeh, Reyhaneh; Ghasempour, Roghayeh
2017-10-01
Sudden diverging channels are energy dissipaters that can dissipate most of the kinetic energy of the flow through a hydraulic jump. An accurate prediction of hydraulic jump characteristics is an important step in designing hydraulic structures. This paper focuses on the capability of the support vector machine (SVM) as a meta-model approach for predicting hydraulic jump characteristics in different sudden diverging stilling basins (i.e. basins with and without appurtenances). In this regard, different models were developed and tested using 1,018 experimental data points. The results demonstrated the capability of the SVM technique in predicting hydraulic jump characteristics, and the models developed for a channel with a central block performed better than models for channels without appurtenances or with a negative step. For the length of the hydraulic jump, the best performance was obtained by the model with parameters F1 (Froude number) and (h2 - h1)/h1 (h1 and h2 are the upstream and downstream sequent depths, respectively). For the relative energy dissipation and sequent depth ratio, the model with parameters F1 and h1/B (B is the expansion ratio) led to the best results. According to the sensitivity analysis, the Froude number had the most significant effect on the modeling. Comparison between the SVM and empirical equations also indicated the strong performance of the SVM.
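The input quantities named in the abstract above are simple to compute. The sketch below evaluates the upstream Froude number F1 = v1 / sqrt(g h1) and the relative depth increase (h2 - h1)/h1, and, for comparison, the textbook Bélanger relation for the sequent depth ratio of a classical jump in a horizontal rectangular channel. That relation is a standard reference case and is not claimed to hold for the sudden diverging basins studied in the paper; the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def froude_number(velocity, depth):
    """Froude number F = v / sqrt(g * h) for open-channel flow."""
    return velocity / math.sqrt(G * depth)


def belanger_depth_ratio(f1):
    """Sequent depth ratio h2/h1 of a classical hydraulic jump
    (horizontal rectangular channel, no expansion)."""
    return 0.5 * (math.sqrt(1.0 + 8.0 * f1 ** 2) - 1.0)


def relative_depth_increase(h1, h2):
    """The model input (h2 - h1) / h1."""
    return (h2 - h1) / h1


if __name__ == "__main__":
    f1 = froude_number(velocity=3.0, depth=0.05)  # hypothetical inflow
    print(f1, belanger_depth_ratio(f1))
```

A data-driven model such as an SVM regressor is useful precisely where closed-form relations like this one no longer apply, for example when the channel widens abruptly or contains blocks and steps.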
NASA Astrophysics Data System (ADS)
Manu, D. S.; Thalla, Arun Kumar
2017-11-01
The current work demonstrates support vector machine (SVM) and adaptive neuro-fuzzy inference system (ANFIS) modeling to assess the Kjeldahl Nitrogen removal efficiency of a full-scale aerobic biological wastewater treatment plant. Influent variables such as pH, chemical oxygen demand, total solids (TS), free ammonia, ammonia nitrogen and Kjeldahl Nitrogen are used as input variables during modeling. Model development focused on postulating an adaptive, functional, real-time and alternative approach for modeling the removal efficiency of Kjeldahl Nitrogen. The input variables used for modeling were daily time series data recorded at a wastewater treatment plant (WWTP) located in Mangalore during the period June 2014-September 2014. The performance of the ANFIS models developed using Gbell and trapezoidal membership functions (MFs) and of the SVM is assessed using statistical indices such as root mean square error, correlation coefficient (CC) and Nash-Sutcliffe efficiency (NSE). The errors in the prediction of effluent Kjeldahl Nitrogen concentration by the SVM model appeared reasonable when compared with those of the ANFIS models with Gbell and trapezoidal MFs. From the performance evaluation of the developed SVM model, it is observed that the approach is capable of defining the inter-relationship between various wastewater quality variables, and thus SVM can potentially be applied for evaluating the efficiency of aerobic biological processes in WWTPs.
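The three statistical indices named in the abstract above have standard definitions, sketched below for paired lists of observed and predicted values. The function names are hypothetical, and the sample values in the demo are made up for illustration only.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


def nash_sutcliffe(obs, pred):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    1.0 is a perfect fit; 0.0 means no better than predicting the mean.
    """
    mean_o = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / spread


if __name__ == "__main__":
    observed = [12.0, 15.0, 11.0, 14.0]   # made-up effluent values
    predicted = [12.5, 14.0, 11.5, 13.5]  # made-up model output
    print(rmse(observed, predicted),
          correlation_coefficient(observed, predicted),
          nash_sutcliffe(observed, predicted))
```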
NASA Astrophysics Data System (ADS)
Gao, Xiangdong; Liu, Guiqian
2015-01-01
During deep penetration laser welding, there exist plume (weak plasma) and spatters, which are the results of weld material ejection due to strong laser heating. The characteristics of plume and spatters are related to welding stability and quality. Characteristics of metallic plume and spatters were investigated during high-power disk laser bead-on-plate welding of Type 304 austenitic stainless steel plates at a continuous wave laser power of 10 kW. An ultraviolet and visible sensitive high-speed camera was used to capture the metallic plume and spatter images. Plume area, laser beam path through the plume, swing angle, distance between laser beam focus and plume image centroid, abscissa of plume centroid and spatter numbers are defined as eigenvalues, and the weld bead width was used as a characteristic parameter that reflected welding stability. Welding status was distinguished by SVM (support vector machine) after data normalization and characteristic analysis. Also, PCA (principal components analysis) feature extraction was used to reduce the dimensions of feature space, and PSO (particle swarm optimization) was used to optimize the parameters of SVM. Finally a classification model based on SVM was established to estimate the weld bead width and welding stability. Experimental results show that the established algorithm based on SVM could effectively distinguish the variation of weld bead width, thus providing an experimental example of monitoring high-power disk laser welding quality.
The dynamic financial distress prediction method of EBW-VSTW-SVM
NASA Astrophysics Data System (ADS)
Sun, Jie; Li, Hui; Chang, Pei-Chann; He, Kai-Yu
2016-07-01
Financial distress prediction (FDP) plays an important role in corporate financial risk management. Most previous research in this field tried to construct effective static FDP (SFDP) models, which are difficult to embed into enterprise information systems because they are based on horizontal data-sets collected outside the modelling enterprise and define financial distress by absolute conditions such as bankruptcy or insolvency. This paper proposes an approach for dynamic evaluation and prediction of financial distress based on entropy-based weighting (EBW), the support vector machine (SVM) and an enterprise's vertical sliding time window (VSTW). The dynamic FDP (DFDP) method, named EBW-VSTW-SVM, keeps updating the FDP model as time goes on and needs only the historic financial data of the modelling enterprise itself, and is thus easier to embed into enterprise information systems. The EBW-VSTW-SVM method consists of four steps: evaluation of vertical relative financial distress (VRFD) based on EBW, construction of the training data-set for DFDP modelling according to VSTW, training of the DFDP model based on SVM, and DFDP for the future time point. We carry out case studies for two listed pharmaceutical companies and experimental analysis for some other companies to simulate the sliding of the enterprise vertical time window. The results indicated that the proposed approach was feasible and efficient in helping managers improve corporate financial management.
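Two ingredients of the method described above can be sketched generically: entropy-based weighting of indicators and a sliding time window over an enterprise's own history. The code below uses the common entropy weight method (indicators whose values vary more across periods get larger weights) and a plain fixed-width window. The paper's exact EBW and VSTW formulations are not given in the abstract, so both functions, their names and the sample data are assumptions; inputs are assumed non-negative with a positive sum in each column.

```python
import math


def entropy_weights(rows):
    """Entropy weight method for a table of non-negative indicator values.

    rows: list of periods, each a list of indicator values (columns).
    Returns one weight per column; the weights sum to 1.
    """
    n = len(rows)
    n_cols = len(rows[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalise so a constant column has entropy 1
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0.0:
        return [1.0 / n_cols] * n_cols  # no indicator is informative
    return [d / d_sum for d in divergences]


def sliding_windows(series, width):
    """All consecutive windows of the given width, oldest first."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


if __name__ == "__main__":
    history = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # made-up indicator table
    print(entropy_weights(history))
    print(sliding_windows([2011, 2012, 2013, 2014], 2))
```

Each window would supply the training set for one SVM model, so the model is retrained as the window slides forward in time.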
NASA Astrophysics Data System (ADS)
Xiao, Jian; Luo, Xiaoping; Feng, Zhenfei; Zhang, Jinxin
2018-01-01
This work combines fuzzy logic and a support vector machine (SVM) with principal component analysis (PCA) to create an artificial-intelligence system that identifies nanofluid gas-liquid two-phase flow states in a vertical mini-channel. Flow-pattern recognition requires finding the operational details of the process, and computer simulations and image processing can be used to automate the description of flow patterns in nanofluid gas-liquid two-phase flow. This work uses fuzzy logic and an SVM with PCA to improve the accuracy with which the flow pattern of a nanofluid gas-liquid two-phase flow is identified. To acquire images of nanofluid gas-liquid two-phase flow patterns in flow boiling, a high-speed digital camera was used to record four different types of flow-pattern images, namely annular flow, bubbly flow, churn flow, and slug flow. The textural features extracted by processing the images of nanofluid gas-liquid two-phase flow patterns are used as inputs to various identification schemes such as fuzzy logic, SVM, and SVM with PCA to identify the type of flow pattern. The results indicate that the SVM with PCA-reduced features provides the best identification accuracy and requires less calculation time than the other two schemes. The data reported herein should be very useful for the design and operation of industrial applications.
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.
2013-01-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A
2013-07-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
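The k-mer features underlying the method described above are counts of overlapping length-k substrings, and a simple similarity between two sequences is the dot product of their count vectors (often called a spectrum kernel). The sketch below shows only that basic idea; it does not reproduce the kmer-SVM software, and details such as reverse-complement handling or normalisation are deliberately left out. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Counts of all overlapping k-mers in a sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k):
    """Dot product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())


if __name__ == "__main__":
    print(kmer_counts("ACGTACG", 3))
    print(spectrum_kernel("ACGTACG", "TTACGTT", 3))
```

An SVM trained on such features assigns a weight to each k-mer, and inspecting the largest positive and negative weights is one way predictive sequence features can be read off a trained model.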
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
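A gapped k-mer, as introduced in the abstract above, can be pictured as a window of fixed length in which only k positions are informative and the rest are wildcards. The sketch below enumerates such features by brute force for every window of a sequence. It is an illustration of the feature definition only: the parameter names, the "-" gap symbol and the enumeration strategy are assumptions, and the efficient tree data structure described by the authors is not reproduced.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(sequence, window_len, n_informative, gap="-"):
    """Count gapped k-mers: for each window of length window_len, keep every
    choice of n_informative positions and replace the others with a gap."""
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - window_len + 1):
        window = seq[start:start + window_len]
        for positions in combinations(range(window_len), n_informative):
            keep = set(positions)
            feature = "".join(ch if i in keep else gap
                              for i, ch in enumerate(window))
            counts[feature] += 1
    return counts


if __name__ == "__main__":
    print(gapped_kmer_counts("ACGT", window_len=3, n_informative=2))
```

Because many different windows map to the same gapped feature, counts for long features are less sparse than counts of exact k-mers of the same length, which is the robustness argument made in the abstract.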
Borthiry, Griselda R.; Antholine, William E.; Myers, Judith M.; Myers, Charles R.
2009-01-01
Chromium (Cr) is a cytotoxic metal that can be associated with a variety of types of DNA damage, including Cr-DNA adducts and strand breaks. Prior studies with purified human cytochrome b5 and NADPH:P450 reductase in reconstituted proteoliposomes (PLs) demonstrated rapid reduction of Cr(VI) (hexavalent chromium, as CrO₄²⁻), and the generation of Cr(V), superoxide (O₂·⁻), and hydroxyl radical (HO˙). Studies reported here examined the potential for the species produced by this system to interact with DNA. Strand breaks of purified plasmid DNA increased over time aerobically, but were not observed in the absence of O₂. Cr(V) is formed under both conditions, so the breaks are not mediated directly by Cr(V). The aerobic strand breaks were significantly prevented by catalase and EtOH, but not by the metal chelator diethylenetriaminepentaacetic acid (DTPA), suggesting that they are largely due to HO˙ from Cr-mediated redox cycling. EPR was used to assess the formation of Cr-DNA complexes. Following a 10-min incubation of PLs, CrO₄²⁻, and plasmid DNA, intense EPR signals at g = 5.7 and g = 5.0 were observed. These signals are attributed to specific Cr(III) complexes with large zero field splitting (ZFS). Without DNA, the signals in the g = 5 region were weak. The large ZFS signals were not seen when Cr(III)Cl₃ was incubated with DNA, suggesting that the Cr(III)-DNA interactions are different when generated by the PLs. After 24 h, a broad signal at g = 2 is attributed to Cr(III) complexes with a small ZFS. This g = 2 signal was observed without DNA, but it was different from that seen with plasmid. It is concluded that EPR can detect specific Cr(III) complexes that depend on the presence of plasmid DNA and the manner in which the Cr(III) is formed. PMID:18729091
USDA-ARS's Scientific Manuscript database
Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...
Evaluation of Data Processing Techniques for Unobtrusive Gait Authentication
2014-03-01
scatter plot depicting the performance of kNN , by TER, on all experimental mixtures...30 Table 9. Mean TER of SVM and kNN performance with different voting parameters...performance on XYZ-axis data. ...........................................................51 Table 19. kNN and SVM results in back pocket carrying
The Human L1 Element Causes DNA Double-Strand Breaks in Breast Cancer
2006-08-01
cancer is complex. However, defects in DNA repair genes in the double-strand break repair pathway are cancer predisposing. My lab has characterized...a new potentially important source of double-strand breaks (DSBs) in human cells and are interested in characterizing which DNA repair genes act on...this particular source of DNA damage. Selfish DNA accounts for 45% of the human genome. We have recently demonstrated that one particular selfish
Blaikley, Elizabeth J; Tinline-Purvis, Helen; Kasparek, Torben R; Marguerat, Samuel; Sarkar, Sovan; Hulme, Lydia; Hussey, Sharon; Wee, Boon-Yu; Deegan, Rachel S; Walker, Carol A; Pai, Chen-Chun; Bähler, Jürg; Nakagawa, Takuro; Humphrey, Timothy C
2014-05-01
DNA double-strand breaks (DSBs) can cause chromosomal rearrangements and extensive loss of heterozygosity (LOH), hallmarks of cancer cells. Yet, how such events are normally suppressed is unclear. Here we identify roles for the DNA damage checkpoint pathway in facilitating homologous recombination (HR) repair and suppressing extensive LOH and chromosomal rearrangements in response to a DSB. Accordingly, deletion of Rad3(ATR), Rad26(ATRIP), Crb2(53BP1) or Cdc25 overexpression leads to reduced HR and increased break-induced chromosome loss and rearrangements. We find the DNA damage checkpoint pathway facilitates HR, in part, by promoting break-induced Cdt2-dependent nucleotide synthesis. We also identify additional roles for Rad17, the 9-1-1 complex and Chk1 activation in facilitating break-induced extensive resection and chromosome loss, thereby suppressing extensive LOH. Loss of Rad17 or the 9-1-1 complex results in a striking increase in break-induced isochromosome formation and very low levels of chromosome loss, suggesting the 9-1-1 complex acts as a nuclease processivity factor to facilitate extensive resection. Further, our data suggest redundant roles for Rad3(ATR) and Exo1 in facilitating extensive resection. We propose that the DNA damage checkpoint pathway coordinates resection and nucleotide synthesis, thereby promoting efficient HR repair and genome stability. © The Author(s) 2014. Published by Oxford University Press.
Verbs in the lexicon: Why is hitting easier than breaking?
McKoon, Gail; Love, Jessica
2011-11-01
Adult speakers use verbs in syntactically appropriate ways. For example, they know implicitly that the boy hit at the fence is acceptable but the boy broke at the fence is not. We suggest that this knowledge is lexically encoded in semantic decompositions. The decomposition for break verbs (e.g. crack, smash) is hypothesized to be more complex than that for hit verbs (e.g. kick, kiss). Specifically, the decomposition of a break verb denotes that "an entity changes state as the result of some external force" whereas the decomposition for a hit verb denotes only that "an entity potentially comes in contact with another entity." In this article, verbs of the two types were compared in a lexical decision experiment - Experiment 1 - and they were compared in sentence comprehension experiments with transitive sentences (e.g. the car hit the bicycle and the car broke the bicycle) - Experiments 2 and 3. In Experiment 1, processing times were shorter for the hit than the break verbs and in Experiments 2 and 3, processing times were shorter for the hit sentences than the break sentences, results that are in accord with the complexities of the postulated semantic decompositions.
Kloosterman, Wigard P; Tavakoli-Yaraki, Masoumeh; van Roosmalen, Markus J; van Binsbergen, Ellen; Renkens, Ivo; Duran, Karen; Ballarati, Lucia; Vergult, Sarah; Giardino, Daniela; Hansson, Kerstin; Ruivenkamp, Claudia A L; Jager, Myrthe; van Haeringen, Arie; Ippel, Elly F; Haaf, Thomas; Passarge, Eberhard; Hochstenbach, Ron; Menten, Björn; Larizza, Lidia; Guryev, Victor; Poot, Martin; Cuppen, Edwin
2012-06-28
Chromothripsis represents a novel phenomenon in the structural variation landscape of cancer genomes. Here, we analyze the genomes of ten patients with congenital disease who were preselected to carry complex chromosomal rearrangements with more than two breakpoints. The rearrangements displayed unanticipated complexity resembling chromothripsis. We find that eight of them contain hallmarks of multiple clustered double-stranded DNA breaks (DSBs) on one or more chromosomes. In addition, nucleotide resolution analysis of 98 breakpoint junctions indicates that break repair involves nonhomologous or microhomology-mediated end joining. We observed that these eight rearrangements are balanced or contain sporadic deletions ranging in size between a few hundred base pairs and several megabases. The two remaining complex rearrangements did not display signs of DSBs and contain duplications, indicative of rearrangement processes involving template switching. Our work provides detailed insight into the characteristics of chromothripsis and supports a role for clustered DSBs driving some constitutional chromothripsis rearrangements. Copyright © 2012 The Authors. Published by Elsevier Inc. All rights reserved.
A study of speech emotion recognition based on hybrid algorithm
NASA Astrophysics Data System (ADS)
Zhu, Ju-xia; Zhang, Chao; Lv, Zhao; Rao, Yao-quan; Wu, Xiao-pei
2011-10-01
To effectively improve the recognition accuracy of the speech emotion recognition system, a hybrid algorithm which combines Continuous Hidden Markov Model (CHMM), All-Class-in-One Neural Network (ACON) and Support Vector Machine (SVM) is proposed. In the SVM and ACON methods, some global statistics are used as emotional features, while in the CHMM method, instantaneous features are employed. The recognition rate of the proposed method is 92.25%, with a rejection rate of 0.78%. Furthermore, it obtains relative increases of 8.53%, 4.69% and 0.78% compared with the ACON, CHMM and SVM methods respectively. The experimental results confirm the efficiency of the method in distinguishing the anger, happiness, neutral and sadness emotional states.
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data into species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.
A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong
Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM has excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principal component analysis (KPCA) and the linear programming form of least squares support vector machine. With this method, sparseness and robustness are obtained while solving high-dimensional and large-scale databases. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be satisfactory.
Liao, Quan; Yao, Jianhua; Yuan, Shengang
2007-05-01
The study of prediction of toxicity is very important and necessary because measurement of toxicity is typically time-consuming and expensive. In this paper, Recursive Partitioning (RP) method was used to select descriptors. RP and Support Vector Machines (SVM) were used to construct structure-toxicity relationship models, RP model and SVM model, respectively. The performances of the two models are different. The prediction accuracies of the RP model are 80.2% for mutagenic compounds in MDL's toxicity database, 83.4% for compounds in CMC and 84.9% for agrochemicals in in-house database respectively. Those of SVM model are 81.4%, 87.0% and 87.3% respectively.
Data mining for the analysis of hippocampal zones in Alzheimer's disease
NASA Astrophysics Data System (ADS)
Ovando Vázquez, Cesaré M.
2012-02-01
In this work, a methodology to classify people with Alzheimer's Disease (AD), Healthy Controls (HC) and people with Mild Cognitive Impairment (MCI) is presented. This methodology consists of an ensemble of Support Vector Machines (SVM) with the hippocampal boxes (HB) as input data; these hippocampal zones are taken from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images. Two ways of constructing this ensemble are presented: the first consists of linear SVM models and the second of non-linear SVM models. Results demonstrate that the linear models classify HBs more accurately than the non-linear models between HC and MCI and that there are no differences between HC and AD.
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE
2017-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361
Prediction on sunspot activity based on fuzzy information granulation and support vector machine
NASA Astrophysics Data System (ADS)
Peng, Lingling; Yan, Haisheng; Yang, Zhigang
2018-04-01
In order to analyze the range of sunspots, a combined prediction method for forecasting the fluctuation range of sunspots based on fuzzy information granulation (FIG) and support vector machine (SVM) is put forward. Firstly, FIG is employed to granulate the sample data and extract valid information from each window, namely the minimum value, the general average value and the maximum value of each window. Secondly, forecasting models are built respectively with SVM and then the cross method is used to optimize their parameters. Finally, the fluctuation range of sunspots is forecasted with the optimized SVM model. A case study demonstrates that the model has high accuracy and can effectively predict the fluctuation of sunspots.
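The granulation step described above can be illustrated with a minimal sketch. It assumes non-overlapping windows and plain (crisp) minimum, mean and maximum per window, which is a simplification of fuzzy information granulation; the function name `granulate` and the sample values are hypothetical and not taken from the paper.

```python
from statistics import mean

def granulate(series, window):
    """Return (min, mean, max) for each consecutive non-overlapping window.

    Simplifying assumption: crisp statistics stand in for the fuzzy
    granules; a trailing partial window is dropped.
    """
    granules = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        granules.append((min(chunk), mean(chunk), max(chunk)))
    return granules

# Made-up values, for illustration only (not real sunspot numbers).
sunspots = [10, 14, 12, 30, 28, 35, 50, 44, 47]
granules = granulate(sunspots, 3)
```

Each of the three resulting series (minima, means, maxima) could then be fed to its own regression model, which matches the abstract's statement that the forecasting models are built "respectively".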
Lahmiri, Salim; Gargour, Christian S; Gabrea, Marcel
2014-10-01
An automated diagnosis system that uses complex continuous wavelet transform (CWT) to process retina digital images and support vector machines (SVMs) for classification purposes is presented. In particular, each retina image is transformed into two one-dimensional signals by concatenating image rows and columns separately. The mathematical norm of the phase angles found in each one-dimensional signal at each level of CWT decomposition is relied on to characterise the texture of normal images against abnormal images affected by exudates, drusen and microaneurysms. The leave-one-out cross-validation method was adopted to conduct experiments and the results from the SVM show that the proposed approach gives better results than those obtained by other methods based on the correct classification rate, sensitivity and specificity.
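The image-to-signal step can be sketched as follows. This is only an illustration of concatenating rows and columns separately; the function name `rows_and_columns` and the tiny 2x3 "image" are hypothetical, and the CWT and SVM stages are not shown.

```python
def rows_and_columns(image):
    """Flatten a 2-D image (list of equal-length rows) into two 1-D signals.

    row_signal: rows concatenated left to right, top to bottom.
    col_signal: columns concatenated top to bottom, left to right.
    """
    n_rows, n_cols = len(image), len(image[0])
    row_signal = [pixel for row in image for pixel in row]
    col_signal = [image[r][c] for c in range(n_cols) for r in range(n_rows)]
    return row_signal, col_signal

# Hypothetical 2x3 grey-level image.
image = [[1, 2, 3],
         [4, 5, 6]]
row_signal, col_signal = rows_and_columns(image)
```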
Visualization of complex DNA double-strand breaks in a tumor treated with carbon ion radiotherapy
Oike, Takahiro; Niimi, Atsuko; Okonogi, Noriyuki; Murata, Kazutoshi; Matsumura, Akihiko; Noda, Shin-Ei; Kobayashi, Daijiro; Iwanaga, Mototaro; Tsuchida, Keisuke; Kanai, Tatsuaki; Ohno, Tatsuya; Shibata, Atsushi; Nakano, Takashi
2016-01-01
Carbon ion radiotherapy shows great potential as a cure for X-ray-resistant tumors. Basic research suggests that the strong cell-killing effect induced by carbon ions is based on their ability to cause complex DNA double-strand breaks (DSBs). However, evidence supporting the formation of complex DSBs in actual patients is lacking. Here, we used advanced high-resolution microscopy with deconvolution to show that complex DSBs are formed in a human tumor clinically treated with carbon ion radiotherapy, but not in a tumor treated with X-ray radiotherapy. Furthermore, analysis using a physics model suggested that the complexity of radiotherapy-induced DSBs is related to linear energy transfer, which is much higher for carbon ion beams than for X-rays. Visualization of complex DSBs in clinical specimens will help us to understand the anti-tumor effects of carbon ion radiotherapy. PMID:26925533
NASA Astrophysics Data System (ADS)
Wong, Pak-kin; Vong, Chi-man; Wong, Hang-cheong; Li, Ke
2010-05-01
Modern automotive spark-ignition (SI) power performance usually refers to output power and torque, and they are significantly affected by the setup of control parameters in the engine management system (EMS). EMS calibration is done empirically through tests on the dynamometer (dyno) because no exact mathematical engine model is yet available. With an emerging nonlinear function estimation technique, least squares support vector machines (LS-SVM), the approximate power performance model of an SI engine can be determined by training on the sample data acquired from the dyno. A novel incremental algorithm based on typical LS-SVM is also proposed in this paper, so the power performance models built from the incremental LS-SVM can be updated whenever new training data arrives. By updating the models, the model accuracies can be continuously increased. The predicted results using the estimated models from the incremental LS-SVM are in good agreement with the actual test results and have almost the same average accuracy as retraining the models from scratch, but the incremental algorithm can significantly shorten the model construction time when new training data arrives.
NASA Astrophysics Data System (ADS)
Xian, Guangming
2018-03-01
A method for predicting the optimal vibration field parameters by least square support vector machine (LS-SVM) is presented in this paper. One convenient and commonly used technique for characterizing the vibration flow field of polymer melt films is small angle light scattering (SALS) in a visualized slit die of the electromagnetism dynamic extruder. The optimal values of vibration frequency, vibration amplitude, and the maximum light intensity projection area can be obtained by using LS-SVM for prediction. To illustrate this method and show its validity, polypropylene (PP) is used as the flowing material and fifteen samples are tested at a screw rotation speed of 36 rpm. This paper first describes the SALS apparatus used to perform the experiments, then gives the theoretical basis of this new method, and details the experimental results for parameter prediction of the vibration flow field. It is demonstrated that it is possible to use the SALS method and obtain detailed information on the optimal parameters of the vibration flow field of PP melts by LS-SVM.
Ghafouri, Hamidreza; Ranjbar, Mohsen; Sakhteman, Amirhossein
2017-08-01
A great challenge in medicinal chemistry is to develop different methods for structural design based on the pattern of the previously synthesized compounds. In this study two different QSAR methods were established and compared for a series of piperidine acetylcholinesterase inhibitors. In one novel approach, PC-LS-SVM and PLS-LS-SVM were used for modeling 3D interaction descriptors, and in the other method the same nonlinear techniques were used to build QSAR equations based on field descriptors. Different validation methods were used to evaluate the models and the results revealed the greater applicability and predictive ability of the model generated by field descriptors (Q² LOO-CV = 1, R² ext = 0.97). External validation criteria revealed that both methods can be used in generating reasonable QSAR models. It was concluded that, owing to the ability of interaction descriptors to predict binding mode, this approach can be implemented in future 3D-QSAR software. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.
Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample.
Identification and classification of similar looking food grains
NASA Astrophysics Data System (ADS)
Anami, B. S.; Biradar, Sunanda D.; Savakar, D. G.; Kulkarni, P. V.
2013-01-01
This paper describes a comparative study of Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers by taking a case study of identification and classification of four pairs of similar looking food grains, namely Finger Millet, Mustard, Soyabean, Pigeon Pea, Aniseed, Cumin-seeds, Split Greengram and Split Blackgram. Algorithms are developed to acquire and process color images of these grain samples. The developed algorithms are used to extract 18 color (Hue Saturation Value, HSV) features and 42 wavelet-based texture features. A Back Propagation Neural Network (BPNN)-based classifier is designed using three feature sets, namely color-HSV, wavelet-texture and their combined model. An SVM model for the color-HSV model is designed for the same set of samples. Classification accuracies ranging from 93% to 96% for color-HSV, from 78% to 94% for the wavelet texture model and from 92% to 97% for the combined model are obtained for the ANN-based models. A classification accuracy ranging from 80% to 90% is obtained for the color-HSV based SVM model. The training time required for the SVM-based model is substantially less than that of the ANN for the same set of images.
Zhang, Li; Zhou, WeiDa
2013-12-01
This paper deals with fast methods for training a 1-norm support vector machine (SVM). First, we define a specific class of linear programming with many sparse constraints, i.e., row-column sparse constraint linear programming (RCSC-LP). In nature, the 1-norm SVM is a sort of RCSC-LP. In order to construct subproblems for RCSC-LP and solve them, a family of row-column generation (RCG) methods is introduced. RCG methods belong to a category of decomposition techniques, and perform row and column generations in a parallel fashion. Specially, for the 1-norm SVM, the maximum size of subproblems of RCG is identical with the number of Support Vectors (SVs). We also introduce a semi-deleting rule for RCG methods and prove the convergence of RCG methods when using the semi-deleting rule. Experimental results on toy data and real-world datasets illustrate that it is efficient to use RCG to train the 1-norm SVM, especially in the case of small SVs. Copyright © 2013 Elsevier Ltd. All rights reserved.
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on support vector machine (SVM) for mixture gas was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectrum into high-dimensional space, and after transformation, the high-dimensional data could be processed in the original space, so the regression calibration model was established; the regression calibration model was then applied to analyze the concentration of component gas. Meanwhile it was proved that the regression calibration model with SVM could also be used for component recognition of mixture gas. The method was applied to the analysis of different data samples. Some factors that affect the model, such as scan interval, range of the wavelength, kernel function and penalty coefficient C, were discussed. Experimental results show that the component concentration maximal Mean AE is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectrum, using the same method for qualitative and quantitative analysis, and a limited number of training samples were solved. The method could be used in other mixture gas infrared spectrum analyses, promising theoretical and application value.
Pedestrian detection in crowded scenes with the histogram of gradients principle
NASA Astrophysics Data System (ADS)
Sidla, O.; Rosner, M.; Lypetskyy, Y.
2006-10-01
This paper describes a close to real-time, scale-invariant implementation of a pedestrian detector system which is based on the Histogram of Oriented Gradients (HOG) principle. Salient HOG features are first selected from a manually created very large database of samples with an evolutionary optimization procedure that directly trains a polynomial Support Vector Machine (SVM). Real-time operation is achieved by a cascaded 2-step classifier which first uses a very fast linear SVM (with the same features as the polynomial SVM) to reject most of the irrelevant detections and then computes the decision function with a polynomial SVM on the remaining set of candidate detections. Scale invariance is achieved by running the detector of constant size on scaled versions of the original input images and by clustering the results over all resolutions. The pedestrian detection system has been implemented in two versions: i) full body detection, and ii) upper body only detection. The latter is especially suited for very busy and crowded scenarios. On a state-of-the-art PC it is able to run at a frequency of 8 - 20 frames/sec.
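The cascaded 2-step idea can be sketched as below: a cheap linear decision function filters candidates, and a more expensive polynomial-kernel decision function is evaluated only on the survivors. All weights, support vectors, thresholds and function names here are hypothetical placeholders, not values or code from the paper.

```python
def linear_score(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def poly_score(support_vectors, coeffs, b, x, degree=2):
    """Polynomial-kernel SVM decision value with kernel (1 + s.x)^degree."""
    total = b
    for sv, a in zip(support_vectors, coeffs):
        total += a * (1.0 + sum(s * xi for s, xi in zip(sv, x))) ** degree
    return total

def cascade(candidates, w, b_lin, svs, coeffs, b_poly, reject_below=0.0):
    """Stage 1 rejects with the linear SVM; stage 2 decides with the polynomial SVM."""
    survivors = [x for x in candidates if linear_score(w, b_lin, x) >= reject_below]
    return [x for x in survivors if poly_score(svs, coeffs, b_poly, x) > 0.0]

# Toy 2-D feature vectors standing in for HOG descriptors.
candidates = [[0.0, 0.0], [1.0, 1.0], [2.0, -0.5]]
detections = cascade(candidates, w=[1.0, 1.0], b_lin=-1.0,
                     svs=[[1.0, 1.0]], coeffs=[1.0], b_poly=-8.0)
```

In this toy run the first candidate is rejected by the linear stage and the third by the polynomial stage, so the expensive kernel evaluation is only performed on two of the three candidates.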
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics were used as a feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy in comparison to the single classifiers, as well as user's and producer's accuracy.
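One way to picture a combination rule based on posterior probabilities and their entropy is sketched below. The abstract does not spell out the actual rules, so the rule used here (trust whichever classifier has the lower-entropy posterior) is purely an assumption for illustration, as are the function names and probability values.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a class-probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def combine(rf_probs, svm_probs):
    """Hypothetical rule: keep the posterior with the lower entropy.

    Returns (predicted class index, entropy of the chosen posterior);
    the entropy doubles as a per-pixel uncertainty estimate.
    """
    chosen = rf_probs if entropy(rf_probs) <= entropy(svm_probs) else svm_probs
    label = max(range(len(chosen)), key=chosen.__getitem__)
    return label, entropy(chosen)

# Made-up posteriors for one pixel and two crop classes.
label, uncertainty = combine([0.5, 0.5], [0.9, 0.1])
```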
Hadamard Kernel SVM with applications for breast cancer outcome predictions.
Jiang, Hao; Ching, Wai-Ki; Cheung, Wai-Shun; Hou, Wenpin; Yin, Hong
2017-12-21
Breast cancer is one of the leading causes of death for women. It is of great necessity to develop effective methods for breast cancer detection and diagnosis. Recent studies have focused on gene-based signatures for outcome predictions. Kernel SVM has attracted a lot of attention for its discriminative power in dealing with small sample pattern recognition problems. But how to select or construct an appropriate kernel for a specified problem still needs further investigation. Here we propose a novel kernel (Hadamard Kernel) in conjunction with Support Vector Machines (SVMs) to address the problem of breast cancer outcome prediction using gene expression data. Hadamard Kernel outperforms the classical kernels and correlation kernel in terms of Area under the ROC Curve (AUC) values, where a number of real-world data sets are adopted to test the performance of different methods. Hadamard Kernel SVM is effective for breast cancer predictions, either in terms of prognosis or diagnosis. It may benefit patients by guiding therapeutic options. Apart from that, it would be a valuable addition to the current SVM kernel families. We hope it will contribute to the wider biology and related communities.
Online image classification under monotonic decision boundary constraint
NASA Astrophysics Data System (ADS)
Lu, Cheng; Allebach, Jan; Wagner, Jerry; Pitta, Brandi; Larson, David; Guo, Yandong
2015-01-01
Image classification is a prerequisite for copy quality enhancement in an all-in-one (AIO) device that comprises a printer and scanner, and which can be used to scan, copy and print. Different processing pipelines are provided in an AIO printer. Each of the processing pipelines is designed specifically for one type of input image to achieve the optimal output image quality. A typical approach to this problem is to apply a Support Vector Machine to classify the input image and feed it to its corresponding processing pipeline. Online SVM training can help users to improve the performance of classification as input images accumulate. At the same time, we want to make a quick decision on the input image to speed up the classification, which means that sometimes the AIO device does not need to scan the entire image to make a final decision. These two constraints, online SVM and quick decision, raise questions regarding: 1) what features are suitable for classification; 2) how we should control the decision boundary in online SVM training. This paper will discuss the compatibility of online SVM and quick decision capability.
Fault diagnosis method based on FFT-RPCA-SVM for Cascaded-Multilevel Inverter.
Wang, Tianzhen; Qi, Jie; Xu, Hao; Wang, Yide; Liu, Lei; Gao, Diju
2016-01-01
Thanks to reduced switch stress, high quality of load wave, easy packaging and good extensibility, the cascaded H-bridge multilevel inverter is widely used in wind power systems. To guarantee stable operation of the system, a new fault diagnosis method, based on Fast Fourier Transform (FFT), Relative Principal Component Analysis (RPCA) and Support Vector Machine (SVM), is proposed for the H-bridge multilevel inverter. To avoid the influence of load variation on fault diagnosis, the output voltages of the inverter are chosen as the fault characteristic signals. To shorten the time of diagnosis and improve the diagnostic accuracy, the main features of the fault characteristic signals are extracted by FFT. To further reduce the training time of SVM, the feature vector is reduced based on RPCA, which yields a lower-dimensional feature space. The fault classifier is constructed via SVM. An experimental prototype of the inverter is built to test the proposed method. Compared to other fault diagnosis methods, the experimental results demonstrate the high accuracy and efficiency of the proposed method. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
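The first stage of the pipeline, extracting spectral features from a voltage signal, can be sketched as follows. For brevity this uses a naive O(n²) discrete Fourier transform, which produces the same magnitude spectrum an FFT would; the RPCA reduction and SVM classifier are not shown, and the function name and test signal are hypothetical.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Normalized magnitude spectrum for bins 0..n/2 (naive DFT, same values as an FFT)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

# Synthetic "output voltage": a pure sinusoid with 2 cycles over 16 samples.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
features = dft_magnitudes(signal)
peak_bin = max(range(len(features)), key=features.__getitem__)
```

A fault that distorts the waveform would shift energy into other bins, which is what makes such a spectrum usable as a feature vector for a downstream classifier.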
Lu, Wei-Zhen; Wang, Wen-Jian
2005-04-01
Monitoring and forecasting of air quality parameters are popular and important topics in atmospheric and environmental research today because of the health impact of exposure to air pollutants in urban air. Accurate models for air pollutant prediction are needed because such models would allow forecasting and diagnosing potential compliance or non-compliance in both the short and long term. Artificial neural networks (ANN) are regarded as a reliable and cost-effective method to achieve such tasks and have produced some promising results to date. Although ANN has attracted increasing attention from environmental researchers, its inherent drawbacks, e.g., local minima, over-fitting, poor generalization performance, determination of the appropriate network architecture, etc., impede its practical application. Support vector machine (SVM), a novel type of learning machine based on statistical learning theory, can be used for regression and time series prediction and has been reported to perform well. The work presented in this paper aims to examine the feasibility of applying SVM to predict air pollutant levels in advancing time series based on the monitored air pollutant database in the Hong Kong downtown area. At the same time, the functional characteristics of SVM are investigated in the study. The experimental comparisons between the SVM model and the classical radial basis function (RBF) network demonstrate that the SVM is superior to the conventional RBF network in predicting air quality parameters with different time series and has better generalization performance than the RBF model.
Nonlinear Classification of AVO Attributes Using SVM
NASA Astrophysics Data System (ADS)
Zhao, B.; Zhou, H.
2005-05-01
A key research topic in reservoir characterization is the detection of the presence of fluids using seismic and well-log data. In particular, partial gas discrimination is very challenging because low and high gas saturation can result in similar anomalies in terms of Amplitude Variation with Offset (AVO), bright spot, and velocity sag. Hence, a successful fluid detection will require a good understanding of the seismic signatures of the fluids, high-quality data, and good detection methodology. Traditional attempts of partial gas discrimination employ the Neural Network algorithm. A new approach is to use the Support Vector Machine (SVM) (Vapnik, 1995; Liu and Sacchi, 2003). While the potential of the SVM has not been fully explored for reservoir fluid detection, the current nonlinear methods classify seismic attributes without the use of rock physics constraints. The objective of this study is to improve the capability of distinguishing a fizz-water reservoir from a commercial gas reservoir by developing a new detection method using AVO attributes and rock physics constraints. This study will first test the SVM classification with synthetic data, and then apply the algorithm to field data from the King-Kong and Lisa-Anne fields in Gulf of Mexico. While both field areas have high amplitude seismic anomalies, King-Kong field produces commercial gas but Lisa-Anne field does not. We expect that the new SVM-based nonlinear classification of AVO attributes may be able to separate commercial gas from fizz-water in these two fields.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad
2018-03-01
Support Vector Machine (SVM) has been shown to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is at its inception. The present study classified and predicted high and low potential archers from a collection of psychological coping skills variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. A psychological coping skills inventory, which evaluates the archers' level of related coping skills, was filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. SVM models, i.e. linear and fine radial basis function (RBF) kernel functions, were trained on the psychological variables. The k-means analysis clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy and precision throughout the exercise, with an accuracy of 92% and a considerably lower error rate for the prediction of the HPPA and the LPPA as compared to the fine RBF SVM. The findings of this investigation can be valuable to coaches and sports managers in recognising high potential athletes from the selected psychological coping skills variables examined, which would consequently save time and energy during talent identification and development programmes.
Improved Extreme Learning Machine based on the Sensitivity Analysis
NASA Astrophysics Data System (ADS)
Cui, Licheng; Zhai, Huawei; Wang, Benchao; Qu, Zengtang
2018-03-01
Extreme learning machine (ELM) and its improved variants are weak in some respects, such as computational complexity and learning error. After in-depth analysis, and with reference to the importance of hidden nodes in SVM, a novel sensitivity analysis method is proposed which meets people’s cognitive habits. Based on this, an improved ELM is proposed that can remove hidden nodes before the learning error is met and can efficiently manage the number of hidden nodes, so as to improve its performance. Comparative tests show that it is better in learning time, accuracy and so on.
Exploiting three kinds of interface propensities to identify protein binding sites.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2009-08-01
Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparisons, we revisit the usage of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensities combined with sequence profiles and accessible surface areas are inputted into SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, order profile is a suitable profile-level building block of the protein sequences and can be widely used in many tasks of computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the protein remote homology detection.
Artan, Yusuf; Haider, Masoom A; Langer, Deanna L; van der Kwast, Theodorus H; Evans, Andrew J; Yang, Yongyi; Wernick, Miles N; Trachtenberg, John; Yetik, Imam Samil
2010-09-01
Prostate cancer is a leading cause of cancer death for men in the United States. Fortunately, the survival rate for early diagnosed patients is relatively high. Therefore, in vivo imaging plays an important role in the detection and treatment of the disease. Accurate prostate cancer localization with noninvasive imaging can be used to guide biopsy, radiotherapy, and surgery as well as to monitor disease progression. Magnetic resonance imaging (MRI) performed with an endorectal coil provides higher prostate cancer localization accuracy than transrectal ultrasound (TRUS). However, in general, a single type of MRI is not sufficient for reliable tumor localization. As an alternative, multispectral MRI, i.e., the use of multiple MRI-derived datasets, has emerged as a promising noninvasive imaging technique for the localization of prostate cancer; however, almost all studies rely on human readers. There is significant inter- and intraobserver variability among human readers, and it is substantially difficult for humans to analyze the large datasets of multispectral MRI. To solve these problems, this study presents an automated localization method using cost-sensitive support vector machines (SVMs) and shows that this method yields better localization accuracy than classical SVM. Additionally, we develop a new segmentation method by combining conditional random fields (CRF) with a cost-sensitive framework and show that our method further improves cost-sensitive SVM results by incorporating spatial information. We test SVM, cost-sensitive SVM, and the proposed cost-sensitive CRF on multispectral MRI datasets acquired from 21 biopsy-confirmed cancer patients. Our results show that multispectral MRI helps to increase the accuracy of prostate cancer localization when compared to single MR images, and that using advanced methods such as cost-sensitive SVM as well as the proposed cost-sensitive CRF can boost the performance significantly when compared to SVM.
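The cost-sensitive idea in the abstract above can be illustrated with a class-weighted hinge loss, in which margin violations on one class are penalized more heavily than on the other. This is only a minimal sketch and not the authors' formulation: the function name and the cost values `c_pos` and `c_neg` are assumptions chosen for illustration.

```python
def cost_sensitive_hinge(scores, labels, c_pos=5.0, c_neg=1.0):
    """Sum of class-weighted hinge losses.

    scores: decision values f(x), one per sample.
    labels: +1 (e.g. cancer) or -1 (e.g. normal), one per sample.
    c_pos, c_neg: hypothetical misclassification costs per class.
    """
    total = 0.0
    for f, y in zip(scores, labels):
        cost = c_pos if y == 1 else c_neg
        # Standard hinge term, scaled by the cost of the sample's class.
        total += cost * max(0.0, 1.0 - y * f)
    return total
```

Setting `c_pos` equal to `c_neg` recovers the ordinary hinge loss, so the ratio of the two costs is the only extra quantity to tune.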
Automatic classification of seismic events within a regional seismograph network
NASA Astrophysics Data System (ADS)
Tiira, Timo; Kortström, Jari; Uski, Marja
2015-04-01
A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows: P, P coda, S, and S coda. For each signal window, STA is computed in 20 narrow frequency bands between 1 and 41 Hz. The resulting 80 discrimination parameters are used as training data for the SVM. The SVM models are calculated for 19 on-line seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after the automatic location process. The station-specific SVM training events include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations, of which 98 % have been manually identified as explosions or noise and 2 % as earthquakes. The SVM method correctly identifies 94 % of the non-earthquakes and all the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce the workload in manual seismic analysis by leaving only ~5 % of the automatic event determinations, i.e. the probable earthquakes, for more detailed seismological analysis. The approach presented is easy to adjust to the requirements of a denser or wider high-frequency network, once enough training examples for building a station-specific data set are available.
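The feature layout described above (four windows times 20 frequency bands, giving 80 parameters per record) can be sketched as follows. The abstract does not say how the bands are spaced, so contiguous equal-width bands are an assumption here, and the function names are hypothetical.

```python
WINDOWS = ["P", "P coda", "S", "S coda"]
N_BANDS = 20
F_MIN, F_MAX = 1.0, 41.0  # Hz, as stated in the abstract


def band_edges(f_min=F_MIN, f_max=F_MAX, n_bands=N_BANDS):
    """Assumed layout: contiguous, equal-width bands (2 Hz each here)."""
    width = (f_max - f_min) / n_bands
    return [(f_min + i * width, f_min + (i + 1) * width) for i in range(n_bands)]


def feature_vector(sta):
    """Flatten per-window STA values into one 80-element vector.

    sta: dict mapping each window name to a list of N_BANDS STA values.
    """
    vec = []
    for w in WINDOWS:
        values = sta[w]
        if len(values) != N_BANDS:
            raise ValueError(f"expected {N_BANDS} values for window {w!r}")
        vec.extend(values)
    return vec
```

Each station would then train its own classifier on vectors of this shape, in line with the station-specific models the abstract describes.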
Deep neural mapping support vector machines.
Li, Yujian; Zhang, Ting
2017-09-01
The choice of kernel has an important effect on the performance of a support vector machine (SVM). The effect could be reduced by NEUROSVM, an architecture using a multilayer perceptron for feature extraction and an SVM for classification. In binary classification, a general linear kernel NEUROSVM can be theoretically simplified as an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained together with a virtual ordinary output layer by backpropagation; then the output of its last hidden layer is taken as input to the SVM classifier, which is trained separately. By taking the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, where the kernel mapping is an explicit function expressed as a sub-network, unlike the implicit function traditionally induced by a kernel function. Moreover, we exploit a two-stage procedure of contrastive divergence learning and gradient descent for DNMSVM to jointly train an adaptive kernel mapping instead of a kernel function, without requiring kernel tricks. Treating the sub-network and the SVM classifier as a whole, the joint training of DNMSVM uses gradient descent to optimize the objective function, with the sub-network pre-trained layer-wise via contrastive divergence learning of restricted Boltzmann machines. Compared to the separate training of NEUROSVM, this joint training is a new algorithm that gives DNMSVM advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the kernel of radial basis function), demonstrating its effectiveness. Copyright © 2017 Elsevier Ltd. All rights reserved.
Predicting metabolic syndrome using decision tree and support vector machine methods.
Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh
2016-05-01
Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome. It aims to employ decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This is a practical study in which data from 2107 participants of the Isfahan Cohort Study have been used. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were selected to predict it. The SVM and decision tree methods were validated according to the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome.
According to this study, in cases where only the final result of the decision is regarded as significant, the SVM method can be used with acceptable accuracy in medical decision making. This method has not been implemented in the previous research.
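The three validation criteria named above have standard definitions in terms of confusion-matrix counts. The sketch below uses those textbook definitions; the function name and the example counts in the usage note are made up for illustration and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positives predicted correctly / missed.
    tn/fp: negatives predicted correctly / flagged wrongly.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

For example, with hypothetical counts `tp=8, fn=2, tn=15, fp=5`, the sensitivity is 8/10, the specificity is 15/20 and the accuracy is 23/30.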
Li, Yuanpeng; Li, Fucui; Yang, Xinhao; Guo, Liu; Huang, Furong; Chen, Zhenqiang; Chen, Xingdan; Zheng, Shifu
2018-08-05
A rapid quantitative analysis model for determining the glycated albumin (GA) content based on attenuated total reflectance (ATR)-Fourier transform infrared spectroscopy (FTIR) combined with linear SiPLS and nonlinear SVM has been developed. Firstly, the real GA content in human serum was determined by the GA enzymatic method; meanwhile, the ATR-FTIR spectra of serum samples from a health examination population were obtained. The spectral data of the whole mid-infrared region (4000-600 cm⁻¹) and GA's characteristic region (1800-800 cm⁻¹) were used as the research object of quantitative analysis. Secondly, several preprocessing steps, including first derivative, second derivative, variable standardization and spectral normalization, were performed. Lastly, quantitative analysis regression models were established using SiPLS and SVM respectively. The SiPLS modeling results are as follows: root mean square error of cross validation (RMSECV_T) = 0.523 g/L, calibration coefficient (R_C) = 0.937, root mean square error of prediction (RMSEP_T) = 0.787 g/L, and prediction coefficient (R_P) = 0.938. The SVM modeling results are as follows: RMSECV_T = 0.0048 g/L, R_C = 0.998, RMSEP_T = 0.442 g/L, and R_P = 0.916. The results indicated that the model performance was improved significantly after preprocessing and optimization of characteristic regions. Moreover, the modeling performance of the nonlinear SVM was considerably better than that of the linear SiPLS. Hence, the quantitative analysis model for GA in human serum based on ATR-FTIR combined with SiPLS and SVM is effective. It requires no sample preprocessing, is simple to operate and is time-efficient, providing a rapid and accurate method for GA content determination. Copyright © 2018 Elsevier B.V. All rights reserved.
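The error and correlation figures quoted above can be computed as in the sketch below. It assumes that R_C and R_P are Pearson correlation coefficients between reference and predicted values, which the abstract does not state explicitly; RMSECV and RMSEP share the same root-mean-square formula and differ only in which samples are scored (cross-validation folds versus a held-out prediction set).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of R_C and R_P)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)
```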
Jiang, Rou; You, Rui; Pei, Xiao-Qing; Zou, Xiong; Zhang, Meng-Xia; Wang, Tong-Min; Sun, Rui; Luo, Dong-Hua; Huang, Pei-Yu; Chen, Qiu-Yan; Hua, Yi-Jun; Tang, Lin-Quan; Guo, Ling; Mo, Hao-Yuan; Qian, Chao-Nan; Mai, Hai-Qiang; Hong, Ming-Huang; Cai, Hong-Min; Chen, Ming-Yuan
2016-01-19
The aim of this study was to develop a prognostic classifier and subdivide the M1 stage for nasopharyngeal carcinoma patients with synchronous metastases (mNPC). A retrospective cohort of 347 mNPC patients was recruited between January 2000 and December 2010. Thirty hematological markers and 11 clinical characteristics were collected, and the association of these factors with overall survival (OS) was evaluated. Advanced machine learning schemes of a support vector machine (SVM) were used to select a subset of highly informative factors and to construct a prognostic model (mNPC-SVM). The mNPC-SVM classifier identified ten informative variables, including three clinical indexes and seven hematological markers. The median survival time for low-risk patients (M1a) as identified by the mNPC-SVM classifier was 38.0 months, and survival time was dramatically reduced to 13.8 months for high-risk patients (M1b) (P < 0.001). Multivariate adjustment using prognostic factors revealed that the mNPC-SVM classifier remained a powerful predictor of OS (M1a vs. M1b, hazard ratio, 3.45; 95% CI, 2.59 to 4.60, P < 0.001). Moreover, combination treatment of systemic chemotherapy and loco-regional radiotherapy was associated with significantly better survival outcomes than chemotherapy alone (the 5-year OS, 47.0% vs. 10.0%, P < 0.001) in the M1a subgroup but not in the M1b subgroup (12.0% vs. 3.0%, P = 0.101). These findings were validated by a separate cohort. In conclusion, the newly developed mNPC-SVM classifier led to more precise risk definitions that offer a promising subdivision of the M1 stage and individualized selection for future therapeutic regimens in mNPC patients.
Kianmehr, Keivan; Alhajj, Reda
2008-09-01
In this study, we aim to build a classification framework, namely the CARSVM model, which integrates association rule mining and the support vector machine (SVM). The goal is to benefit from the advantages of both, the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm, to construct an efficient and accurate classifier model that mitigates the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, instead of using the original training set, a set of rule-based feature vectors, which are generated based on the discriminative ability of class association rules over the training samples, is presented to the learning component of the SVM algorithm. We show that rule-based feature vectors present a high-quality source of discrimination knowledge that can substantially impact the prediction power of SVM and associative classification techniques. They also provide users with greater convenience in terms of understandability and interpretability. We have used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because of the importance and popularity of gene expression analysis as a real-world application of the classification model, we present an extension of CARSVM combined with feature selection to be applied to gene expression data. Then, we describe how this combination will provide biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model.
From the results, it can be concluded that a considerable increase in classification accuracy can be obtained when the rule-based feature vectors are integrated in the learning process of the SVM algorithm. In the context of applicability, according to the results obtained from gene expression analysis, we can conclude that the CARSVM system can be utilized in a variety of real world applications with some adjustments.
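One simple way to picture a rule-based feature vector is one feature per class association rule, set when the rule's antecedent covers the sample. The sketch below uses plain binary coverage, which is a simplifying assumption: the abstract says the features reflect the discriminative ability of the rules but does not give the weighting, and the function name and rule representation are hypothetical.

```python
def rule_features(sample_items, rules):
    """Map a sample to a binary vector with one entry per rule.

    sample_items: iterable of items (e.g. attribute=value strings) in the sample.
    rules: list of (antecedent, class_label) pairs, antecedent being a set of items.
    An entry is 1 when every item of the rule's antecedent occurs in the sample.
    """
    items = set(sample_items)
    return [1 if set(antecedent) <= items else 0 for antecedent, _label in rules]
```

The resulting vectors, rather than the raw attributes, would then be passed to the SVM learner.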
Breaking the Silence (Teaching and Learning about Cultural Diversity).
ERIC Educational Resources Information Center
Miller, Howard M., Ed.
1997-01-01
Discusses the policy of silence (and its complex reasons) that often rules when it comes to teaching and learning about race, religion, ethnicity, and sexuality. Discusses briefly five books and articles that deal with breaking this silence, and offers observations about effective multiculturalism in the classroom. (SR)
NASA Astrophysics Data System (ADS)
Yadav, Basant; Ch, Sudheer; Mathur, Shashi; Adamowski, Jan
2016-12-01
In-situ bioremediation is the most common groundwater remediation procedure used for treating organically contaminated sites. A simulation-optimization approach, which incorporates a simulation model for groundwater flow and transport processes within an optimization program, could help engineers in designing a remediation system that best satisfies management objectives as well as regulatory constraints. In-situ bioremediation is a highly complex, non-linear process and the modelling of such a complex system requires significant computational effort. Soft computing techniques have a flexible mathematical structure which can generalize complex nonlinear processes. In in-situ bioremediation management, a physically-based model is used for the simulation and the simulated data is utilized by the optimization model to optimize the remediation cost. Repeatedly calling the simulator to satisfy the constraints is an extremely tedious and time-consuming process, and thus there is a need for a simulator which can reduce the computational burden. This study presents a simulation-optimization approach to achieve an accurate and cost-effective in-situ bioremediation system design for groundwater contaminated with BTEX (Benzene, Toluene, Ethylbenzene, and Xylenes) compounds. In this study, the Extreme Learning Machine (ELM) is used as a proxy simulator to replace BIOPLUME III for the simulation. ELM was selected through a comparative analysis with Artificial Neural Network (ANN) and Support Vector Machine (SVM), as these were successfully used in previous studies of in-situ bioremediation system design. Further, a single-objective optimization problem is solved by a coupled Extreme Learning Machine (ELM)-Particle Swarm Optimization (PSO) technique to achieve the minimum cost for the in-situ bioremediation system design. The results indicate that ELM is a faster and more accurate proxy simulator than ANN and SVM.
The total cost obtained by the ELM-PSO approach is held to a minimum while successfully satisfying all the regulatory constraints of the contaminated site.
Classification of cardiovascular tissues using LBP based descriptors and a cascade SVM.
Mazo, Claudia; Alegre, Enrique; Trujillo, Maria
2017-08-01
Histological images have characteristics, such as texture, shape, colour and spatial structure, that permit the differentiation of each fundamental tissue and organ. Texture is one of the most discriminative features. The automatic classification of tissues and organs based on histology images is an open problem, due to the lack of automatic solutions when treating tissues without pathologies. In this paper, we demonstrate that it is possible to automatically classify cardiovascular tissues using texture information and Support Vector Machines (SVM). Additionally, we found that it is feasible to recognise several cardiovascular organs following the same process. The texture of histological images was described using Local Binary Patterns (LBP), LBP Rotation Invariant (LBPri), Haralick features and different concatenations between them, representing in this way its content. Using an SVM with a linear kernel, we selected the most appropriate descriptor, which for this problem was a concatenation of LBP and LBPri. Due to the small number of images available, we could not follow an approach based on deep learning, but we selected the classifier that yielded the highest performance by comparing SVM with Random Forest and Linear Discriminant Analysis. Once SVM was selected as the classifier with a higher area under the curve that represents both higher recall and precision, we tuned it by evaluating different kernels, finding that a linear SVM allowed us to accurately separate four classes of tissues: (i) cardiac muscle of the heart, (ii) smooth muscle of the muscular artery, (iii) loose connective tissue, and (iv) smooth muscle of the large vein and the elastic artery. The experimental validation was conducted using 3000 blocks of 100 × 100 pixels, with 600 blocks per class, and the classification was assessed using 10-fold cross-validation.
Using LBP as the descriptor, concatenated with LBPri, and an SVM with a linear kernel, the main four classes of tissues were recognised with an AUC higher than 0.98. A polynomial kernel was then used to separate the elastic artery and vein, yielding an AUC in both cases superior to 0.98. Following the proposed approach, it is possible to separate with very high precision (AUC greater than 0.98) the fundamental tissues of the cardiovascular system along with some organs, such as the heart, arteries and veins. Copyright © 2017 Elsevier B.V. All rights reserved.
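The two texture descriptors named above can be illustrated on a single 3 × 3 neighbourhood. This is a minimal sketch of the basic 8-neighbour LBP code and its rotation-invariant variant, not the authors' implementation: the clockwise neighbour order starting at the top-left pixel and the `>=` comparison are common conventions assumed here, and the function names are hypothetical.

```python
def lbp_code(patch):
    """8-bit LBP code for the centre pixel of a 3x3 patch (list of 3 rows)."""
    centre = patch[1][1]
    # Neighbours in clockwise order, starting at the top-left pixel.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, value in enumerate(ring):
        if value >= centre:
            code |= 1 << i  # set bit i when the neighbour is not darker
    return code


def lbp_ri(code, bits=8):
    """Rotation-invariant code: minimum over all circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask for k in range(bits))
```

A descriptor for an image block is then typically the histogram of these codes over all pixels, and concatenating the LBP and LBPri histograms gives a combined feature vector of the kind the abstract describes.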
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
NASA Astrophysics Data System (ADS)
Shao, Yongni; Xie, Chuanqi; Jiang, Linjun; Shi, Jiahui; Zhu, Jiajin; He, Yong
2015-04-01
Visible/near infrared spectroscopy (Vis/NIR) based on sensitive wavelengths (SWs) and chemometrics was proposed to discriminate different tomatoes bred by spaceflight mutagenesis from their leaves or fruits (green or mature). The tomato breeds were mutant M1, M2 and their parent. Partial least squares (PLS) analysis and least squares-support vector machine (LS-SVM) were implemented for calibration models. PLS analysis was implemented for calibration models with different wavebands including the visible region (400-700 nm) and the near infrared region (700-1000 nm). The best PLS models were achieved in the visible region for the leaf and green fruit samples and in the near infrared region for the mature fruit samples. Furthermore, different latent variables (4-8 LVs for leaves, 5-9 LVs for green fruits, and 4-9 LVs for mature fruits) were used as inputs of LS-SVM to develop the LV-LS-SVM models with the grid search technique and radial basis function (RBF) kernel. The optimal LV-LS-SVM models were achieved with six LVs for the leaf samples, seven LVs for green fruits, and six LVs for mature fruits, respectively, and they outperformed the PLS models. Moreover, independent component analysis (ICA) was executed to select several SWs based on loading weights. The optimal LS-SVM model was achieved with SWs of 550-560 nm, 562-574 nm, 670-680 nm and 705-715 nm for the leaf samples; 548-556 nm, 559-564 nm, 678-685 nm and 962-974 nm for the green fruit samples; and 712-718 nm, 720-729 nm, 968-978 nm and 820-830 nm for the mature fruit samples. All of them had better performance than PLS and LV-LS-SVM, with the parameters of correlation coefficient (rp), root mean square error of prediction (RMSEP) and bias of 0.9792, 0.2632 and 0.0901 based on leaf discrimination, 0.9837, 0.2783 and 0.1758 based on green fruit discrimination, 0.9804, 0.2215 and -0.0035 based on mature fruit discrimination, respectively.
The overall results indicated that ICA was an effective way to select SWs, and that Vis/NIR combined with LS-SVM models had the capability to predict the different breeds (mutant M1, mutant M2 and their parent) of tomatoes from leaves and fruits.
Spiking Neurons for Analysis of Patterns
NASA Technical Reports Server (NTRS)
Huntsberger, Terrance
2008-01-01
Artificial neural networks comprising spiking neurons of a novel type have been conceived as improved pattern-analysis and pattern-recognition computational systems. These neurons are represented by a mathematical model denoted the state-variable model (SVM), which among other things, exploits a computational parallelism inherent in spiking-neuron geometry. Networks of SVM neurons offer advantages of speed and computational efficiency, relative to traditional artificial neural networks. The SVM also overcomes some of the limitations of prior spiking-neuron models. There are numerous potential pattern-recognition, tracking, and data-reduction (data preprocessing) applications for these SVM neural networks on Earth and in exploration of remote planets. Spiking neurons imitate biological neurons more closely than do the neurons of traditional artificial neural networks. A spiking neuron includes a central cell body (soma) surrounded by a tree-like interconnection network (dendrites). Spiking neurons are so named because they generate trains of output pulses (spikes) in response to inputs received from sensors or from other neurons. They gain their speed advantage over traditional neural networks by using the timing of individual spikes for computation, whereas traditional artificial neurons use averages of activity levels over time. Moreover, spiking neurons use the delays inherent in dendritic processing in order to efficiently encode the information content of incoming signals. Because traditional artificial neurons fail to capture this encoding, they have less processing capability, and so it is necessary to use more gates when implementing traditional artificial neurons in electronic circuitry. Such higher-order functions as dynamic tasking are effected by use of pools (collections) of spiking neurons interconnected by spike-transmitting fibers. 
The SVM includes adaptive thresholds and submodels of transport of ions (in imitation of such transport in biological neurons). These features enable the neurons to adapt their responses to high-rate inputs from sensors, and to adapt their firing thresholds to mitigate noise or effects of potential sensor failure. The mathematical derivation of the SVM starts from a prior model, known in the art as the point soma model, which captures all of the salient properties of neuronal response while keeping the computational cost low. The point-soma latency time is modified to be an exponentially decaying function of the strength of the applied potential. Choosing computational efficiency over biological fidelity, the dendrites surrounding a neuron are represented by simplified compartmental submodels and there are no dendritic spines. Updates to the dendritic potential, calcium-ion concentrations and conductances, and potassium-ion conductances are done by use of equations similar to those of the point soma. Diffusion processes in dendrites are modeled by averaging among nearest-neighbor compartments. Inputs to each of the dendritic compartments come from sensors. Alternatively or in addition, when an affected neuron is part of a pool, inputs can come from other spiking neurons. At present, SVM neural networks are implemented by computational simulation, using algorithms that encode the SVM and its submodels. However, it should be possible to implement these neural networks in hardware: The differential equations for the dendritic and cellular processes in the SVM model of spiking neurons map to equivalent circuits that can be implemented directly in analog very-large-scale integrated (VLSI) circuits.
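Two mechanisms described above lend themselves to a small illustration: a latency that decays exponentially with the applied potential, and dendritic diffusion modeled by averaging among nearest-neighbor compartments. The sketch below is not the published state-variable model. The constants `t0` and `k`, the 1-D chain of compartments, and the choice to include each compartment in its own average are all assumptions made for illustration.

```python
import math


def latency(potential, t0=1.0, k=0.5):
    """Latency decaying exponentially with applied potential.

    t0 (latency at zero potential) and k (decay rate) are hypothetical constants.
    """
    return t0 * math.exp(-k * potential)


def diffuse(compartments):
    """One diffusion step over a 1-D chain of dendritic compartments.

    Each value is replaced by the mean of itself and its immediate
    neighbours; end compartments have a single neighbour.
    """
    n = len(compartments)
    out = []
    for i in range(n):
        window = compartments[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out
```

Repeated calls to `diffuse` smooth a local concentration peak along the chain, which is the qualitative behaviour that averaging among neighbours is meant to capture.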
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuck-Muller, C.M.; Li, Shibo; Chen, H.
Intrachromosomal rearrangements usually result from three or fewer breaks. We report a complex intrachromosomal rearrangement resulting from five breaks in one chromosome 10 of a phenotypically normal father of two developmentally delayed children. GTG-banding analysis of the father's rearranged chromosome 10 suggested an initial pericentric inversion followed by an insertion from the short arm into the terminal band of the long arm. To our knowledge, this rearrangement is the most complex ever reported in a single chromosome. Both children inherited a recombinant chromosome 10 with loss of the insertion and the segment distal to it. Mechanisms for both rearrangements are proposed. 7 refs., 2 figs.
USDA-ARS's Scientific Manuscript database
It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...
NASA Astrophysics Data System (ADS)
Pohling, Christoph; Bocklitz, Thomas; Duarte, Alex S.; Emmanuello, Cinzia; Ishikawa, Mariana S.; Dietzeck, Benjamin; Buckup, Tiago; Uckermann, Ortrud; Schackert, Gabriele; Kirsch, Matthias; Schmitt, Michael; Popp, Jürgen; Motzkus, Marcus
2017-06-01
Multiplex coherent anti-Stokes Raman scattering (MCARS) microscopy was carried out to map a solid tumor in mouse brain tissue. The border between normal and tumor tissue was visualized using support vector machines (SVM) as a higher ranking type of data classification. Training data were collected separately in both tissue types, and the image contrast is based on class affiliation of the single spectra. Color coding in the image generated by SVM is then related to pathological information instead of single spectral intensities or spectral differences within the data set. The results show good agreement with the H&E stained reference and spontaneous Raman microscopy, proving the validity of the MCARS approach in combination with SVM.
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.
Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne
2018-01-01
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
LBP and SIFT based facial expression recognition
NASA Astrophysics Data System (ADS)
Sumer, Omer; Gunes, Ece O.
2015-02-01
This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem, and seven classes are classified: happiness, anger, sadness, disgust, surprise, fear and contempt. Using SIFT feature vectors and linear SVM, 93.1% mean accuracy is achieved on the CK+ database. On the other hand, the performance of the LBP-based classifier with linear SVM is reported on SFEW using the strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.
Power line identification of millimeter wave radar based on PCA-GS-SVM
NASA Astrophysics Data System (ADS)
Fang, Fang; Zhang, Guifeng; Cheng, Yansheng
2017-12-01
Existing detection methods cannot effectively ensure the safety of ultra-low-altitude UAV flight near power lines. To address this problem, a power line recognition method based on grid search (GS) combined with principal component analysis and a support vector machine (PCA-SVM) is proposed. First, the candidate lines obtained from the Hough transform are reduced by PCA, and the main features of the candidate lines are extracted. Then, the support vector machine (SVM) is optimized by the grid search method (GS). Finally, the SVM classifier with the optimized parameters is used to classify the candidate lines. MATLAB simulation results show that this method can effectively distinguish power lines from noise, and has high recognition accuracy and algorithm efficiency.
NASA Astrophysics Data System (ADS)
Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi
2018-03-01
This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.
Training the max-margin sequence model with the relaxed slack variables.
Niu, Lingfeng; Wu, Jianmin; Shi, Yong
2012-09-01
Sequence models are widely used in many applications such as natural language processing, information extraction and optical character recognition, etc. We propose a new approach to train the max-margin based sequence model by relaxing the slack variables in this paper. With the canonical feature mapping definition, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with the state-of-the-art solutions for the sequence learning, the new method has the following advantages: firstly, the sequence training problem is transformed into a multiclassification problem, which is more widely studied and already has quite a few off-the-shelf training packages; secondly, this new approach reduces the complexity of training significantly and achieves comparable prediction performance compared with the existing sequence models; thirdly, when the size of training data is limited, by assigning different slack variables to different microlabel pairs, the new method can use the discriminative information more frugally and produces more reliable model; last but not least, by employing kernels in the intermediate multiclass SVM, nonlinear feature space can be easily explored. Experimental results on the task of named entity recognition, information extraction and handwritten letter recognition with the public datasets illustrate the efficiency and effectiveness of our method. Copyright © 2012 Elsevier Ltd. All rights reserved.
Ren, Ji-Xia; Li, Lin-Li; Zheng, Ren-Lin; Xie, Huan-Zhang; Cao, Zhi-Xing; Feng, Shan; Pan, You-Li; Chen, Xin; Wei, Yu-Quan; Yang, Sheng-Yong
2011-06-27
In this investigation, we describe the discovery of novel potent Pim-1 inhibitors by employing a proposed hierarchical multistage virtual screening (VS) approach, which is based on support vector machine-based (SVM-based VS or SB-VS), pharmacophore-based VS (PB-VS), and docking-based VS (DB-VS) methods. In this approach, the three VS methods are applied in an increasing order of complexity so that the first filter (SB-VS) is fast and simple, while successive ones (PB-VS and DB-VS) are more time-consuming but are applied only to a small subset of the entire database. Evaluation of this approach indicates that it can be used to screen a large chemical library rapidly with a high hit rate and a high enrichment factor. This approach was then applied to screen several large chemical libraries, including PubChem, Specs, and Enamine as well as an in-house database. From the final hits, 47 compounds were selected for further in vitro Pim-1 inhibitory assay, and 15 compounds show nanomolar level or low micromolar inhibition potency against Pim-1. In particular, four of them were found to have new scaffolds which have potential for the chemical development of Pim-1 inhibitors.
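The staged filtering described in the abstract above (a fast first filter, with slower filters applied only to the survivors) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the record fields `svm_score`, `pharm_match` and `dock_score`, the cut-off values, and the function name `cascade_screen` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, object]              # hypothetical compound record
Stage = Tuple[str, Callable[[Compound], bool]]


def cascade_screen(library: List[Compound], stages: List[Stage]):
    """Apply filters in order; each stage only sees survivors of the previous one."""
    survivors = list(library)
    counts = [len(survivors)]             # library size before and after each stage
    for _name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append(len(survivors))
    return survivors, counts


# Toy library with made-up scores (illustrative values only).
library = [
    {"id": "c1", "svm_score": 0.9, "pharm_match": True, "dock_score": -8.2},
    {"id": "c2", "svm_score": -0.4, "pharm_match": True, "dock_score": -9.0},
    {"id": "c3", "svm_score": 0.2, "pharm_match": False, "dock_score": -7.5},
    {"id": "c4", "svm_score": 0.7, "pharm_match": True, "dock_score": -5.1},
]

# Cheapest filter first, most expensive last (thresholds are assumptions).
stages = [
    ("SB-VS", lambda c: c["svm_score"] > 0.0),
    ("PB-VS", lambda c: bool(c["pharm_match"])),
    ("DB-VS", lambda c: c["dock_score"] < -7.0),
]

hits, counts = cascade_screen(library, stages)
hit_ids = [c["id"] for c in hits]
```

Ordering the stages by cost means the expensive docking step runs on only a small fraction of the library, which is the point the abstract makes about screening large databases quickly.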
Aksu, Yaman; Miller, David J; Kesidis, George; Yang, Qing X
2010-05-01
Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature "markers." For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both improved margin and improved generalization, compared with RFE. Moreover, for the case of a nonlinear kernel, we show that RFE assumes that the squared weight vector 2-norm is strictly decreasing as features are eliminated. We demonstrate this is not true for the Gaussian kernel and, consequently, RFE may give poor results in this case. MFE for nonlinear kernels gives better margin and generalization. We also present an extension which achieves further margin gains, by optimizing only two degrees of freedom--the hyperplane's intercept and its squared 2-norm--with the weight vector orientation fixed. We finally introduce an extension that allows margin slackness. We compare against several alternatives, including RFE and a linear programming method that embeds feature selection within the classifier design. On high-dimensional gene microarray data sets, University of California at Irvine (UCI) repository data sets, and Alzheimer's disease brain image data, MFE methods give promising results.
NASA Astrophysics Data System (ADS)
Paul, Subir; Nagesh Kumar, D.
2018-04-01
Hyperspectral (HS) data comprise continuous spectral responses of hundreds of narrow spectral bands with very fine spectral resolution or bandwidth, which offer feature identification and classification with high accuracy. In the present study, a Mutual Information (MI) based Segmented Stacked Autoencoder (S-SAE) approach for spectral-spatial classification of the HS data is proposed to reduce the complexity and computational time compared to Stacked Autoencoder (SAE) based feature extraction. A non-parametric dependency measure (MI) based spectral segmentation is proposed instead of linear and parametric dependency measures to account for both linear and nonlinear inter-band dependency in the spectral segmentation of the HS bands. Then morphological profiles are created corresponding to the segmented spectral features to incorporate the spatial information in the spectral-spatial classification approach. Two non-parametric classifiers, Support Vector Machine (SVM) with Gaussian kernel and Random Forest (RF), are used for classification of the three most popularly used HS datasets. Results of the numerical experiments carried out in this study show that SVM with a Gaussian kernel provides better results for the Pavia University and Botswana datasets, whereas RF performs better for the Indian Pines dataset. The experiments performed with the proposed methodology provide encouraging results compared to numerous existing approaches.
Zeng, Qinghui; Liu, Yi; Zhao, Hongtao; Sun, Mingdong; Li, Xuyong
2017-04-01
Inter-basin water transfer projects might cause complex hydro-chemical and biological variation in the receiving aquatic ecosystems. Whether machine learning models can be used to predict changes in phytoplankton community composition caused by water transfer projects has rarely been studied. In the present study, we used machine learning models to predict the total algal cell densities and changes in phytoplankton community composition in Miyun reservoir caused by the middle route of the South-to-North Water Transfer Project (SNWTP). The performances of four machine learning models, including regression trees (RT), random forest (RF), support vector machine (SVM), and artificial neural network (ANN), were evaluated and the best model was selected for further prediction. The results showed that the predictive accuracies (Pearson's correlation coefficient) of the models were RF (0.974), ANN (0.951), SVM (0.860), and RT (0.817) in the training step and RF (0.806), ANN (0.734), SVM (0.730), and RT (0.692) in the testing step. Therefore, the RF model was the best method for estimating total algal cell densities. Furthermore, the predictive accuracies of the RF model for dominant phytoplankton phyla (Cyanophyta, Chlorophyta, and Bacillariophyta) in Miyun reservoir ranged from 0.824 to 0.869 in the testing step. The predicted proportions with water transfer of the different phytoplankton phyla ranged from -8.88% to 9.93%, and the predicted dominant phyla with water transfer in each season remained unchanged compared to the phytoplankton succession without water transfer. The results of the present study provide a useful tool for predicting the changes in phytoplankton community caused by water transfer. The method is transferable to other locations via establishment of models with data relevant to a particular area. Our findings help to better understand the possible changes in aquatic ecosystems influenced by inter-basin water transfer. Copyright © 2017 Elsevier Ltd. All rights reserved.
Frenzel, Jochen; Gessner, Christian; Sandvoss, Torsten; Hammerschmidt, Stefan; Schellenberger, Wolfgang; Sack, Ulrich; Eschrich, Klaus; Wirtz, Hubert
2011-01-01
Background Peptide patterns of bronchoalveolar lavage fluid (BALF) were assumed to reflect the complex pathology of acute lung injury (ALI)/acute respiratory distress syndrome (ARDS) better than clinical and inflammatory parameters and may be superior for outcome prediction. Methodology/Principal Findings A training group of patients suffering from ALI/ARDS was compiled from equal numbers of survivors and nonsurvivors. Clinical history, ventilation parameters, Murray's lung injury severity score (Murray's LISS) and interleukins in BALF were gathered. In addition, samples of bronchoalveolar lavage fluid were analyzed by means of hydrophobic chromatography and MALDI-ToF mass spectrometry (MALDI-ToF MS). Receiver operating characteristic (ROC) analysis for each clinical and cytokine parameter revealed interleukin-6>interleukin-8>diabetes mellitus>Murray's LISS as the best outcome predictors. Outcome predicted on the basis of BALF levels of interleukin-6 resulted in 79.4% accuracy, 82.7% sensitivity and 76.1% specificity (area under the ROC curve, AUC, 0.853). Both clinical parameters and cytokines as well as peptide patterns determined by MALDI-ToF MS were analyzed by classification and regression tree (CART) analysis and support vector machine (SVM) algorithms. CART analysis including Murray's LISS, interleukin-6 and interleukin-8 in combination was correct in 78.0%. MALDI-ToF MS of BALF peptides did not reveal a single identifiable biomarker for ARDS. However, classification of patients was successfully achieved based on the entire peptide pattern analyzed using SVM. This method resulted in 90% accuracy, 93.3% sensitivity and 86.7% specificity following a 10-fold cross validation (AUC = 0.953). Subsequent validation of the optimized SVM algorithm with a test group of patients with unknown prognosis yielded 87.5% accuracy, 83.3% sensitivity and 90.0% specificity. 
Conclusions/Significance MALDI-ToF MS peptide patterns of BALF, evaluated by appropriate mathematical methods can be of value in predicting outcome in pneumonia induced ALI/ARDS. PMID:21991318
Karthick, P A; Ghosh, Diptasree Maitra; Ramakrishnan, S
2018-02-01
Surface electromyography (sEMG) based muscle fatigue research is widely preferred in sports science and occupational/rehabilitation studies due to its noninvasiveness. However, these signals are complex, multicomponent and highly nonstationary with large inter-subject variations, particularly during dynamic contractions. Hence, time-frequency based machine learning methodologies can improve the design of automated systems for these signals. In this work, analyses based on high-resolution time-frequency methods, namely, Stockwell transform (S-transform), B-distribution (BD) and extended modified B-distribution (EMBD), are proposed to differentiate the dynamic muscle nonfatigue and fatigue conditions. The nonfatigue and fatigue segments of sEMG signals recorded from the biceps brachii of 52 healthy volunteers are preprocessed and subjected to S-transform, BD and EMBD. Twelve features are extracted from each method and prominent features are selected using genetic algorithm (GA) and binary particle swarm optimization (BPSO). Five machine learning algorithms, namely, naïve Bayes, support vector machine (SVM) with polynomial and radial basis kernels, random forest and rotation forest, are used for the classification. The results show that all the proposed time-frequency distributions (TFDs) are able to show the nonstationary variations of sEMG signals. Most of the features exhibit statistically significant differences between the muscle fatigue and nonfatigue conditions. The maximum number of features (66%) is reduced by GA and BPSO for EMBD and BD-TFD respectively. The combination of EMBD and polynomial-kernel SVM is found to be most accurate (91% accuracy) in classifying the conditions with the features selected using GA. The proposed methods are found to be capable of handling the nonstationary and multicomponent variations of sEMG signals recorded in dynamic fatiguing contractions. In particular, the combination of EMBD and polynomial-kernel SVM could be used to detect the dynamic muscle fatigue conditions. Copyright © 2017 Elsevier B.V. All rights reserved.
Wong, Emily S. W.; Hardy, Margaret C.; Wood, David; Bailey, Timothy; King, Glenn F.
2013-01-01
Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that upon enzymatic cleavage is excised from the mature peptide. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov model and decision tree) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html), a web portal to a comprehensive relational database of spider toxins. 
All training data, test data, and scripts used are available from the SpiderP website. PMID:23894279
Xu, Jun-Feng; Kang, Qian; Ma, Xing-Yong; Pan, Yuan-Ming; Yang, Lang; Jin, Peng; Wang, Xin; Li, Chen-Guang; Chen, Xiao-Chen; Wu, Chao; Jiao, Shao-Zhuo; Sheng, Jian-Qiu
2018-01-01
Colonoscopy screening has been accepted broadly to evaluate the risk and incidence of colorectal cancer (CRC) during health examination in outpatients. However, the intrusiveness, complexity and discomfort of colonoscopy may limit its application and the compliance of patients. Thus, more reliable and convenient diagnostic methods are necessary for CRC screening. Genome instability, especially copy-number variation (CNV), is a hallmark of cancer and has been shown to have potential in clinical application. We determined the diagnostic potential of chromosomal CNV at the arm level by whole-genome sequencing of CRC plasma samples (n = 32) and healthy controls (n = 38). Arm-level CNV was determined and the consistency of arm-level CNV between plasma and tissue was further analyzed. Two methods, a regular z score and a trained Support Vector Machine (SVM) classifier, were applied for detection of colorectal cancer. In plasma samples of CRC patients, the most frequent deletions were detected on chromosomes 6, 8p, 14q and 1p, and the most frequent amplifications occurred on chromosomes 19, 5, 2, 9p and 20p. These arm-level alterations detected in plasma were also observed in tumor tissues. We showed that the specificity of regular z score analysis for the detection of colorectal cancer was 86.8% (33/38), whereas its sensitivity was only 56.3% (18/32). Applying a trained SVM classifier (n = 40 in the training group) as the standard to detect colorectal cancer relevance ratio in the test samples (n = 30), a sensitivity of 91.7% (11/12) and a specificity of 88.9% (16/18) were finally reached. Furthermore, all five early CRC patients in stages I and II were successfully detected. A trained SVM classifier based on arm-level CNVs can be used as a promising method to screen early-stage CRC. © 2018 The Author(s). Published by S. Karger AG, Basel.
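The sensitivity and specificity figures quoted in the abstract above follow directly from the reported counts. The short sketch below recomputes them; the function names are illustrative, and the split of each fraction into true/false positives and negatives is inferred from the quoted ratios (for example, 18/32 is read as 18 detected cancers and 14 missed).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual cases that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of healthy controls that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# z-score analysis: 18 of 32 CRC samples detected, 33 of 38 controls cleared.
z_sens = 100 * sensitivity(true_pos=18, false_neg=32 - 18)
z_spec = 100 * specificity(true_neg=33, false_pos=38 - 33)

# Trained SVM on the 30 test samples: 11 of 12 cases, 16 of 18 controls.
svm_sens = 100 * sensitivity(true_pos=11, false_neg=12 - 11)
svm_spec = 100 * specificity(true_neg=16, false_pos=18 - 16)

for label, sens, spec in [("z score", z_sens, z_spec), ("SVM", svm_sens, svm_spec)]:
    print(f"{label}: sensitivity {sens:.2f}%, specificity {spec:.2f}%")
```

Note that the two methods are evaluated on different sample sets (all 70 samples for the z score, 30 held-out samples for the SVM), so the percentages are not a like-for-like comparison.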
NASA Astrophysics Data System (ADS)
Akhoondzadeh, M.
2013-04-01
In this paper, a number of classical and intelligent methods, including interquartile, autoregressive integrated moving average (ARIMA), artificial neural network (ANN) and support vector machine (SVM), have been proposed to quantify potential thermal anomalies around the time of the 11 August 2012 Varzeghan, Iran, earthquake (Mw = 6.4). The duration of the data set, which comprises Aqua-MODIS land surface temperature (LST) night-time snapshot images, is 62 days. In order to quantify variations of LST data obtained from satellite images, the air temperature (AT) data derived from the meteorological station close to the earthquake epicenter have been taken into account. For the models examined here, results indicate the following: (i) ARIMA models, which are the most widely used in the time series community for short-term forecasting, are quickly and easily implemented, and can efficiently act through linear solutions. (ii) A multilayer perceptron (MLP) feed-forward neural network can be a suitable non-parametric method to detect the anomalous changes of a non-linear time series such as variations of LST. (iii) Since SVMs are often used due to their many advantages for classification and regression tasks, it can be shown that, if the difference between the value predicted using the SVM method and the observed value exceeds the pre-defined threshold value, then the observed value could be regarded as an anomaly. (iv) ANN and SVM methods could be powerful tools in modeling complex phenomena such as earthquake precursor time series where we may not know what the underlying data generating process is. There is good agreement in the results obtained from the different methods for quantifying potential anomalies in a given LST time series. This paper indicates that the detection of the potential thermal anomalies derives credibility from the overall efficiencies and potentialities of the four integrated methods.
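The anomaly rule stated in point (iii) of the abstract above (flag an observation when it differs from the model prediction by more than a pre-defined threshold) can be illustrated with a small sketch. The predictor itself is left out: the `predicted` values would come from whichever model is used (SVM, ANN or ARIMA). The use of the absolute difference, the function name and all numbers below are assumptions for illustration.

```python
from typing import List, Sequence


def flag_anomalies(observed: Sequence[float],
                   predicted: Sequence[float],
                   threshold: float) -> List[int]:
    """Return indices where |observed - predicted| exceeds the threshold."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [i for i, (obs, pred) in enumerate(zip(observed, predicted))
            if abs(obs - pred) > threshold]


# Made-up temperature series; the fourth value deviates strongly from its prediction.
observed = [10.0, 10.5, 9.8, 14.2, 10.1]
predicted = [10.1, 10.3, 10.0, 10.2, 10.0]

anomalous_days = flag_anomalies(observed, predicted, threshold=2.0)
```

In practice the threshold would be chosen from the spread of the residuals on a quiet reference period rather than fixed by hand.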
TIRR regulates 53BP1 by masking its histone methyl-lysine binding function.
Drané, Pascal; Brault, Marie-Eve; Cui, Gaofeng; Meghani, Khyati; Chaubey, Shweta; Detappe, Alexandre; Parnandi, Nishita; He, Yizhou; Zheng, Xiao-Feng; Botuyan, Maria Victoria; Kalousi, Alkmini; Yewdell, William T; Münch, Christian; Harper, J Wade; Chaudhuri, Jayanta; Soutoglou, Evi; Mer, Georges; Chowdhury, Dipanjan
2017-03-09
P53-binding protein 1 (53BP1) is a multi-functional double-strand break repair protein that is essential for class switch recombination in B lymphocytes and for sensitizing BRCA1-deficient tumours to poly-ADP-ribose polymerase-1 (PARP) inhibitors. Central to all 53BP1 activities is its recruitment to double-strand breaks via the interaction of the tandem Tudor domain with dimethylated lysine 20 of histone H4 (H4K20me2). Here we identify an uncharacterized protein, Tudor interacting repair regulator (TIRR), that directly binds the tandem Tudor domain and masks its H4K20me2 binding motif. Upon DNA damage, the protein kinase ataxia-telangiectasia mutated (ATM) phosphorylates 53BP1 and recruits RAP1-interacting factor 1 (RIF1) to dissociate the 53BP1-TIRR complex. However, overexpression of TIRR impedes 53BP1 function by blocking its localization to double-strand breaks. Depletion of TIRR destabilizes 53BP1 in the nuclear-soluble fraction and alters the double-strand break-induced protein complex centring 53BP1. These findings identify TIRR as a new factor that influences double-strand break repair using a unique mechanism of masking the histone methyl-lysine binding function of 53BP1.
[Should antiseptics be used for chronic wounds?].
Barrois, B
2001-02-01
The use of antiseptics is common and appropriate on intact skin, but no scientific study supports their use on broken skin. Wound healing is a complex process that involves a physiological bacterial cycle. Commonly used antiseptics destroy fibroblasts and have only a short-lived effect on bacteria. It is therefore logical not to use antiseptics on broken skin.
Seeber, Andrew; Hegnauer, Anna Maria; Hustedt, Nicole; Deshpande, Ishan; Poli, Jérôme; Eglinger, Jan; Pasero, Philippe; Gut, Heinz; Shinohara, Miki; Hopfner, Karl-Peter; Shimada, Kenji; Gasser, Susan M
2016-12-01
The Mre11-Rad50-Xrs2 (MRX) complex is related to SMC complexes that form rings capable of holding two distinct DNA strands together. MRX functions at stalled replication forks and double-strand breaks (DSBs). A mutation in the N-terminal OB fold of the 70 kDa subunit of yeast replication protein A, rfa1-t11, abrogates MRX recruitment to both types of DNA damage. The rfa1 mutation is functionally epistatic with loss of any of the MRX subunits for survival of replication fork stress or DSB recovery, although it does not compromise end-resection. High-resolution imaging shows that either the rfa1-t11 or the rad50Δ mutation lets stalled replication forks collapse and allows the separation not only of opposing ends but of sister chromatids at breaks. Given that cohesin loss does not provoke visible sister separation as long as the RPA-MRX contacts are intact, we conclude that MRX also serves as a structural linchpin holding sister chromatids together at breaks. Copyright © 2016 Elsevier Inc. All rights reserved.
Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang
2011-06-20
Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing an enzyme into the EC hierarchy is crucial to understanding its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and in Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98, respectively, when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of the hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for the enzyme function prediction community.
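As a rough illustration of the conjoint triad feature mentioned in the abstract above, the sketch below maps each amino acid to one of seven classes and counts every run of three consecutive classes, giving a 7 x 7 x 7 = 343-dimensional vector. The abstract does not state the grouping; the seven-class grouping used here is the one commonly associated with the conjoint triad method and is an assumption, as are the function name and the choice to return raw counts without normalization.

```python
from typing import List

# Assumed seven-class amino acid grouping (not specified in the abstract).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def conjoint_triad(sequence: str) -> List[int]:
    """Count class triads over consecutive residues; returns 343 raw counts."""
    # Residues outside the 20 standard amino acids are dropped before counting,
    # which joins their neighbours; a stricter version could reject them instead.
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * (7 ** 3)
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1   # index of triad (a, b, c) in base 7
    return counts


features = conjoint_triad("ACDE")  # classes 0, 6, 5, 5 -> triads (0,6,5) and (6,5,5)
```

A vector like this captures which kinds of residues sit next to each other, not only how often each residue occurs, which is the contrast with amino acid composition drawn in the abstract.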
Identification of type 2 diabetes-associated combination of SNPs using support vector machine.
Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon
2010-04-23
Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.
Computer-aided diagnosis of breast microcalcifications based on dual-tree complex wavelet transform.
Jian, Wushuai; Sun, Xueyan; Luo, Shuqian
2012-12-19
Digital mammography is the most reliable imaging modality for breast carcinoma diagnosis, and breast micro-calcifications are regarded as one of the most important signs in imaging diagnosis. In this paper, a computer-aided diagnosis (CAD) system is presented for breast micro-calcifications based on dual-tree complex wavelet transform (DT-CWT) to assist radiologists in a manner similar to double reading. First, 25 abnormal ROIs were extracted manually according to the center and diameter of the lesions, and 25 normal ROIs were selected randomly. Then micro-calcifications were segmented by combining space and frequency domain techniques. We extracted three texture features based on wavelet (Haar, DB4, DT-CWT) transform. In total, 14 descriptors were introduced to define the characteristics of the suspicious micro-calcifications. Principal Component Analysis (PCA) was used to transform these descriptors to a compact and efficient vector expression. A Support Vector Machine (SVM) classifier was used to classify potential micro-calcifications. Finally, we used the receiver operating characteristic (ROC) curve and free-response operating characteristic (FROC) curve to evaluate the performance of the CAD system. The results of SVM classifications based on different wavelets show that DT-CWT has a better performance. Compared with other results, the DT-CWT method achieved an accuracy of 96% and 100% for the classification of normal and abnormal ROIs, and the classification of benign and malignant micro-calcifications, respectively. In FROC analysis, our CAD system for clinical dataset detection achieved a sensitivity of 83.5% at 1.85 false positives per image. Compared with general wavelets, DT-CWT could describe the features more effectively, and our CAD system had a competitive performance.
Computer-aided diagnosis of breast microcalcifications based on dual-tree complex wavelet transform
2012-01-01
Background Digital mammography is the most reliable imaging modality for breast carcinoma diagnosis and breast micro-calcifications is regarded as one of the most important signs on imaging diagnosis. In this paper, a computer-aided diagnosis (CAD) system is presented for breast micro-calcifications based on dual-tree complex wavelet transform (DT-CWT) to facilitate radiologists like double reading. Methods Firstly, 25 abnormal ROIs were extracted according to the center and diameter of the lesions manually and 25 normal ROIs were selected randomly. Then micro-calcifications were segmented by combining space and frequency domain techniques. We extracted three texture features based on wavelet (Haar, DB4, DT-CWT) transform. Totally 14 descriptors were introduced to define the characteristics of the suspicious micro-calcifications. Principal Component Analysis (PCA) was used to transform these descriptors to a compact and efficient vector expression. Support Vector Machine (SVM) classifier was used to classify potential micro-calcifications. Finally, we used the receiver operating characteristic (ROC) curve and free-response operating characteristic (FROC) curve to evaluate the performance of the CAD system. Results The results of SVM classifications based on different wavelets shows DT-CWT has a better performance. Compared with other results, DT-CWT method achieved an accuracy of 96% and 100% for the classification of normal and abnormal ROIs, and the classification of benign and malignant micro-calcifications respectively. In FROC analysis, our CAD system for clinical dataset detection achieved a sensitivity of 83.5% at a false positive per image of 1.85. Conclusions Compared with general wavelets, DT-CWT could describe the features more effectively, and our CAD system had a competitive performance. PMID:23253202
Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas
Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian
2016-01-01
The study reported here examined surface soils in the Liuxin mining area of Xuzhou and explored the relationship between heavy metal content and spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The study results are as follows: (1) the estimations of the spectral inversion models established based on MLR, GRNN and SMO-SVM are satisfactory; the MLR model provides the worst estimation, yet its R² still exceeds 0.46. This result suggests that the stress-sensitive bands of heavy metal pollution contain enough effective spectral information; (2) the GRNN model can simulate the data from small samples more effectively than the MLR model, and the R² values between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimation using the SMO-SVM model are clearly better than those of the GRNN and MLR models. Among all five heavy metals, the estimation for cadmium (Cd) is the best when using the SMO-SVM model, with an R² value of 0.8628; (4) when the optimal model is used to invert the Cd content of wheat planted on mine reclamation soil, the R² and RMSE between the measured and the estimated values are 0.6683 and 0.0489, respectively. This result suggests that using the SMO-SVM model to estimate the contents of heavy metals in wheat samples is feasible. PMID:27367708
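The R² and RMSE figures quoted in this abstract compare measured with estimated contents. A minimal sketch of both metrics follows, assuming R² means the usual coefficient of determination (the abstract does not say whether a squared correlation was used instead). The function names and the toy numbers are illustrative only.

```python
import math

def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(measured, estimated)) / n)

# Toy values, not data from the study.
measured = [1.0, 2.0, 3.0]
estimated = [1.0, 2.0, 4.0]
print(r_squared(measured, estimated), rmse(measured, estimated))
```

Note that `r_squared` is undefined when all measured values are identical, because the total sum of squares is then zero.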
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
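The two generic ingredients of such an ensemble, drawing training subsets without replacement and combining member predictions, can be sketched as below. This is not the challenge entry itself: the subset fraction, the majority-vote combiner, and both function names are assumptions, and the SVM training and feature selection steps are omitted.

```python
import random
from collections import Counter

def subsample_without_replacement(n_samples, fraction, n_models, seed=0):
    """Return one index subset per ensemble member; no index repeats within a subset."""
    rng = random.Random(seed)
    size = max(1, int(fraction * n_samples))
    return [sorted(rng.sample(range(n_samples), size)) for _ in range(n_models)]

def majority_vote(predictions):
    """predictions: one list of labels per model. Returns the most common label per sample.

    Ties go to the label seen first among the models (Counter insertion order).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

subsets = subsample_without_replacement(n_samples=240, fraction=0.5, n_models=5)
votes = majority_vote([["NC", "AD"], ["NC", "MCI"], ["AD", "MCI"]])
```

Each member would be trained on its own subset, and the vote would then be taken over the members' predictions for the test subjects.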
Kasiri, Keyvan; Kazemi, Kamran; Dehghani, Mohammad Javad; Helfroush, Mohammad Sadegh
2013-01-01
In this paper, we present a new semi-automatic brain tissue segmentation method based on a hybrid hierarchical approach that combines a brain atlas as a priori information and a least-square support vector machine (LS-SVM). The method consists of three steps. In the first two steps, the skull is removed and the cerebrospinal fluid (CSF) is extracted. These two steps are performed using the toolbox FMRIB's automated segmentation tool integrated in the FSL software (FSL-FAST) developed in Oxford Centre for functional MRI of the brain (FMRIB). Then, in the third step, the LS-SVM is used to segment grey matter (GM) and white matter (WM). The training samples for LS-SVM are selected from the registered brain atlas. The voxel intensities and spatial positions are selected as the two feature groups for training and test. SVM as a powerful discriminator is able to handle nonlinear classification problems; however, it cannot provide posterior probability. Thus, we use a sigmoid function to map the SVM output into probabilities. The proposed method is used to segment CSF, GM and WM from the simulated magnetic resonance imaging (MRI) using Brainweb MRI simulator and real data provided by Internet Brain Segmentation Repository. The semi-automatically segmented brain tissues were evaluated by comparing to the corresponding ground truth. The Dice and Jaccard similarity coefficients, sensitivity and specificity were calculated for the quantitative validation of the results. The quantitative results show that the proposed method segments brain tissues accurately with respect to corresponding ground truth. PMID:24696800
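Two pieces of this abstract lend themselves to a short illustration: the sigmoid mapping from an SVM decision value to a probability, and the Dice and Jaccard overlap scores. The sketch below assumes the common parametric form P = 1 / (1 + exp(a·f + b)); the values of `a` and `b` are placeholders, since the abstract does not give the fitted parameters. Segmentations are represented as sets of voxel indices for simplicity.

```python
import math

def sigmoid_probability(decision_value, a=-1.0, b=0.0):
    """Map an SVM decision value f to a probability 1 / (1 + exp(a*f + b)).

    a and b are placeholder parameters; in practice they would be fitted on held-out data.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

def dice(segmented, truth):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for two non-empty voxel sets."""
    segmented, truth = set(segmented), set(truth)
    return 2.0 * len(segmented & truth) / (len(segmented) + len(truth))

def jaccard(segmented, truth):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| for two non-empty voxel sets."""
    segmented, truth = set(segmented), set(truth)
    return len(segmented & truth) / len(segmented | truth)

print(sigmoid_probability(0.0), dice({1, 2, 3}, {2, 3, 4}), jaccard({1, 2, 3}, {2, 3, 4}))
```

The two overlap scores are related by Dice = 2·Jaccard / (1 + Jaccard), so they always rank segmentations in the same order.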
Arbitrary norm support vector machines.
Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R
2009-02-01
Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.
Malegori, Cristina; Nascimento Marques, Emanuel José; de Freitas, Sergio Tonetto; Pimentel, Maria Fernanda; Pasquini, Celio; Casiraghi, Ernestina
2017-04-01
The main goal of this study was to investigate the analytical performances of a state-of-the-art device, one of the smallest dispersion NIR spectrometers on the market (MicroNIR 1700), making a critical comparison with a benchtop FT-NIR spectrometer in the evaluation of the prediction accuracy. In particular, the aim of this study was to estimate in a non-destructive manner, titratable acidity and ascorbic acid content in acerola fruit during ripening, in a view of direct applicability in field of this new miniaturised handheld device. Acerola (Malpighia emarginata DC.) is a super-fruit characterised by a considerable amount of ascorbic acid, ranging from 1.0% to 4.5%. However, during ripening, acerola colour changes and the fruit may lose as much as half of its ascorbic acid content. Because the variability of chemical parameters followed a non-strictly linear profile, two different regression algorithms were compared: PLS and SVM. Regression models obtained with Micro-NIR spectra give better results using SVM algorithm, for both ascorbic acid and titratable acidity estimation. FT-NIR data give comparable results using both SVM and PLS algorithms, with lower errors for SVM regression. The prediction ability of the two instruments was statistically compared using the Passing-Bablok regression algorithm; the outcomes are critically discussed together with the regression models, showing the suitability of the portable Micro-NIR for in field monitoring of chemical parameters of interest in acerola fruits. Copyright © 2016 Elsevier B.V. All rights reserved.
Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo
2006-08-01
Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was ≥ 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.
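The four diagnostic measures reported here all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not reconstructed from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly predicted
        "specificity": tn / (tn + fp),   # negatives correctly predicted
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=40, fp=40, tn=160, fn=10)
```

With a low prevalence of positive nodes, the negative predictive value can be high even when the positive predictive value is modest, which is why the former matters most for deciding who could safely skip a biopsy.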
NASA Astrophysics Data System (ADS)
Cubillas, J. E.; Japitana, M.
2016-06-01
This study demonstrates the application of CIELAB, Color intensity, and One Dimensional Scalar Constancy as features for image recognition and classifying benthic habitats in an image with the coastal areas of Hinatuan, Surigao Del Sur, Philippines as the study area. The study area is composed of four datasets, namely: (a) Blk66L005, (b) Blk66L021, (c) Blk66L024, and (d) Blk66L0114. SVM optimization was performed in Matlab® software with the help of Parallel Computing Toolbox to hasten the SVM computing speed. The image used for collecting samples for SVM procedure was Blk66L0114 in which a total of 134,516 sample objects of mangrove, possible coral existence with rocks, sand, sea, fish pens and sea grasses were collected and processed. The collected samples were then used as training sets for the supervised learning algorithm and for the creation of class definitions. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as a super feature which will then be used in developing the C (classifier) rule set in eCognition® software. The classification results of the sampling site yielded an accuracy of 98.85% which confirms the reliability of remote sensing techniques and analysis employed to orthophotos like the CIELAB, Color Intensity and One dimensional scalar constancy and the use of SVM classification algorithm in classifying benthic habitats.
Durán, Jorge; Delgado-Baquerizo, Manuel; Dougill, Andrew J; Guuroh, Reginald T; Linstädter, Anja; Thomas, Andrew D; Maestre, Fernando T
2018-05-01
The relationship between the spatial variability of soil multifunctionality (i.e., the capacity of soils to conduct multiple functions; SVM) and major climatic drivers, such as temperature and aridity, has never been assessed globally in terrestrial ecosystems. We surveyed 236 dryland ecosystems from six continents to evaluate the relative importance of aridity and mean annual temperature, and of other abiotic (e.g., texture) and biotic (e.g., plant cover) variables as drivers of SVM, calculated as the averaged coefficient of variation for multiple soil variables linked to nutrient stocks and cycling. We found that increases in temperature and aridity were globally correlated to increases in SVM. Some of these climatic effects on SVM were direct, but others were indirectly driven through reductions in the number of vegetation patches and increases in soil sand content. The predictive capacity of our structural equation modelling was clearly higher for the spatial variability of N- than for C- and P-related soil variables. In the case of N cycling, the effects of temperature and aridity were both direct and indirect via changes in soil properties. For C and P, the effect of climate was mainly indirect via changes in plant attributes. These results suggest that future changes in climate may decouple the spatial availability of these elements for plants and microbes in dryland soils. Our findings significantly advance our understanding of the patterns and mechanisms driving SVM in drylands across the globe, which is critical for predicting changes in ecosystem functioning in response to climate change. © 2018 by the Ecological Society of America.
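The variability index described here, an averaged coefficient of variation across several soil variables, can be written out as a short sketch. The use of the sample standard deviation, the variable names, and the measurements are assumptions made for illustration; the abstract does not specify them.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (the mean must be non-zero)."""
    return statistics.stdev(values) / statistics.mean(values)

def averaged_cv(variables):
    """variables: dict mapping a variable name to its within-site measurements."""
    return statistics.mean([coefficient_of_variation(v) for v in variables.values()])

# Hypothetical measurements for one site.
site = {"total_N": [2.0, 4.0, 6.0], "available_P": [10.0, 10.0, 10.0]}
index = averaged_cv(site)
```

Because each coefficient of variation is unitless, variables measured on different scales contribute equally to the average.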
NASA Astrophysics Data System (ADS)
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
2014-05-01
Flood is one of the most devastating natural disasters that occur frequently in Terengganu, Malaysia. Recently, ensemble based techniques are getting extremely popular in flood modeling. In this paper, weights-of-evidence (WoE) model was utilized first, to assess the impact of classes of each conditioning factor on flooding through bivariate statistical analysis (BSA). Then, these factors were reclassified using the acquired weights and entered into the support vector machine (SVM) model to evaluate the correlation between flood occurrence and each conditioning factor. Through this integration, the weak point of WoE can be solved and the performance of the SVM will be enhanced. The spatial database included flood inventory, slope, stream power index (SPI), topographic wetness index (TWI), altitude, curvature, distance from the river, geology, rainfall, land use/cover (LULC), and soil type. Four kernel types of SVM (linear kernel (LN), polynomial kernel (PL), radial basis function kernel (RBF), and sigmoid kernel (SIG)) were used to investigate the performance of each kernel type. The efficiency of the new ensemble WoE and SVM method was tested using area under curve (AUC) which measured the prediction and success rates. The validation results proved the strength and efficiency of the ensemble method over the individual methods. The best results were obtained from RBF kernel when compared with the other kernel types. Success rate and prediction rate for ensemble WoE and RBF-SVM method were 96.48% and 95.67% respectively. The proposed ensemble flood susceptibility mapping method could assist researchers and local governments in flood mitigation strategies.
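The four kernel types compared in this abstract have standard textbook forms, sketched below for plain Python lists. The hyperparameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions; the abstract does not report the settings used.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * squared Euclidean distance)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 0.0], [0.0, 1.0]
values = [k(x, y) for k in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel)]
```

The RBF kernel depends only on the distance between two points, which makes it a common default when the relationship between conditioning factors and the outcome is not known to be linear.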
AMINI, Payam; AHMADINIA, Hasan; POOROLAJAL, Jalal; MOQADDASI AMIRI, Mohammad
2016-01-01
Background: We aimed to assess the high-risk group for suicide using different classification methods including logistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM). Methods: We used the dataset of a study conducted to predict risk factors of completed suicide in Hamadan Province, in the west of Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT and ANN were performed. The applied methods were compared using sensitivity, specificity, positive predictive value, negative predictive value, accuracy and the area under the curve. The Cochran Q test was applied to check differences in proportion among methods. To assess the association between the observed and predicted values, the Ø coefficient, contingency coefficient, and Kendall tau-b were calculated. Results: Gender, age, and job were the most important risk factors for fatal suicide attempts common to the four methods. The SVM method showed the highest accuracy, 0.68 and 0.67 for the training and testing samples, respectively. This method also resulted in the highest specificity (0.67 for the training and 0.68 for the testing sample) and the highest sensitivity for the training sample (0.85), but the lowest sensitivity for the testing sample (0.53). The Cochran Q test showed differences between proportions across methods (P<0.001). For the association between SVM predictions and observed values, the Ø coefficient, contingency coefficient, and Kendall tau-b were 0.239, 0.232 and 0.239, respectively. Conclusion: SVM had the best performance in classifying fatal suicide attempts compared with DT, LR and ANN. PMID:27957463
Kavitha, Muthu Subash; Asano, Akira; Taguchi, Akira; Heo, Min-Suk
2013-09-01
To prevent low bone mineral density (BMD), that is, osteoporosis, in postmenopausal women, it is essential to diagnose osteoporosis more precisely. This study presented an automatic approach utilizing a histogram-based automatic clustering (HAC) algorithm with a support vector machine (SVM) to analyse dental panoramic radiographs (DPRs) and thus improve diagnostic accuracy by identifying postmenopausal women with low BMD or osteoporosis. We integrated our newly-proposed histogram-based automatic clustering (HAC) algorithm with our previously-designed computer-aided diagnosis system. The extracted moment-based features (mean, variance, skewness, and kurtosis) of the mandibular cortical width for the radial basis function (RBF) SVM classifier were employed. We also compared the diagnostic efficacy of the SVM model with the back propagation (BP) neural network model. In this study, DPRs and BMD measurements of 100 postmenopausal women patients (aged >50 years), with no previous record of osteoporosis, were randomly selected for inclusion. The accuracy, sensitivity, and specificity of the BMD measurements using our HAC-SVM model to identify women with low BMD were 93.0% (88.0%-98.0%), 95.8% (91.9%-99.7%) and 86.6% (79.9%-93.3%), respectively, at the lumbar spine; and 89.0% (82.9%-95.1%), 96.0% (92.2%-99.8%) and 84.0% (76.8%-91.2%), respectively, at the femoral neck. Our experimental results predict that the proposed HAC-SVM model combination applied on DPRs could be useful to assist dentists in early diagnosis and help to reduce the morbidity and mortality associated with low BMD and osteoporosis.
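The four moment-based features named in this abstract (mean, variance, skewness, kurtosis) can be computed as in the sketch below. It assumes population moments and non-excess kurtosis; the abstract does not say which conventions the authors used, and the width values are invented.

```python
def moment_features(values):
    """Return (mean, variance, skewness, kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5   # undefined if all values are equal (m2 == 0)
    kurtosis = m4 / m2 ** 2     # non-excess kurtosis; a normal distribution gives 3
    return mean, m2, skewness, kurtosis

# Hypothetical cortical width samples.
features = moment_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

The resulting four-element vector is the kind of compact input that a classifier such as an RBF SVM could consume directly.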
Automated classification of neurological disorders of gait using spatio-temporal gait parameters.
Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman
2015-04-01
Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as the evaluation of treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. Eight measurements with 136 variables were obtained from each subject using a GAITRite® sensor carpet. Subjects were randomly divided into two groups (training cases and validation cases). The sensitivity and specificity of k-nearest neighbor (KNN), naive Bayes classifier (NB), artificial neural network (ANN), and support vector machine (SVM) in classifying the validation cases were calculated. ANN and SVM had the highest overall sensitivity with 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%), while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish clinical diagnosis with good accuracy. SVM and ANN in particular differentiate gait patterns of several distinct oto-neurological disorders of gait with high sensitivity and specificity compared to KNN and NB. Both SVM and ANN appear to be reliable diagnostic and management tools for disorders of gait. Copyright © 2015 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Harder, Ben
2011-01-01
The Student Volunteer Movement (SVM) for Foreign Missions was founded in 1886 at a Conference in the Mt. Hermon University, an organization designed to recruit college and university students in the United States and later of course through the Western world, for missionary service abroad. The primary leader of the SVM was A. T. Pierson, a major…
NASA Astrophysics Data System (ADS)
Luna, Aderval S.; da Silva, Arnaldo P.; Ferré, Joan; Boqué, Ricard
This research work describes two studies for the classification and characterization of edible oils and their quality parameters through Fourier transform mid infrared spectroscopy (FT-mid-IR) together with chemometric methods. The discrimination of canola, sunflower, corn and soybean oils was investigated using SVM-DA, SIMCA and PLS-DA. Using FT-mid-IR, DPLS was able to classify 100% of the samples from the validation set, but SIMCA and SVM-DA were not. The quality parameters, refraction index and relative density of the edible oils, were obtained from reference methods. Prediction models for FT-mid-IR spectra were calculated for these quality parameters using partial least squares (PLS) and support vector machines (SVM). Several preprocessing alternatives (first derivative, multiplicative scatter correction, mean centering, and standard normal variate) were investigated. The best result for the refraction index was achieved with SVM, as was the best result for the relative density except when the preprocessing combination of mean centering and first derivative was used. For both quality parameters, the best results obtained for the figures of merit expressed by the root mean square error of cross validation (RMSECV) and prediction (RMSEP) were equal to 0.0001.
SVM Pixel Classification on Colour Image Segmentation
NASA Astrophysics Data System (ADS)
Barui, Subhrajit; Latha, S.; Samiappan, Dhanalakshmi; Muthu, P.
2018-04-01
The aim of image segmentation is to simplify the representation of an image by clustering pixels into something more meaningful to analyze. Segmentation is typically used to locate boundaries and curves in an image, or more precisely to label every pixel in an image so that each pixel has an independent identity. This paper addresses SVM pixel classification for colour image segmentation, which has useful applications in concept-based image retrieval, machine vision, medical imaging and object detection. The process is accomplished step by step. First, the colour and texture features used as input to the SVM classifier are identified. These inputs are extracted via a local spatial similarity measure model and a steerable filter, also known as a Gabor filter. The classifier is then trained using FCM (Fuzzy C-Means). The pixel-level information of the image and the classification ability of the SVM classifier are then combined to form the final image. The method produces a well-segmented image and, compared with segmentation methods proposed earlier, offers higher quality and faster processing of the segmented image. One recent application is the Light L16 camera.
A SVM framework for fault detection of the braking system in a high speed train
NASA Astrophysics Data System (ADS)
Liu, Jie; Li, Yan-Fu; Zio, Enrico
2017-03-01
As of April 2015, the number of operating High Speed Trains (HSTs) in the world had reached 3603. An efficient, effective and very reliable braking system is critical for trains running at speeds around 300 km/h. Failure of a highly reliable braking system is a rare event and, consequently, informative recorded data on fault conditions are scarce. This renders the fault detection problem a classification problem with highly unbalanced data. In this paper, a Support Vector Machine (SVM) framework, including feature selection, feature vector selection, model construction and decision boundary optimization, is proposed for tackling this problem. Feature vector selection can largely reduce the data size and, thus, the computational burden. The constructed model is a modified version of the least squares SVM, in which a higher cost is assigned to classification errors on faulty conditions than to classification errors on normal conditions. The proposed framework is successfully validated on a number of public unbalanced datasets. It is then applied to the fault detection of braking systems in HSTs: in comparison with several SVM approaches for unbalanced datasets, the proposed framework gives better results.
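The idea of charging more for errors on the rare faulty class can be illustrated with a class-weighted least squares SVM objective. The sketch below uses the generic weighted LS-SVM form for a linear model, ½‖w‖² + (γ/2) Σ cᵢ eᵢ² with eᵢ = 1 − yᵢ(w·xᵢ + b). It is not the authors' exact formulation, and the cost values and the convention that label +1 means "faulty" are assumptions.

```python
def weighted_lssvm_objective(w, b, X, y, gamma=1.0, fault_cost=10.0, normal_cost=1.0):
    """Class-weighted LS-SVM objective for a linear model (illustrative only).

    Labels are +1 (faulty, assumed to be the rare class) and -1 (normal).
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    weighted_error = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        e = 1.0 - margin                      # LS-SVM error variable
        cost = fault_cost if yi == 1 else normal_cost
        weighted_error += cost * e * e
    return regularizer + 0.5 * gamma * weighted_error

# The same unit error costs ten times more on a faulty sample than on a normal one.
on_fault = weighted_lssvm_objective([0.0], 0.0, [[0.0]], [1])
on_normal = weighted_lssvm_objective([0.0], 0.0, [[0.0]], [-1])
```

Minimizing such an objective pushes the decision boundary away from the minority class, trading some false alarms for fewer missed faults.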
Age and gender estimation using Region-SIFT and multi-layered SVM
NASA Astrophysics Data System (ADS)
Kim, Hyunduk; Lee, Sang-Heon; Sohn, Myoung-Kyu; Hwang, Byunghun
2018-04-01
In this paper, we propose an age and gender estimation framework using the region-SIFT feature and a multi-layered SVM classifier. The suggested framework entails three processes. The first step is landmark-based face alignment. The second step is feature extraction, in which we introduce a region-SIFT feature extraction method based on facial landmarks: we first define sub-regions of the face and then extract SIFT features from each sub-region. To reduce the dimensionality of the features, we employ Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Finally, we classify age and gender using multi-layered Support Vector Machines (SVMs) for efficient classification. Rather than performing gender estimation and age estimation independently, the multi-layered SVM can improve the classification rate by constructing a classifier that estimates age according to gender. Moreover, we collected a dataset of face images from the internet, called DGIST_C. The proposed method was evaluated on the FERET database, the CACD database, and the DGIST_C database. The experimental results demonstrate that the proposed approach performs age and gender estimation very efficiently and accurately.
A Semisupervised Support Vector Machines Algorithm for BCI Systems
Qin, Jianzhao; Li, Yuanqing; Sun, Wei
2007-01-01
As an emerging technology, brain-computer interfaces (BCIs) bring us new communication interfaces which translate brain activities into control signals for devices like computers, robots, and so forth. In this study, we propose a semisupervised support vector machine (SVM) algorithm for brain-computer interface (BCI) systems, aiming at reducing the time-consuming training process. In this algorithm, we apply a semisupervised SVM for translating the features extracted from the electrical recordings of brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. Meanwhile, to reduce the time for training semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to the online BCI systems. Additionally, it is suggested in many studies that common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficient labeled data set. In order to overcome the drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141
Chen, Zhiru; Hong, Wenxue
2016-02-01
Considering the low prediction accuracy on positive samples and the poor overall classification caused by unbalanced sample data of microRNA (miRNA) targets, we propose in this paper a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm, an ensemble learning algorithm based on under-sampling. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the imbalance between positive and negative samples. Meanwhile, during the adaptive adjustment of sample weights, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the integrated miRNA target classifier makes its prediction by combining multiple weak classifiers through a voting mechanism. The experiments revealed that SVM-IUSM, compared with other algorithms on a collection of unbalanced datasets, could not only improve the accuracy on positive targets and the overall classification performance, but also enhance the generalization ability of the miRNA target classifier.
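Clustering-based under-sampling of the majority (negative) class can be sketched as below. The abstract does not specify the clustering method or how representatives are chosen, so this sketch assumes k-means with a deterministic farthest-point initialisation and keeps the member closest to each centroid; the function names are hypothetical, and the AdaBoost weighting step is not shown.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means on tuples; farthest-point initialisation keeps it deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters


def undersample_majority(majority, n_keep):
    """Keep one representative per cluster: the member closest to its centroid."""
    centroids, clusters = kmeans(majority, n_keep)
    return [
        min(c, key=lambda p: sq_dist(p, centroids[j]))
        for j, c in enumerate(clusters)
        if c
    ]
```

Setting n_keep to the number of positive samples yields a balanced training set for one boosting round; other choices are equally plausible.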
Nonlinear Demodulation and Channel Coding in EBPSK Scheme
Chen, Xianqing; Wu, Lenan
2012-01-01
Extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF makes it more difficult to obtain the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach to nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method, which selects only a few sampling points from the filter output, is used to obtain the PPEs. The simulation results show that accurate posterior probabilities can be obtained with this method and that the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyze the effect of obtaining the posterior probability with different methods and different sampling rates. We show that the SVM method has greater advantages under poor conditions and is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and for obtaining posterior probabilities for LDPC decoding. PMID:23213281
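How an SVM output might be turned into a posterior probability and then into the soft input of an LDPC decoder can be sketched as follows. The abstract does not say how the PPEs are computed; this sketch assumes a Platt-style sigmoid on the SVM decision value f, with hypothetical parameters A and B that would normally be fitted on held-out data, and the common convention LLR = ln(P(bit=0)/P(bit=1)).

```python
import math


def posterior_bit1(f, A=-2.0, B=0.0):
    """Platt-style sigmoid: P(bit = 1 | f) = 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(A * f + B))


def llr(p1, eps=1e-12):
    """Log-likelihood ratio ln(P(bit=0) / P(bit=1)), clipped away from 0 and 1."""
    p1 = min(max(p1, eps), 1.0 - eps)
    return math.log((1.0 - p1) / p1)
```

Under this sigmoid model the LLR reduces to A*f + B, so a decision value far on the positive side gives a strongly negative LLR (bit 1 very likely), while f = 0 gives an LLR of zero.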
An SVM-based solution for fault detection in wind turbines.
Santos, Pedro; Villa, Luisa F; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús
2015-03-09
Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines is insufficient for diagnosing faults in their mechanical transmission chain. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, whose vibration signals are processed using angular resampling techniques, together with electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that the linear-kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of the linear SVM is also experimentally analyzed, leading to the conclusion that this data acquisition technique generates linearly separable datasets.
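Angular resampling converts a vibration signal sampled at uniform time steps into one sampled at uniform shaft-angle steps, so that each revolution contributes the same number of samples regardless of speed. The sketch below assumes a speed channel (in revolutions per second) sampled synchronously with the accelerometer, and uses trapezoidal integration and linear interpolation for simplicity; the function names are hypothetical and the paper's actual resampling procedure is not described in the abstract.

```python
def cumulative_angle(speed_hz, dt):
    """Shaft angle in revolutions at each sample (trapezoidal integration of speed)."""
    angle = [0.0]
    for k in range(1, len(speed_hz)):
        angle.append(angle[-1] + 0.5 * (speed_hz[k - 1] + speed_hz[k]) * dt)
    return angle


def angular_resample(signal, angle, samples_per_rev):
    """Linearly interpolate `signal` at uniform shaft-angle steps.

    Assumes `angle` is strictly increasing (the shaft never stops or reverses).
    """
    step = 1.0 / samples_per_rev
    n_out = int(angle[-1] / step) + 1
    out = []
    k = 0
    for i in range(n_out):
        target = i * step
        # Advance to the time-domain interval [angle[k], angle[k+1]] holding target.
        while k + 2 < len(angle) and angle[k + 1] < target:
            k += 1
        frac = (target - angle[k]) / (angle[k + 1] - angle[k])
        out.append(signal[k] + frac * (signal[k + 1] - signal[k]))
    return out
```

A shaft turning at 1 rev/s and one turning at 2 rev/s over the same two revolutions both yield 2 * samples_per_rev + 1 output samples, which is what makes spectra of the resampled signal comparable across operating speeds.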