Machine learning modelling for predicting soil liquefaction susceptibility
NASA Astrophysics Data System (ADS)
Samui, P.; Sitharam, T. G.
2011-01-01
This study describes two machine learning techniques applied to predict the liquefaction susceptibility of soil from standard penetration test (SPT) data from the 1999 Chi-Chi, Taiwan earthquake. The first technique uses an Artificial Neural Network (ANN) based on multi-layer perceptrons (MLP) trained with the Levenberg-Marquardt backpropagation algorithm. The second technique uses a Support Vector Machine (SVM), a classification technique firmly grounded in statistical learning theory. The ANN and SVM models have been developed to predict liquefaction susceptibility using the corrected SPT blow count [(N1)60] and the cyclic stress ratio (CSR). Further, an attempt has been made to simplify the models so that they require only two parameters [(N1)60 and peak ground acceleration (amax/g)] to predict liquefaction susceptibility. The developed ANN and SVM models have also been applied to different case histories available globally. The paper also highlights the capability of the SVM over the ANN models.
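The CSR input mentioned above is conventionally computed with the Seed and Idriss simplified procedure, CSR = 0.65 (amax/g)(σv/σ'v) rd. The abstract does not say which depth-reduction factor rd was used, so the sketch below assumes the widely used Liao and Whitman linear fit; the function names and the site values in the example are hypothetical.

```python
def stress_reduction_coefficient(depth_m: float) -> float:
    """Depth reduction factor r_d (assumed Liao and Whitman linear fit)."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    if depth_m <= 9.15:
        return 1.0 - 0.00765 * depth_m
    if depth_m <= 23.0:
        return 1.174 - 0.0267 * depth_m
    raise ValueError("this linear r_d fit is only applied down to 23 m")


def cyclic_stress_ratio(amax_g: float, sigma_v: float, sigma_v_eff: float,
                        depth_m: float) -> float:
    """CSR = 0.65 * (amax/g) * (total / effective vertical stress) * r_d."""
    if sigma_v_eff <= 0:
        raise ValueError("effective vertical stress must be positive")
    return 0.65 * amax_g * (sigma_v / sigma_v_eff) * stress_reduction_coefficient(depth_m)


if __name__ == "__main__":
    # Hypothetical site: amax/g = 0.3, sigma_v = 100 kPa, sigma'_v = 60 kPa, depth 5 m.
    csr = cyclic_stress_ratio(0.3, 100.0, 60.0, 5.0)
    print(f"CSR = {csr:.4f}")  # → CSR = 0.3126
```

The resulting (CSR, (N1)60) pair is the two-feature input that the ANN and SVM classifiers in the study consume.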
NASA Astrophysics Data System (ADS)
Kumar, Deepak; Thakur, Manoj; Dubey, Chandra S.; Shukla, Dericks P.
2017-10-01
In recent years, various machine learning techniques have been applied to landslide susceptibility mapping. In this study, three variants of the support vector machine, viz. SVM, Proximal Support Vector Machine (PSVM) and L2-Support Vector Machine - Modified Finite Newton (L2-SVM-MFN), have been applied to the Mandakini River Basin in Uttarakhand, India, to carry out landslide susceptibility mapping. Eight thematic layers (elevation, slope, aspect, drainages, geology/lithology, buffer of thrusts/faults, buffer of streams and soil), along with past landslide data, were mapped in a GIS environment and used for landslide susceptibility mapping in MATLAB. Landslides cover merely 0.11% of the 1625 km² study area. Of the 2009 past-landslide pixels, 1000 (about 50%) were used as the training set and the remainder as the testing set. The performance of these techniques has been evaluated, and the computational results show that L2-SVM-MFN achieves a higher area under the receiver operating characteristic curve (AUC) of 0.829, compared with 0.807 for the PSVM model and 0.79 for SVM. The results from the L2-SVM-MFN model are superior to those of the other SVM prediction models and suggest the usefulness of this technique for landslide susceptibility mapping when training data are very limited. Overall, these techniques can be used to determine susceptible zones satisfactorily with these inputs.
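The AUC values reported above (0.829, 0.807, 0.79) can be read as the probability that a randomly chosen landslide pixel receives a higher susceptibility score than a randomly chosen non-landslide pixel. A minimal, dependency-free sketch of that rank-based definition follows; the function name is hypothetical and the scores in the example are made up, not taken from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation); O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical susceptibility scores for four test pixels (1 = past landslide).
    print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # → 0.75
```

The quadratic pairwise loop keeps the definition obvious; for millions of pixels a sort-based rank computation gives the same value in O(n log n).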
Weighted K-means support vector machine for cancer prediction.
Kim, SungHwan
2016-01-01
To date, the support vector machine (SVM) has been widely applied in diverse biomedical fields to address disease subtype identification and the pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and the weighted support vector machine (wSVM), in which the SVM imposes weights on the loss term. In addition, I demonstrate the numerical relations between the objective function of the SVM and the weights. Motivated by general ensemble techniques, which are known to improve accuracy, I apply the boosting algorithm directly to the newly proposed weighted KM-SVM (and wSVM). A range of simulation studies demonstrates that, in predictive performance, the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) as well as many other popular classification rules. I applied the proposed methods to simulated data and to two large-scale real applications using the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases the accuracy of the classification model and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).
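The abstract describes weights imposed on the SVM loss term without giving the objective. A common way to write such a weighted SVM, assumed here rather than taken from the paper, is the primal objective ½‖w‖² + C Σᵢ sᵢ max(0, 1 − yᵢ(w·xᵢ + b)), where sᵢ is the weight of sample i. The sketch evaluates that objective for a fixed linear model, which makes the relation between the weights and the objective value explicit; it does not use the wSVM R package.

```python
from typing import Sequence


def weighted_svm_objective(w: Sequence[float], b: float,
                           X: Sequence[Sequence[float]], y: Sequence[int],
                           weights: Sequence[float], C: float = 1.0) -> float:
    """Primal objective 0.5*||w||^2 + C * sum_i weights[i] * hinge_i (assumed form)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi, si in zip(X, y, weights):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += si * max(0.0, 1.0 - margin)  # weighted hinge loss of sample i
    return reg + C * loss


if __name__ == "__main__":
    X = [[0.5, 0.0], [2.0, 0.0]]
    y = [1, 1]
    # Only the first point violates the margin, so only its weight changes the objective.
    print(weighted_svm_objective([1.0, 0.0], 0.0, X, y, [2.0, 1.0]))  # → 1.5
    print(weighted_svm_objective([1.0, 0.0], 0.0, X, y, [1.0, 1.0]))  # → 1.0
```

Raising the weight of a margin violator raises its share of the objective, which is what pushes a weighted solver to fit that sample more closely; weights on correctly classified points outside the margin have no effect.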
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear and nonlinear tools for classification, regression and data-driven modeling are rapidly being replaced by newer techniques and tools based on artificial intelligence and machine learning. Linear techniques are not applicable to inherently nonlinear problems, and the newer methods serve as attractive alternatives for solving real-life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms formulated from statistical learning theory and the structural risk minimization principle. SVM regression closely follows the classification methodology. In this work, recent applications of SVM in chemo- and bioinformatics are described with suitable illustrative examples.
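The remark that SVM regression closely follows the classification methodology can be made concrete: support vector regression keeps the regularized primal form of the classifier but replaces the hinge loss with Vapnik's ε-insensitive loss, which ignores errors smaller than ε. The sketch below evaluates that loss and the corresponding objective for a linear model; the function names and the values of C and ε are illustrative assumptions.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Zero inside the epsilon-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w: Sequence[float], b: float,
                         X: Sequence[Sequence[float]], y: Sequence[float],
                         C: float = 1.0, epsilon: float = 0.1) -> float:
    """0.5*||w||^2 + C * sum of epsilon-insensitive losses (same shape as the SVM primal)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(
        epsilon_insensitive_loss(yi, sum(wj * xj for wj, xj in zip(w, xi)) + b, epsilon)
        for xi, yi in zip(X, y)
    )
    return reg + C * loss


if __name__ == "__main__":
    print(epsilon_insensitive_loss(1.0, 1.05))  # inside the tube → 0.0
    print(linear_svr_objective([1.0], 0.0, [[1.0], [2.0]], [1.05, 2.5]))
```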
Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua
2018-04-25
Insect pests, such as pantry beetles, are often associated with food contamination and public health risks. Detecting their presence in food products is currently done manually, and machine learning has the potential to provide a more accurate and efficient solution. In our previous research, we demonstrated the feasibility of using Artificial Neural Network (ANN) based pattern recognition for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model that improved the average accuracy to up to 85%. By contrast, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus-level identification, but SVM showed slightly better accuracy for most species. Highly accurate species-level identification remains a challenge, especially in distinguishing between species from the same genus, which may require improvements in both imaging and machine learning techniques. In summary, our work illustrates a new SVM-based technique and provides a useful comparison with the ANN model in our context. We believe such insights will pave the way for applying machine learning to species identification and food safety.
NASA Astrophysics Data System (ADS)
Liu, Di; Mishra, Ashok K.; Yu, Zhongbo
2016-07-01
This paper examines the combination of support vector machines (SVM) and the dual ensemble Kalman filter (EnKF) technique to estimate root zone soil moisture at different soil layers down to 100 cm depth. Multiple experiments are conducted in a data-rich environment to construct and validate the SVM model and to explore the effectiveness and robustness of the EnKF technique. SVM performance was observed to depend more on the initial length of the training set than on other factors (e.g., cost function, regularization parameter, and kernel parameters). The dual EnKF technique proved efficient at improving the SVM with observed data, either at each time step or at flexible time steps. The EnKF technique reaches its maximum efficiency when the updating ensemble size approaches a certain threshold. The SVM model performance for multi-layer soil moisture estimation was also observed to be influenced by rainfall magnitude (e.g., dry and wet spells).
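The EnKF correction described above rests on a simple analysis step. The sketch below shows only the scalar state update with perturbed observations, assuming a directly observed quantity (observation operator H = 1); the dual EnKF of the paper also updates model parameters, which is not reproduced, and the function name and ensemble values are hypothetical.

```python
import random
from statistics import mean, variance
from typing import List, Sequence, Tuple


def enkf_analysis(forecast: Sequence[float], observation: float, obs_var: float,
                  rng: random.Random) -> Tuple[List[float], float]:
    """One scalar EnKF analysis step with perturbed observations (H = 1 assumed)."""
    p_f = variance(forecast)           # sample forecast-error variance (n - 1 denominator)
    gain = p_f / (p_f + obs_var)       # scalar Kalman gain
    noise = [rng.gauss(0.0, obs_var ** 0.5) for _ in forecast]
    offset = mean(noise)
    noise = [e - offset for e in noise]  # centre perturbations so they add no mean bias
    analysis = [x + gain * (observation + e - x) for x, e in zip(forecast, noise)]
    return analysis, gain


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [0.20, 0.22, 0.25, 0.18, 0.24]  # hypothetical SVM soil-moisture ensemble (m3/m3)
    posterior, k = enkf_analysis(prior, observation=0.30, obs_var=0.0004, rng=rng)
    print(f"gain={k:.3f} prior mean={mean(prior):.3f} posterior mean={mean(posterior):.3f}")
```

Because the perturbations are centred, the analysis mean is exactly the prior mean moved a fraction `gain` of the way towards the observation, and a wider ensemble (larger spread) gives the observation more influence.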
NASA Astrophysics Data System (ADS)
Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao
2013-02-01
This study investigates whether combining serum surface-enhanced Raman spectroscopy (SERS) with a support vector machine (SVM) improves the classification of esophageal cancer patients versus normal volunteers. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one from patients with pathologically confirmed esophageal cancer (n=30) and the other from healthy volunteers (n=31). Principal component analysis (PCA), conventional SVM (C-SVM) and conventional SVM combined with PCA (PCA-SVM) are applied to classify the same spectral dataset. A diagnostic accuracy of 77.0% is achieved with the PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained with the C-SVM and PCA-SVM methods using radial basis function (RBF) kernels. The results show that RBF SVM models are superior to the PCA algorithm in classifying serum SERS spectra. The study demonstrates that serum SERS combined with SVM has great potential to provide an effective and accurate diagnostic scheme for noninvasive detection of esophageal cancer.
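PCA-SVM reduces each spectrum to a few principal-component scores before SVM classification. The sketch below, which is not the authors' code, finds only the first principal component by power iteration on the sample covariance matrix and projects the data onto it; a real spectral pipeline would keep several components (for example via an SVD) and pass the scores to an RBF-kernel SVM. The function names and toy data are hypothetical.

```python
from typing import List, Sequence


def _centre(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the column means from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def first_principal_component(X: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Leading eigenvector of the sample covariance, found by power iteration."""
    Xc = _centre(X)
    n, d = len(Xc), len(Xc[0])
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("power iteration collapsed (zero covariance or bad start)")
        v = [x / norm for x in w]
    return v


def pc_scores(X: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Project the centred rows onto direction v (one score per sample)."""
    return [sum(a * b for a, b in zip(row, v)) for row in _centre(X)]


if __name__ == "__main__":
    # Hypothetical 2-band "spectra": most variance lies along the first band.
    data = [[2.0, 0.0], [0.0, 0.0], [-2.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
    v = first_principal_component(data)
    print(v, pc_scores(data, v))
```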
NASA Astrophysics Data System (ADS)
Li, S. X.; Zhang, Y. J.; Zeng, Q. Y.; Li, L. F.; Guo, Z. Y.; Liu, Z. M.; Xiong, H. L.; Liu, S. H.
2014-06-01
Cancer is the most common disease threatening human health. The ability to screen individuals for malignant tumours using only a blood sample would be greatly advantageous for early diagnosis and intervention. This study explores the possibility of discriminating between cancer patients and normal subjects from a peripheral blood sample using serum surface-enhanced Raman spectroscopy (SERS) and a support vector machine (SVM). A total of 130 blood samples were obtained from patients with liver, colonic, esophageal, nasopharyngeal or gastric cancer, along with 113 blood samples from normal volunteers. Several diagnostic models were built from the serum SERS spectra using SVM and principal component analysis (PCA) techniques. A diagnostic accuracy of 85.5% is achieved with a PCA algorithm, while a diagnostic accuracy of 95.8% is obtained with the radial basis function (RBF) kernel PCA-SVM method. The results show that the RBF kernel PCA-SVM technique is superior to the PCA and conventional SVM (C-SVM) algorithms in classifying serum SERS spectra. The study demonstrates that serum SERS, combined with SVM techniques, has great potential for screening patients with any solid malignant tumour from a peripheral blood sample.
SVM and SVM Ensembles in Breast Cancer Prediction.
Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong
2017-01-01
Breast cancer is an all too common disease in women, and how to predict it effectively is an active research problem. A number of statistical and machine learning techniques have been employed to develop breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct an SVM classifier, the kernel function must first be chosen, and different kernel functions can result in different prediction performance. However, very few studies have examined the prediction performance of SVM with different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve on single classifiers, can outperform single SVM classifiers in breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles built with the bagging method and RBF kernel based SVM ensembles built with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807
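The bagging ensembles compared in that study can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' setup: each base learner is a linear SVM trained with the Pegasos stochastic sub-gradient method (swapped in because it needs no QP solver), the ensemble uses bootstrap resampling with majority voting, and the helper names and toy data are hypothetical. The boosting and RBF-kernel variants are not shown.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def pegasos_train(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
                  steps: int = 500, rng: Optional[random.Random] = None) -> List[float]:
    """Linear SVM trained with Pegasos stochastic sub-gradient steps.

    The bias is learned by appending a constant 1 feature, so it is regularised too.
    """
    if rng is None:
        rng = random.Random(0)
    w = [0.0] * (len(X[0]) + 1)
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        xi = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(a * b for a, b in zip(w, xi))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink step
        if margin < 1.0:  # hinge loss active: add the sub-gradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


def bagged_svms(X, y, n_models: int = 11, seed: int = 0) -> List[List[float]]:
    """Train each linear SVM on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(pegasos_train([X[i] for i in idx], [y[i] for i in idx], rng=rng))
    return models


def vote(models, x) -> int:
    """Majority vote; an odd number of models avoids ties for two classes."""
    return Counter(predict(w, x) for w in models).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, well-separated toy data (two standardized features).
    X = [[2, 2], [3, 2], [2, 3], [3, 3], [-2, -2], [-3, -2], [-2, -3], [-3, -3]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    models = bagged_svms(X, y)
    print([vote(models, x) for x in ([2.5, 2.5], [-2.5, -2.5])])  # → [1, -1]
```

Bagging mainly reduces the variance of the base learner, which is one plausible reason it pairs well with a stable linear kernel on small datasets; boosting instead reweights hard samples between rounds.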
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for the binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were built using two machine learning methods: artificial neural network (ANN) and support vector machine (SVM). The resulting models were compared with those obtained by multiple linear regression (MLR) analysis. Descriptor selection and model building were performed with 10-fold cross-validation on the training data set. The coefficients of determination of the SVM and MLR models on the training set are 0.944 and 0.879, respectively, higher than those of the ANN models. Although the SVM model fits the training set better, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test is employed to check the suitability of the models.
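The two evaluation tools named above, the coefficient of determination and 10-fold cross-validation, are easy to state precisely. The helper names below are hypothetical and the folds are contiguous for clarity; this is a sketch, not the authors' protocol.

```python
from typing import Iterator, List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds of near-equal size.

    Shuffle the samples beforehand if their order carries structure.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    folds = list(k_fold_indices(78, k=10))  # 78 ligands, as in the abstract
    print([len(test) for _, test in folds])  # → [8, 8, 8, 8, 8, 8, 8, 8, 7, 7]
    print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A high training-set R² (as for the SVM above) combined with weaker test-set prediction is the usual signature of overfitting, which is why the held-out folds and the external test set matter.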
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques for predicting regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are built from Normalized Difference Vegetation Indices (NDVI) derived from low-resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features highly relevant for predicting yield. Although the exact selections differed between prefectures, certain periods with high influence scores for multiple prefectures could be identified. Both machine learning methods detected the same period of high influence, stretching from March to June. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Although both machine learning methods returned very low prediction errors, BRT appears to slightly but consistently outperform SVM.
Illias, Hazlee Azil; Zhao Liang, Wee
2018-01-01
Early detection of power transformer faults is important because it can reduce transformer maintenance costs and ensure a continuous electricity supply in power systems. The Dissolved Gas Analysis (DGA) technique is commonly used to identify the fault type of oil-filled power transformers, but artificial intelligence methods combined with optimisation methods have shown convincing results. In this work, a hybrid support vector machine (SVM) with a modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, an unoptimised SVM and previously reported works. Data reduction using stepwise regression was also applied before training the SVM to reduce the training time. The proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique was found to give the highest percentage of correctly identified faults in a power transformer compared with other PSO algorithms. Thus, the proposed technique is one potential solution for identifying the transformer fault type from DGA data on site. PMID:29370230
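A plain PSO with time-varying acceleration coefficients (TVAC) can be sketched as follows. The abstract does not give the EPSO/MEPSO modifications, so they are not reproduced: the inertia and coefficient schedules below are a common TVAC choice, assumed here, and the quadratic objective in the example is a hypothetical stand-in for the SVM validation error that would be minimized over (log2 C, log2 γ) on DGA data.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_tvac(objective: Callable[[List[float]], float],
             bounds: Sequence[Tuple[float, float]],
             n_particles: int = 20, iters: int = 100,
             seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` inside `bounds` with PSO and time-varying coefficients.

    Inertia w falls 0.9 -> 0.4, cognitive c1 falls 2.5 -> 0.5 and social c2
    rises 0.5 -> 2.5 (a common TVAC schedule, assumed here).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        frac = t / max(iters - 1, 1)
        w, c1, c2 = 0.9 - 0.5 * frac, 2.5 - 2.0 * frac, 0.5 + 2.0 * frac
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))  # clamp the velocity
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))  # keep inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical stand-in for cross-validated SVM error over (log2 C, log2 gamma).
    def surrogate_error(p: List[float]) -> float:
        return (p[0] - 3.0) ** 2 + (p[1] + 5.0) ** 2

    best, err = pso_tvac(surrogate_error, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
    print(best, err)
```

The schedule shifts particles from exploring around their own best positions early on (large c1) to converging on the swarm's best position late (large c2).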
Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong
2015-09-01
A time-adaptive support vector machine (TA-SVM) was recently proposed for handling nonstationary datasets. Attractive performance has been reported, and the classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally through an elegant SVM formulation in an alternative kernel space. However, the coupling of subclassifiers requires a matrix inversion, which imposes a heavy computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed that uses a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation but also avoids computing a matrix inversion. Its fast version, the improved time-adaptive core vector machine (ITA-CVM), can therefore be realized for large nonstationary datasets using the CVM technique. ITA-CVM has asymptotic linear time complexity for large nonstationary datasets and inherits the advantages of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also confirmed experimentally.
Scaling Support Vector Machines On Modern HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Fu, Haohuan; Song, Shuaiwen
2015-02-01
We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multicore and many-core architectures, such as Intel Ivy Bridge CPUs and the Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques that fully utilize the multilevel parallelism provided by these architectures and can serve as general optimization methods for other machine learning tools.
Quantum optimization for training support vector machines.
Anguita, Davide; Ridella, Sandro; Rivieccio, Fabio; Zunino, Rodolfo
2003-01-01
Refined concepts, such as Rademacher estimates of model complexity and nonlinear criteria for weighting empirical classification errors, represent recent and promising approaches to characterizing the generalization ability of Support Vector Machines (SVMs). The advantages of these techniques lie in both improving the SVM representation ability and yielding tighter generalization bounds. On the other hand, they often make Quadratic-Programming algorithms no longer applicable, so SVM training cannot benefit from efficient, specialized optimization techniques. The paper considers the application of Quantum Computing to the problem of effective SVM training, especially in the case of digital implementations. The research compares the behavioral aspects of conventional and enhanced SVMs; experiments on both synthetic and real-world problems support the theoretical analysis. The related differences between Quadratic-Programming and Quantum-based optimization techniques are also considered.
Classification of the Regional Ionospheric Disturbance Based on Machine Learning Techniques
NASA Astrophysics Data System (ADS)
Terzi, Merve Begum; Arikan, Orhan; Karatay, Secil; Arikan, Feza; Gulyaeva, Tamara
2016-08-01
In this study, Total Electron Content (TEC) estimated from GPS receivers is used, along with solar and geomagnetic indices, to model regional and local variability that differs from global activity. For the automated classification of regional disturbances, a classification technique based on the Support Vector Machine (SVM), a robust and widely used machine learning technique, is proposed. The performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011. Applying the developed classification technique to Global Ionospheric Map (GIM) TEC data provided by the NASA Jet Propulsion Laboratory (JPL) shows that SVM can be a suitable learning method for detecting anomalies in TEC variations.
Seminal quality prediction using data mining methods.
Sahoo, Anoop J; Kumar, Yugal
2014-01-01
Nowadays, some new classes of diseases, known as lifestyle diseases, have come into existence. The main reasons behind these diseases are changes in people's lifestyle, such as alcohol drinking, smoking and food habits. A review of various lifestyle diseases shows that fertility rates (sperm quantity) in men have decreased considerably over the last two decades. Lifestyle and environmental factors are mainly responsible for this change in semen quality. The objective of this paper is to identify the lifestyle and environmental features that affect seminal quality and fertility rate in men using data mining methods. Five artificial intelligence techniques, namely multilayer perceptron (MLP), decision tree (DT), Naive Bayes (kernel), support vector machine + particle swarm optimization (SVM+PSO) and support vector machine (SVM), have been applied to a fertility dataset to evaluate seminal quality and to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), support vector machine plus particle swarm optimization (SVM+PSO), principal component analysis (PCA), chi-square test, correlation and t-test methods, have been used to identify the features most relevant to seminal quality. These techniques are applied to a fertility dataset that contains 100 instances with nine attributes and two classes. The experimental results show that SVM+PSO provides higher accuracy and area under the curve (AUC) (94% and 0.932) than multilayer perceptron (MLP) (92% and 0.728), support vector machines (91% and 0.758), Naive Bayes (kernel) (89% and 0.850) and decision tree (89% and 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process, i.e. how to select the features that are most important for predicting fertility rate. Eight feature selection methods are applied to the fertility dataset to find a set of good features. The results show that the childish diseases (0.079) and high fever (0.057) features have less impact on fertility rate, while the age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent sitting (0.4366) and accident (0.5973) features have more impact. Feature selection also increases the accuracy of the above techniques (multilayer perceptron 92%, support vector machine 91%, SVM+PSO 94%, Naive Bayes (kernel) 89% and decision tree 89%) compared with no feature selection (multilayer perceptron 86%, support vector machine 86%, SVM+PSO 85%, Naive Bayes (kernel) 83% and decision tree 84%), which shows the value of feature selection in prediction. This paper highlights the application of artificial intelligence techniques in the medical domain. It can be concluded that data mining methods can predict whether a person has a disease from environmental and lifestyle parameters rather than through various medical tests. Of the five data mining techniques used to predict fertility rate, SVM+PSO provides more accurate results than support vector machine and decision tree.
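Correlation is one of the eight feature-selection methods listed above. A minimal sketch of correlation-based ranking follows: each feature is scored by the absolute Pearson correlation with the binary class label. The function names, feature names and values in the example are hypothetical and are not taken from the fertility dataset.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no linear information
    return sxy / sqrt(sxx * syy)


def rank_features(columns: Dict[str, Sequence[float]],
                  target: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort features by |Pearson r| with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    altered = [0, 1, 0, 1, 1, 0]  # hypothetical class labels (1 = altered fertility)
    features = {
        "alcohol": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
        "constant": [0.5] * 6,
        "noise": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    }
    for name, score in rank_features(features, altered):
        print(f"{name}: {score:.3f}")
```

Correlation only captures linear, one-feature-at-a-time relevance, which is one reason studies like this one compare it with wrapper methods such as SVM+PSO.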
PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine
Manavalan, Balachandran; Shin, Tae H.; Lee, Gwang
2018-01-01
Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html. PMID:29616000
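Two of the feature groups listed above, amino acid composition (20 values) and dipeptide composition (400 values), are simple normalized counts. The sketch below computes the common normalized-frequency versions; the exact normalization used for PVP-SVM and its selection of 136 optimal features are not reproduced, and the function names and example sequence are hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of each standard residue; non-standard letters are ignored."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no standard dipeptides")
    return [counts[p] / total for p in DIPEPTIDES]


if __name__ == "__main__":
    features = amino_acid_composition("MKVLAA") + dipeptide_composition("MKVLAA")
    print(len(features))  # → 420
```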
Tuning support vector machines for minimax and Neyman-Pearson classification.
Davenport, Mark A; Baraniuk, Richard G; Scott, Clayton D
2010-10-01
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2ν-SVM. We then exploit a characterization of the 2ν-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
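The minimax and Neyman-Pearson criteria above can be made concrete with a small selection routine. Given estimated false-alarm rates P_F and miss rates P_M over a grid of SVM parameter settings, Neyman-Pearson picks the setting with the smallest P_M subject to P_F ≤ α, and minimax picks the one with the smallest max(P_F, P_M). The paper smooths its error estimates over the SVM parameter space; the 1-D moving average used here is a simplifying assumption, and the function names and error grids are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], half_width: int = 1) -> List[float]:
    """Average each entry with its neighbours; windows shrink at the edges."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def neyman_pearson_choice(p_fa: Sequence[float], p_miss: Sequence[float],
                          alpha: float) -> int:
    """Index with the smallest miss rate among settings whose false-alarm rate <= alpha."""
    feasible = [i for i, pf in enumerate(p_fa) if pf <= alpha]
    if not feasible:
        raise ValueError("no parameter setting meets the false-alarm constraint")
    return min(feasible, key=lambda i: p_miss[i])


def minimax_choice(p_fa: Sequence[float], p_miss: Sequence[float]) -> int:
    """Index minimising the worse of the two class-conditional error rates."""
    return min(range(len(p_fa)), key=lambda i: max(p_fa[i], p_miss[i]))


if __name__ == "__main__":
    # Hypothetical cross-validation estimates over five parameter settings.
    p_fa = [0.30, 0.20, 0.10, 0.05, 0.02]
    p_miss = [0.02, 0.05, 0.10, 0.20, 0.40]
    print(neyman_pearson_choice(p_fa, p_miss, alpha=0.1))  # raw estimates → 2
    sf, sm = moving_average(p_fa), moving_average(p_miss)
    print(neyman_pearson_choice(sf, sm, alpha=0.1))  # smoothed estimates → 3
    print(minimax_choice(sf, sm))  # → 2
```

With raw estimates, the Neyman-Pearson choice lands on a setting whose estimated false-alarm rate sits exactly at α; after smoothing, that setting no longer looks feasible, illustrating how noisy error estimates can flip a constrained choice.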
Electrocardiographic signals and swarm-based support vector machine for hypoglycemia detection.
Nuryani, Nuryani; Ling, Steve S H; Nguyen, H T
2012-04-01
Cardiac arrhythmia relating to hypoglycemia is suggested as a cause of death in diabetic patients. This article introduces electrocardiographic (ECG) parameters for detecting artificially induced hypoglycemia. In addition, a hybrid swarm-based support vector machine (SVM) technique is introduced for hypoglycemia detection using the ECG parameters as inputs, in which particle swarm optimization (PSO) is used to optimize the SVM. In an experiment using medical data from patients with Type 1 diabetes, the introduced ECG parameters contribute significantly to hypoglycemia detection performance, and the proposed detection technique performs well in terms of sensitivity and specificity.
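Sensitivity and specificity, the two measures used above, come straight from the confusion matrix. A minimal sketch follows; the function name and labels are hypothetical, and treating 1 as the positive class (hypoglycemia) is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int],
                            positive: int = 1) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Hypothetical labels: 1 = hypoglycemic episode, 0 = normal glucose.
    truth = [1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 1, 0]
    sens, spec = sensitivity_specificity(truth, preds)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")  # → sensitivity=0.667, specificity=0.750
```

For a detector like this, sensitivity measures how many true hypoglycemic episodes are caught, while specificity measures how rarely normal periods trigger false alarms.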
NASA Astrophysics Data System (ADS)
Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao
2014-09-01
This study presents a noninvasive prostate cancer screening method using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques on peripheral blood samples. SERS measurements are performed with silver nanoparticles on serum samples from 93 prostate cancer patients and 68 healthy volunteers. Three kernel functions, namely linear, polynomial and Gaussian radial basis function (RBF), are employed to build SVM diagnostic models for classifying the measured SERS spectra. For comparison with the SVM classification models, the standard multivariate statistical method of principal component analysis (PCA) is also applied to classify the same datasets. The RBF kernel SVM diagnostic model achieves a diagnostic accuracy of 98.1%, superior to the 91.3% obtained with PCA methods. The receiver operating characteristic curves of the diagnostic models further confirm these results. This study demonstrates that label-free serum SERS analysis combined with an SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.
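The three kernels compared in that study have standard closed forms. The sketch below writes them out with the parameter names (gamma, coef0, degree) used by scikit-learn's SVC; the default values here are illustrative assumptions, not the study's tuned settings, and the example "spectra" are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]


def linear_kernel(x: Vec, z: Vec) -> float:
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vec, z: Vec, degree: int = 3, gamma: float = 1.0,
                      coef0: float = 1.0) -> float:
    # (gamma * <x, z> + coef0) ** degree
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Vec, z: Vec, gamma: float = 0.5) -> float:
    # exp(-gamma * ||x - z||^2); equals 1 when x == z
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def gram_matrix(X: Sequence[Vec], kernel: Callable[[Vec, Vec], float]) -> List[List[float]]:
    """Pairwise kernel values K[i][j] = kernel(X[i], X[j]) used by the SVM dual."""
    return [[kernel(a, b) for b in X] for a in X]


if __name__ == "__main__":
    spectra = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical 2-band "spectra"
    for name, k in [("linear", linear_kernel), ("poly", polynomial_kernel), ("rbf", rbf_kernel)]:
        print(name, gram_matrix(spectra, k))
```

The RBF kernel's similarity decays with spectral distance, so it can separate classes that differ in subtle, nonlinear combinations of bands, which is consistent with its advantage in the results above.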
Daily River Flow Forecasting with Hybrid Support Vector Machine – Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Zaini, N.; Malek, M. A.; Yusoff, M.; Mardi, N. H.; Norhisham, S.
2018-04-01
Applying artificial intelligence techniques to river flow forecasting can further improve water resource management and flood prevention. This study develops a support vector machine (SVM) based model, and its hybridization with particle swarm optimization (PSO), to forecast short-term daily river flow at the Upper Bertam Catchment in Cameron Highlands, Malaysia. Ten years (2003 to 2012) of historical rainfall, antecedent river flow and various meteorological parameters are used. Four SVM-based models, SVM1, SVM2, SVM-PSO1 and SVM-PSO2, are proposed to forecast river flow 1 to 7 days ahead. SVM1 and SVM-PSO1 use historical rainfall and antecedent river flow as inputs, while SVM2 and SVM-PSO2 additionally use meteorological parameters. Model performance is measured in terms of RMSE and R². SVM2 outperformed SVM1 and SVM-PSO2 outperformed SVM-PSO1, indicating that the additional meteorological inputs significantly affect model performance. The hybrid models SVM-PSO1 and SVM-PSO2 perform better than SVM1 and SVM2, and are found to be more effective for forecasting river flow 1 to 7 days ahead in the study area.
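The two evaluation measures named in this abstract can be computed as follows. This sketch assumes R² means the coefficient of determination, 1 - SS_res/SS_tot; some hydrology studies instead report the squared Pearson correlation, so treat this as one possible reading rather than the paper's exact definition.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and forecast flows."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and higher R² both indicate a better forecast, which is how comparisons such as SVM-PSO2 versus SVM-PSO1 are read.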
Distributed support vector machine in master-slave mode.
Chen, Qingguo; Cao, Feilong
2018-05-01
It is well known that the support vector machine (SVM) is an effective learning algorithm. The alternating direction method of multipliers (ADMM) has emerged as a powerful technique for solving distributed optimisation models. This paper proposes a distributed SVM algorithm in master-slave mode (MS-DSVM), which integrates a distributed SVM and ADMM in a master-slave configuration where the master node and slave nodes are connected, so that results can be broadcast. The distributed SVM is regarded as a regularised optimisation problem and modelled as a series of convex optimisation sub-problems that are solved by ADMM. Additionally, the over-relaxation technique is used to accelerate the convergence of the proposed MS-DSVM. Our theoretical analysis shows that MS-DSVM converges linearly, which is the fastest convergence rate among existing standard distributed ADMM algorithms. Numerical examples demonstrate that the convergence and accuracy of the proposed MS-DSVM are superior to those of existing methods under the ADMM framework. Copyright © 2018 Elsevier Ltd. All rights reserved.
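The master-slave pattern can be illustrated with global-consensus ADMM with over-relaxation. This is a hedged sketch, not MS-DSVM itself: each slave's local SVM sub-problem is replaced by a simple quadratic loss 0.5*(x - a_i)**2 so that every update has a closed form, and the function name and the `rho` and `alpha` values are assumptions. Slaves update locally, the master averages and broadcasts the consensus variable, and `alpha` in (1, 2) is the over-relaxation factor.

```python
def consensus_admm(local_targets, rho=1.0, alpha=1.6, n_iters=200):
    """Global-consensus ADMM with over-relaxation (scaled dual form).

    Slave i holds a local loss f_i(x) = 0.5 * (x - a_i)**2, a stand-in for a
    local SVM sub-problem; the master averages and broadcasts z.
    """
    n = len(local_targets)
    u = [0.0] * n  # scaled dual variables, one per slave
    z = 0.0        # consensus value held by the master
    for _ in range(n_iters):
        # Slave step: closed-form argmin of f_i(x) + rho/2 * (x - z + u_i)**2
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(local_targets, u)]
        # Over-relaxation: blend the new local iterates with the previous z
        x_hat = [alpha * xi + (1.0 - alpha) * z for xi in x]
        # Master step: average and broadcast the new consensus value
        z = sum(xh + ui for xh, ui in zip(x_hat, u)) / n
        # Slave dual update using the broadcast z
        u = [ui + xh - z for ui, xh in zip(u, x_hat)]
    return z
```

With these quadratic losses the consensus value converges to the mean of the `a_i`; setting `alpha=1.0` recovers plain, unrelaxed ADMM, which is the baseline that over-relaxation is meant to accelerate.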
Identification of handwriting by using the genetic algorithm (GA) and support vector machine (SVM)
NASA Astrophysics Data System (ADS)
Zhang, Qigui; Deng, Kai
2016-12-01
As portable digital cameras and camera phones become more and more popular, there is an equally pressing need to photograph, identify and store handwritten characters at any time. In this paper, a genetic algorithm (GA) and a support vector machine (SVM) are used to identify handwriting. Compared with other parameter-optimization methods, this technique overcomes two defects: first, those methods easily become trapped in local optima; second, searching for the best parameters over a large range reduces the efficiency of classification and prediction. The experimental results suggest that GA-SVM achieves a higher recognition rate.
Support Vector Machine Based on Adaptive Acceleration Particle Swarm Optimization
Abdulameer, Mohammed Hasan; Othman, Zulaiha Ali
2014-01-01
Existing face recognition methods use particle swarm optimizer (PSO) and opposition-based particle swarm optimizer (OPSO) to optimize the parameters of SVM. However, the use of random values in the velocity calculation degrades the performance of these techniques: the acceleration coefficients in the velocity computation are normally random values, which introduces randomness into the solution. To address this problem, an adaptive acceleration particle swarm optimization (AAPSO) technique is proposed. To evaluate the proposed method, we apply AAPSO with SVM (AAPSO-SVM) to both face and iris recognition. Performance is evaluated on two human face databases, YALE and CASIA, and on the UBiris dataset. The method first performs feature extraction and then recognition on the extracted features, which are used for SVM training and testing. During training and testing, the SVM parameters are optimized with AAPSO, in which the acceleration coefficients are computed from the particle fitness values. The SVM parameters optimized by AAPSO perform efficiently for both face and iris recognition. A comparative analysis between the proposed AAPSO-SVM and the PSO-SVM technique is presented. PMID:24790584
Guo, Doudou; Juan, Jiaxiang; Chang, Liying; Zhang, Jingjin; Huang, Danfeng
2017-08-15
Plant-based sensing of water stress can provide a sensitive and direct reference for precision irrigation systems in greenhouses. However, plant information acquisition, interpretation, and systematic application remain insufficient. This study developed a method for discriminating plant root zone water status in a greenhouse by integrating phenotyping and machine learning techniques. Pakchoi plants were treated with three root zone moisture levels: 40%, 60%, and 80% relative water content. Three classification models, Random Forest (RF), Neural Network (NN), and Support Vector Machine (SVM), were developed and validated in different scenarios, with overall accuracy above 90% for all. The SVM model had the highest accuracy but required the longest training time. All models had accuracy above 85% in all scenarios, and the RF model performed most stably. The simplified SVM model built from the five most contributing traits showed the largest accuracy reduction, 29.5%, while the simplified RF and NN models still maintained approximately 80%. For practical applications, factors such as operating cost, precision requirements, and system reaction time should be considered together when selecting a model. Our work shows that discriminating plant root zone water status with phenotyping and machine learning techniques is promising for precision irrigation management.
Epileptic seizure detection in EEG signal using machine learning techniques.
Jaiswal, Abeg Kumar; Banka, Haider
2018-03-01
Epilepsy is a well-known nervous system disorder characterized by seizures. Electroencephalograms (EEGs), which capture brain neural activity, can detect epilepsy. Traditional methods of analyzing EEG signals for epileptic seizure detection are time-consuming. Recently, several automated seizure detection frameworks using machine learning techniques have been proposed to replace these traditional methods. The two basic steps in machine learning are feature extraction and classification: feature extraction reduces the input pattern space by keeping informative features, and the classifier assigns the appropriate class label. In this paper, we propose two effective approaches that combine subpattern-based PCA (SpPCA) and cross-subpattern correlation-based PCA (SubXPCA) with a Support Vector Machine (SVM) for automated seizure detection in EEG signals. Feature extraction was performed using SpPCA and SubXPCA; both techniques explore the subpattern correlation of EEG signals, which helps the decision-making process. The SVM, trained with a radial basis function kernel, classifies seizure and non-seizure EEG signals. All experiments were carried out on the benchmark epilepsy EEG dataset, which consists of 500 EEG signals recorded under different scenarios. Seven different classification cases were examined, and classification accuracy was evaluated using tenfold cross-validation. The classification results of the proposed approaches were compared with those of existing techniques from the literature to support the claimed effectiveness.
Construction accident narrative classification: An evaluation of text mining techniques.
Goh, Yang Miang; Ubeynarayana, C U
2017-11-01
Learning from past accidents is fundamental to accident prevention. Thus, organizations and regulators encourage accident and near-miss reporting. However, for organizations managing large safety databases, the time needed to classify accident and near-miss narratives accurately is very significant. This study evaluates the utility of various text mining classification techniques for classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experiments with tokenization of the processed text and with non-linear SVM were also conducted, along with a grid search over the SVM hyperparameters. The best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall from 0.36 to 0.9, and F1 score from 0.45 to 0.92. The reasons for misclassification are discussed and suggestions for improving performance are provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
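Per-label precision, recall and F1 ranges like those reported in this study come from one-vs-rest counts for each label. A minimal sketch; the function name and the label names in the test are hypothetical.

```python
def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores
```

Reporting the minimum and maximum of these values across labels gives ranges of the form quoted above for the 11 accident labels.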
A Support Vector Machine-Based Gender Identification Using Speech Signal
NASA Astrophysics Data System (ADS)
Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk
We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that separates two groups by finding an arbitrary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using mel-frequency cepstral coefficients (MFCC). A novel feature fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed to improve gender identification performance. Experimental results demonstrate that gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, performance improves substantially when the proposed feature fusion technique is applied.
Machine Learning Toolkit for Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-03-31
Support Vector Machines (SVM) are a popular machine learning technique for supervised learning and have been applied to a wide range of domains such as science, finance, and social networks. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed to improve speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence and sparse-format representation of data samples. MaTEx considers several heuristics, ranging from earliest-possible to lazy elimination of non-contributing samples. Where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstructing key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.
NASA Astrophysics Data System (ADS)
Wang, Danshi; Zhang, Min; Cai, Zhongle; Cui, Yue; Li, Ze; Han, Huanhuan; Fu, Meixia; Luo, Bin
2016-06-01
An effective machine learning algorithm, the support vector machine (SVM), is presented in the context of a coherent optical transmission system. As a classifier, the SVM can create nonlinear decision boundaries to mitigate the distortions caused by nonlinear phase noise (NLPN). Without any prior information or heuristic assumptions, the SVM can learn and capture the link properties from only a small amount of training data. Compared with the maximum likelihood estimation (MLE) algorithm, the SVM achieves a lower bit-error rate (BER) for a given launch power; moreover, the launch power dynamic range (LPDR) is increased by 3.3 dBm for 8 phase-shift keying (8 PSK), 1.2 dBm for QPSK, and 0.3 dBm for BPSK. The maximum transmission distance corresponding to a BER of 1 × 10⁻³ is increased by 480 km for 8 PSK. The larger launch power range and longer transmission distance improve tolerance to amplitude and phase noise, which demonstrates the feasibility of the SVM in digital signal processing for M-PSK formats. To apply the SVM method to 16 quadrature amplitude modulation (16 QAM) detection, we propose a parameter optimization scheme: using cross-validation and grid search, the optimal SVM parameters can be selected, leading to an LPDR improvement of 2.8 dBm. Additionally, we demonstrate that the SVM is also effective in combating laser phase noise combined with in-phase and quadrature (I/Q) modulator imperfections, but the improvement is insignificant for linear noise and separate I/Q imbalance. The computational complexity of the SVM is also discussed; its relatively low complexity makes real-time processing with the SVM feasible.
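Parameter selection by cross-validation and grid search, as described for the 16 QAM case, can be sketched as below. The training routine is deliberately left abstract: `fold_error` is a hypothetical, caller-supplied function that trains an SVM with a given `(C, gamma)` and returns its error on the held-out fold. The folds are contiguous and unshuffled, which is a simplifying assumption.

```python
import itertools


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds (no shuffling)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(n_samples, k, param_grid, fold_error):
    """Return the (C, gamma) pair with the lowest mean k-fold error.

    fold_error(C, gamma, train_idx, test_idx) is a hypothetical function that
    trains on train_idx and returns the error rate on test_idx.
    """
    folds = k_fold_indices(n_samples, k)
    best_params, best_err = None, float("inf")
    for C, gamma in itertools.product(param_grid["C"], param_grid["gamma"]):
        errs = []
        for i, test_idx in enumerate(folds):
            # Every sample not in the held-out fold is used for training
            train_idx = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
            errs.append(fold_error(C, gamma, train_idx, test_idx))
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best_params, best_err = (C, gamma), mean_err
    return best_params, best_err
```

Grids for `C` and `gamma` are commonly spaced in powers of two, so that a coarse search covers several orders of magnitude before any finer search.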
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Gene expression profiling using microarray techniques has recently been shown to be a promising tool for improving the diagnosis and treatment of cancer. Gene expression data contain a high level of noise and an overwhelming number of genes relative to the number of available samples, which poses a great challenge for machine learning and statistical techniques. Support vector machines (SVM) have been used successfully to classify gene expression data from cancer tissue. In the medical field, it is crucial to give the user a transparent decision process, and explaining the computed solutions and presenting the extracted knowledge has become a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling, is proposed to improve the explanatory capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple-parameter learning problem, and a shrinkage approach, 1-norm-based linear programming, is proposed to obtain sparse parameters and the corresponding selected features. We propose a novel rule extraction approach that uses the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce computational complexity. Two public gene expression datasets, the leukemia and colon tumor datasets, are used to demonstrate the performance of this approach. Using a small number of selected genes, MK-SVM achieves encouraging classification accuracy of more than 90% on both datasets. Moreover, very simple rules with linguistic labels are extracted, and the rule sets have high diagnostic power because of their good classification performance.
NASA Astrophysics Data System (ADS)
Lee, Donghoon; Kim, Ye-seul; Choi, Sunghoon; Lee, Haenghwa; Jo, Byungdu; Choi, Seungyeon; Shin, Jungwook; Kim, Hee-Joung
2017-03-01
Chest digital tomosynthesis (CDT) is a recently developed medical imaging device with several advantages for diagnosing lung disease; for example, CDT provides depth information at a relatively low radiation dose compared with computed tomography (CT). However, a major problem with CDT is image artifacts caused by data incompleteness, which results from limited-angle data acquisition in the CDT geometry. For this reason, the sensitivity of CDT for lung disease relative to CT is unclear. In this study, to improve the sensitivity of lung disease detection in CDT, we developed computer-aided diagnosis (CAD) systems based on machine learning. To design the CAD systems, we used 100 cropped images of lung nodules and 100 cropped images of normal lesions, acquired with lung man phantoms and a prototype CDT. The machine learning techniques were based on a support vector machine and a Gabor filter. The Gabor filter was used to extract characteristics of lung nodules, and we compared its feature extraction performance across scale and orientation parameters: 3, 4 and 5 scales and 4, 6 and 8 orientations. After feature extraction, a support vector machine (SVM) was used to classify the lesion features, and linear, polynomial and Gaussian SVM kernels were compared to determine the best SVM conditions for CDT reconstruction images. The results showed that the CAD system with machine learning can detect lung lesions automatically, and detection performance was best with a Gabor filter with 5 scales and 8 orientations and an SVM with a Gaussian kernel. In conclusion, the proposed CAD system improved the sensitivity of lung lesion detection in CDT, and the study identified the Gabor filter and SVM conditions that give the highest detection performance for the developed CAD system for CDT.
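A Gabor filter bank over several scales and orientations, like the 3 to 5 scales and 4 to 8 orientations compared in this study, can be generated as below. This is an illustrative sketch: the kernel size, the doubling of wavelength per scale, the aspect ratio `gamma=0.5`, and the `sigma = 0.56 * wavelength` rule of thumb are assumptions, not the authors' settings. Each kernel would be convolved with an image, and statistics of the responses would form the SVM feature vector.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter sampled on an odd size x size grid."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along angle theta
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x_r / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(n_scales, n_orientations, size=15, base_wavelength=4.0):
    """Hypothetical bank: wavelength doubles per scale, orientations evenly spaced in [0, pi)."""
    bank = []
    for s in range(n_scales):
        wavelength = base_wavelength * 2 ** s
        sigma = 0.56 * wavelength  # assumed bandwidth rule of thumb
        for o in range(n_orientations):
            theta = o * math.pi / n_orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return bank
```

A 5-scale, 8-orientation bank yields 40 kernels, so the feature vector grows with both parameters, which is one reason the number of scales and orientations is worth comparing.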
Markerless gating for lung cancer radiotherapy based on machine learning techniques
NASA Astrophysics Data System (ADS)
Lin, Tong; Li, Ruijiang; Tang, Xiaoli; Dy, Jennifer G.; Jiang, Steve B.
2009-03-01
In lung cancer radiotherapy, radiation to a mobile target can be delivered by respiratory gating, for which we need to know whether the target is inside or outside a predefined gating window at any time point during the treatment. This can be achieved by tracking one or more fiducial markers implanted inside or near the target, either fluoroscopically or electromagnetically. However, the clinical implementation of marker tracking is limited for lung cancer radiotherapy mainly due to the risk of pneumothorax. Therefore, gating without implanted fiducial markers is a promising clinical direction. We have developed several template-matching methods for fluoroscopic marker-less gating. Recently, we have modeled the gating problem as a binary pattern classification problem, in which principal component analysis (PCA) and support vector machine (SVM) are combined to perform the classification task. Following the same framework, we investigated different combinations of dimensionality reduction techniques (PCA and four nonlinear manifold learning methods) and two machine learning classification methods (artificial neural networks—ANN and SVM). Performance was evaluated on ten fluoroscopic image sequences of nine lung cancer patients. We found that among all combinations of dimensionality reduction techniques and classification methods, PCA combined with either ANN or SVM achieved a better performance than the other nonlinear manifold learning methods. ANN when combined with PCA achieves a better performance than SVM in terms of classification accuracy and recall rate, although the target coverage is similar for the two classification methods. Furthermore, the running time for both ANN and SVM with PCA is within tolerance for real-time applications. Overall, ANN combined with PCA is a better candidate than other combinations we investigated in this work for real-time gated radiotherapy.
NASA Astrophysics Data System (ADS)
Quitadamo, L. R.; Cavrini, F.; Sbernini, L.; Riillo, F.; Bianchi, L.; Seri, S.; Saggio, G.
2017-02-01
Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. The literature frequently reports insufficient details about the SVM implementation and/or parameter selection, making it impossible to reproduce the analysis and results. To perform an optimized classification and report a proper description of the results, a comprehensive critical overview of SVM applications is needed. The aim of this paper is to review the use of SVM in determining brain and muscle patterns for HCI, focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, the basic principles of SVM theory are outlined, together with a description of several relevant implementations from the literature. Furthermore, details of the reviewed papers are listed in tables and statistics on SVM use in the literature are presented. The suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
Hussain, Lal; Ahmed, Adeel; Saeed, Sharjil; Rathore, Saima; Awan, Imtiaz Ahmed; Shah, Saeed Arif; Majid, Abdul; Idris, Adnan; Awan, Anees Ahmed
2018-02-06
Prostate cancer is the second leading cause of cancer death among men. Early detection can effectively reduce prostate cancer mortality. The high and multiple resolutions of prostate cancer MRIs require proper diagnostic systems and tools. In the past, researchers developed computer-aided diagnosis (CAD) systems that help radiologists detect abnormalities. In this research paper, we employ novel machine learning techniques such as a Bayesian approach, Support vector machine (SVM) kernels (polynomial, radial basis function (RBF) and Gaussian) and Decision Tree for detecting prostate cancer. Moreover, different feature extraction strategies, based on texture, morphological, scale-invariant feature transform (SIFT), and elliptic Fourier descriptor (EFD) features, are proposed to improve detection performance. Performance was evaluated on single features as well as combinations of features using machine learning classification techniques. Cross-validation (jack-knife k-fold) was performed, and performance was evaluated in terms of the receiver operating characteristic (ROC) curve, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV) and false positive rate (FPR). With single feature extraction strategies, the SVM Gaussian kernel gives the highest accuracy of 98.34% with an AUC of 0.999. With combinations of feature extraction strategies, the SVM Gaussian kernel with texture + morphological features and with EFDs + morphological features gives the highest accuracy of 99.71% and an AUC of 1.00.
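The evaluation measures listed in this abstract all derive from the four cells of a binary confusion matrix. A minimal sketch; the function name and the counts used in testing are illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
        "fpr": fp / (fp + tn),          # false positive rate = 1 - specificity
    }
```

Because FPR is one minus specificity, the ROC curve (sensitivity against FPR) carries the same information as a sweep of sensitivity/specificity pairs over decision thresholds.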
NASA Astrophysics Data System (ADS)
Leena, N.; Saju, K. K.
2018-04-01
Nutritional deficiencies in plants are a major concern for farmers because they affect productivity and thus profit. This work aims to classify nutritional deficiencies in maize plants non-destructively using image processing and machine learning techniques. Colored images of the leaves are analyzed and classified with a multi-class support vector machine (SVM). Images of maize leaves with known deficiencies, such as nitrogen, phosphorus and potassium (NPK), are used to train the SVM classifier before the test images are classified. The results show that the method was able to classify and identify nutritional deficiencies.
Support vector machine in machine condition monitoring and fault diagnosis
NASA Astrophysics Data System (ADS)
Widodo, Achmad; Yang, Bo-Suk
2007-08-01
Recently, machine condition monitoring and fault diagnosis as part of the maintenance system has attracted global attention because of the potential advantages of reduced maintenance costs, improved productivity and increased machine availability. This paper presents a survey of machine condition monitoring and fault diagnosis using support vector machines (SVM), summarizing and reviewing recent research and developments of SVM in this field. Numerous methods have been developed based on intelligent systems such as artificial neural networks, fuzzy expert systems, condition-based reasoning, random forests, etc. However, the use of SVM for machine condition monitoring and fault diagnosis is still rare. SVM has excellent generalization performance, so it can achieve high classification accuracy for machine condition monitoring and diagnosis. Up to 2006, the use of SVM in machine condition monitoring and fault diagnosis tended to develop toward expertise-oriented and problem-oriented domains. Finally, continually adapting and developing novel ideas for machine condition monitoring and fault diagnosis using SVM is left as future work.
[Identification of varieties of cashmere by Vis/NIR spectroscopy technology based on PCA-SVM].
Wu, Gui-Fang; He, Yong
2009-06-01
A hybrid algorithm combining principal component analysis (PCA) and support vector machine (SVM) is presented to discriminate cashmere varieties. Cashmere fiber is threadlike, soft and glossy, with high tensile strength. The quality characteristics and economic value of cashmere differ greatly between breeds. To safeguard consumers' rights and guarantee the quality of cashmere products, rapid, efficient and accurate identification of cashmere is important for the production and trade of cashmere material. The present research uses Vis/NIR diffuse reflectance spectroscopy to collect spectral data from cashmere. The near-infrared fingerprint of cashmere was obtained by PCA, and SVM methods were used to further identify the cashmere material. A score plot of PC1, PC2 and PC3 was examined, and 10 principal components (PCs), with a cumulative reliability of 99.99%, were selected as the SVM input. One hundred cashmere samples were used for calibration and the remaining 75 for validation. A one-against-all multi-class SVM model was built and SVMs with different kernel functions were compared; the SVM with the Gaussian kernel function had the best identification capability, with an accuracy of 100%. This research indicates that the PCA-SVM data mining method identifies cashmere well and can serve as a new method for rapid identification of cashmere varieties.
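A one-against-all multi-class SVM trains one binary classifier per class (that class versus all others) and predicts the class whose classifier gives the largest decision value. The sketch below illustrates only that scheme. It uses a linear SVM trained by Pegasos-style stochastic subgradient descent instead of the Gaussian-kernel SVM the study found best, and it regularizes the bias term for simplicity; all names and the `lam` and `epochs` defaults are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.05, epochs=300, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    y must contain +1/-1 labels. A constant 1.0 feature is appended to each
    sample so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision_value(w, x):
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class: that class (+1) versus all others (-1)."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict_one_vs_rest(models, x):
    """Pick the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: decision_value(models[c], x))
```

In the cashmere setting each input `x` would be the vector of 10 PC scores for one spectrum; swapping in a kernel SVM changes only `train_linear_svm` and `decision_value`, not the one-against-all wrapper.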
NASA Astrophysics Data System (ADS)
Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi
2018-03-01
This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.
Karan, Shivesh Kishore; Samadder, Sukha Ranjan
2016-08-01
One objective of the present study was to compare the performance of a support vector machine (SVM)-based image classification technique with the maximum likelihood classification (MLC) technique for the rapidly changing landscape of an open-cast mine. The other objective was to assess the change in land use pattern due to coal mining from 2006 to 2016. Assessing land use change accurately is important for the development and monitoring of coalfields in conjunction with sustainable development. Landsat 5 Thematic Mapper (TM) data from 2006 and Landsat 8 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) data from 2016 covering part of Jharia Coalfield, Dhanbad, India, were used. The SVM technique gave greater overall classification accuracy than MLC in classifying a heterogeneous landscape with a limited training dataset. SVM outperformed MLC in the difficult task of classifying features with nearly similar reflectance on the mean signature plot: an improvement of over 11% was observed in the classification of built-up area, and an improvement of 24% in the classification of surface water. Similarly, SVM improved overall land use classification accuracy by almost 6% and 3% for the Landsat 5 and Landsat 8 images, respectively. The results indicate that land degradation increased significantly from 2006 to 2016 in the study area. This study will help quantify these changes and can serve as a basis for further decision support system studies for purposes such as mine planning and management and environmental impact assessment.
Wahba, Maram A; Ashour, Amira S; Napoleon, Sameh A; Abd Elnaby, Mustafa M; Guo, Yanhui
2017-12-01
Basal cell carcinoma is one of the most common malignant skin lesions. Automated lesion identification and classification using image processing techniques is needed to reduce diagnostic errors. In this study, a novel technique is applied to classify skin lesion images into two classes: malignant basal cell carcinoma and benign nevus. After hair removal, a hybrid combination of bi-dimensional empirical mode decomposition and gray-level difference method features is proposed. The combined features are then classified using a quadratic support vector machine (Q-SVM). The proposed system achieved 100% accuracy, sensitivity and specificity, outperforming other support vector machine procedures and other extracted features. Basal cell carcinoma is effectively classified using Q-SVM with the proposed combined features.
Incremental Support Vector Machine Framework for Visual Sensor Networks
NASA Astrophysics Data System (ADS)
Awad, Mariette; Jiang, Xianhua; Motai, Yuichi
2006-12-01
Motivated by the emerging requirements of surveillance networks, we present an incremental multiclassification support vector machine (SVM) technique as a new framework for action classification based on real-time multivideo collected by homogeneous sites. The technique is based on an adaptation of the least squares SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by visual behavior data acquisition and an online learning phase, during which the cluster head performs an ensemble of model aggregations based on the sensor nodes' inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data improves on single-camera sensing, especially when a single camera has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique demonstrate its validity for visual sensor networks. Furthermore, the enabled online learning allows adaptive insertion of domain knowledge and reduces both model training time and the information storage requirements of the overall system, which makes it even more attractive for communication in distributed sensor networks.
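For reference, the static (batch) LS-SVM formulation that this framework adapts replaces the SVM's inequality constraints with equalities, so training reduces to a single linear system. The two-class sketch below follows the standard Suykens-style classifier with an RBF kernel. It does not implement the incremental, multi-class, or sensor-fusion parts of the paper, and `gamma`, `sigma` and the helper names are assumptions.

```python
import math


def rbf(x, z, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def train_ls_svm(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    where Omega_ij = y_i * y_j * K(x_i, x_j). Labels y are +1/-1."""
    n = len(X)
    A = [[0.0] + [float(yi) for yi in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            ridge = 1.0 / gamma if i == j else 0.0  # regularization on the diagonal
            row.append(y[i] * y[j] * rbf(X[i], X[j], sigma) + ridge)
        A.append(row)
    sol = solve_linear(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, multipliers alpha


def predict_ls_svm(X, y, b, alpha, x, sigma=1.0):
    s = sum(a * yi * rbf(xi, x, sigma) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

Unlike a standard SVM, almost every training point receives a nonzero multiplier, so the model is not sparse; in exchange, training is one linear solve, which is part of what makes the LS-SVM form convenient to update incrementally.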
Nonlinear detection for a high rate extended binary phase shift keying system.
Chen, Xian-Qing; Wu, Le-Nan
2013-03-28
This paper presents the algorithm and results of a nonlinear detector that uses a machine learning technique called the support vector machine (SVM) in an efficient modulation system with high data rate and low energy consumption. Simulation results showed that the performance achieved by the SVM detector is comparable to that of a conventional threshold decision (TD) detector. Both detectors process the received signals together with the special impacting filter (SIF), which can improve energy utilization efficiency. However, unlike the TD detector, the SVM detector aims not only to reduce the bit error rate (BER) but also to provide accurate posterior probability estimates (PPEs), which can be used as soft inputs to the LDPC decoder. The complexity of the detector is addressed by using four features and simplifying the decision function. In addition, bandwidth-efficient transmission is analyzed with both the SVM and TD detectors. The SVM detector is more robust to the sampling rate than the TD detector. We find that the SVM is suitable for extended binary phase shift keying (EBPSK) signal detection and can provide accurate posterior probabilities for LDPC decoding.
PMID:23539034
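The abstract does not say how the SVM detector's posterior probability estimates are produced. One widely used option, assumed here purely for illustration, is Platt scaling: fit a sigmoid P(y = +1 | f) = 1 / (1 + exp(A f + B)) to the SVM's real-valued outputs f, then turn the probability into a log-likelihood ratio, the usual form of soft input for an LDPC decoder. The sketch fits A and B by plain gradient descent on Platt's smoothed targets; the function names, learning rate and iteration count are assumptions.

```python
import math


def sigmoid_prob(f, A, B):
    """P(y = +1 | f) = 1 / (1 + exp(A f + B)), computed without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.5, iters=5000):
    """Fit (A, B) by gradient descent on cross-entropy with Platt's targets.

    labels are +1 / -1. Targets are shrunk away from 0 and 1 as in Platt's
    method, which keeps the optimum finite even when the scores are separable.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    n = float(len(scores))
    for _ in range(iters):
        gA = gB = 0.0
        for f, t in zip(scores, targets):
            # d(loss)/d(A f + B) = t - p, where p = 1 / (1 + exp(A f + B))
            diff = t - sigmoid_prob(f, A, B)
            gA += diff * f
            gB += diff
        A -= lr * gA / n
        B -= lr * gB / n
    return A, B


def llr(f, A, B):
    """log(P(+1 | f) / P(-1 | f)); for this sigmoid it simplifies to -(A f + B)."""
    return -(A * f + B)
```

A fitted A < 0 means larger SVM outputs map to higher probabilities of the +1 symbol; the resulting LLR is then a calibrated soft decision rather than a hard threshold.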
An SVM-based solution for fault detection in wind turbines.
Santos, Pedro; Villa, Luisa F; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús
2015-03-09
Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines is insufficient for diagnosing mechanical faults in their mechanical transmission chain; a successful diagnosis requires accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for classifying the operational state of the turbine. The selected sensors are accelerometers, whose vibration signals are processed using angular resampling techniques, together with electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that the linear kernel SVM outperforms the other kernels and the ANNs in terms of accuracy, training time and tuning time. The suitability and superior performance of the linear SVM are also analyzed experimentally, leading to the conclusion that this data acquisition technique generates linearly separable datasets.
PMID:25760051
NASA Technical Reports Server (NTRS)
Forman, Barton A.; Reichle, Rolf Helmut
2014-01-01
A support vector machine (SVM), a machine learning technique developed from statistical learning theory, is employed to estimate passive microwave (PMW) brightness temperatures over snow-covered land in North America, as observed by the Advanced Microwave Scanning Radiometer (AMSR-E) satellite sensor. The capability of the trained SVM is compared with the artificial neural network (ANN) estimates originally presented in [14]. The results suggest that the SVM outperforms the ANN at 10.65 GHz, 18.7 GHz, and 36.5 GHz for both vertically and horizontally polarized PMW radiation. When compared against daily AMSR-E measurements not used during training and subsequently averaged across the North American domain over the 9-year study period, the root mean squared error in the SVM output is 8 K or less, while the anomaly correlation coefficient is 0.7 or greater. Compared with the ANN results at any of the six frequency and polarization combinations tested, the root mean squared error was reduced by more than 18 percent, while the anomaly correlation coefficient was increased by more than 52 percent. Further, the temporal and spatial variability in the brightness temperatures modeled via the SVM agrees more closely with that found in the original AMSR-E measurements. These findings suggest that the SVM is a superior alternative to the ANN for eventual use as a measurement operator within a data assimilation framework.
Support vector machine multiuser receiver for DS-CDMA signals in multipath channels.
Chen, S; Samingan, A K; Hanzo, L
2001-01-01
The problem of constructing an adaptive multiuser detector (MUD) is considered for direct sequence code division multiple access (DS-CDMA) signals transmitted through multipath channels. The emerging learning technique, called support vector machines (SVM), is proposed as a method of obtaining a nonlinear MUD from a relatively small training data block. Computer simulation is used to study this SVM MUD, and the results show that it can closely match the performance of the optimal Bayesian one-shot detector. Comparisons with an adaptive radial basis function (RBF) MUD trained by an unsupervised clustering algorithm are discussed.
Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.
Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad
2018-01-01
Arrhythmia is considered a life-threatening disease that causes serious health issues in patients when left untreated. An early diagnosis of arrhythmias would help save lives. This study classifies patients into one of sixteen subclasses, of which one class represents the absence of disease and the other fifteen represent electrocardiogram records of various subtypes of arrhythmia. The research is carried out on a dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset has a large number of feature dimensions, which are reduced using a wrapper-based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches, including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC), are employed to detect the presence and absence of arrhythmias. The SVM results are compared with other standard machine learning classifiers using varying parameters, and classifier performance is evaluated using accuracy, kappa statistics, and root mean square error. The results show that the OAO SVM method outperforms all other classifiers, achieving an accuracy of 81.11% with an 80/20 data split and 92.07% with a 90/10 data split.
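One-against-one (OAO) is simple to state: train one binary classifier for every pair of classes and let each vote. The sketch below shows only that voting scheme (OAA and ECC are not shown). As an assumption for self-containment, each pairwise classifier is a small linear SVM trained by full-batch subgradient descent on the regularized hinge loss, with the bias folded into the weights and regularized for simplicity, rather than a library solver. The toy three-class data are invented and unrelated to the UCI arrhythmia dataset.

```python
from itertools import combinations


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on lam/2 ||w||^2 + mean(max(0, 1 - y w . [x, 1])).

    Labels are +1 / -1. A constant 1 is appended to each x so the last weight
    acts as the bias (it is regularized too, which is a simplification).
    """
    d = len(X[0]) + 1
    w = [0.0] * d
    n = len(X)
    for _ in range(epochs):
        grad = [lam * wi for wi in w]
        for x, t in zip(X, y):
            xa = list(x) + [1.0]
            if t * sum(wi * xi for wi, xi in zip(w, xa)) < 1.0:   # margin violated
                for k in range(d):
                    grad[k] -= t * xa[k] / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def decision(w, x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_oao(X, labels):
    """Train one binary SVM per unordered pair of classes (K(K-1)/2 models)."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        Xab = [x for x, c in zip(X, labels) if c in (a, b)]
        yab = [1 if c == a else -1 for c in labels if c in (a, b)]
        models[(a, b)] = train_linear_svm(Xab, yab)
    return models


def predict_oao(models, x):
    """Each pairwise model votes; ties go to the smallest class label."""
    votes = {}
    for (a, b), w in models.items():
        winner = a if decision(w, x) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda c: votes[c])
```

With sixteen classes, as in the study, OAO would train 16 * 15 / 2 = 120 pairwise classifiers, each on a smaller subset of the data than an OAA classifier sees.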
Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-07-18
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. The current study presents a comprehensive comparison of various machine learning methods for differentiating low-grade gliomas (LGGs) from high-grade gliomas (HGGs), as well as WHO grade II, III and IV gliomas, based on multi-parametric MRI images. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross-validation (LOOCV) strategy. In addition, the influence of parameter selection on classification performance was investigated. We found that the support vector machine (SVM) exhibited superior performance to the other classifiers. By combining all tumor attributes with the synthetic minority over-sampling technique (SMOTE), the highest classification accuracies of 0.945 (LGG vs. HGG) and 0.961 (grade II, III and IV gliomas) were achieved. Applying the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classification accuracies. In addition, the performances of the LibSVM, SMO and IBk classifiers were influenced by key parameters such as kernel type, c, gamma and K. SVM is a promising tool for developing an automated preoperative glioma grading system, especially when combined with the RFE strategy. Model parameters should be considered in glioma grading model optimization.
PMID:28599282
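SMOTE, which the study combines with the tumor attributes, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. A minimal pure-Python sketch of that idea follows. The function name, the default k = 5, the seeded random generator and the toy points are assumptions; the study's actual implementation and settings are not given in the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    minority: list of feature vectors from the minority class (at least 2).
    Each new point is x + u * (neighbour - x) with u drawn from [0, 1), where
    the neighbour is picked among the k nearest minority samples of x.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Every synthetic point lies on a segment between two real minority samples, so oversampling balances the class counts without placing points outside the minority region; it should be applied only inside the training folds of a cross-validation loop to avoid leaking test information.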
Alejo, Luz; Atkinson, John; Guzmán-Fierro, Víctor; Roeckel, Marlene
2018-05-16
Computational self-adapting methods (support vector machines, SVM) are compared with an analytical method for predicting the effluent composition of a two-stage anaerobic digestion (AD) process. Experimental data from the AD of poultry manure were used. The analytical method considers protein as the only source of ammonia production in AD after degradation. Total ammonia nitrogen (TAN), total solids (TS), chemical oxygen demand (COD), and total volatile solids (TVS) were measured in the influent and effluent of the process. The TAN concentration in the effluent, the most inhibiting and polluting compound in AD, was predicted. Despite the limited data available, the SVM-based model outperformed the analytical method for TAN prediction, achieving a relative average error of 15.2% versus 43% for the analytical method. Moreover, SVM showed higher prediction accuracy than artificial neural networks. This result indicates the promise of SVM for prediction in non-linear and dynamic AD processes.
A linear-RBF multikernel SVM to classify big text corpora.
Romero, R; Iglesias, E L; Borrajo, L
2015-01-01
Support vector machine (SVM) is a powerful technique for classification. However, SVM is not well suited to classifying large datasets or text corpora, because SVM training complexity depends heavily on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels, because they provide greater flexibility. This paper presents a multikernel SVM for handling high-dimensional data that provides automatic parameterization at low computational cost and improves results over SVMs parameterized by brute-force search. The model partitions the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier achieves good accuracy compared with the classic SVM, while training is significantly faster than for several other SVM classifiers.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-04-22
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
USDA-ARS?s Scientific Manuscript database
This study evaluated linear spectral unmixing (LSU), mixture tuned matched filtering (MTMF) and support vector machine (SVM) techniques for detecting and mapping giant reed (Arundo donax L.), an invasive weed that presents a severe threat to agroecosystems and riparian areas throughout the southern ...
Using support vector machines to identify literacy skills: Evidence from eye movements.
Lou, Ya; Liu, Yanping; Kaakinen, Johanna K; Li, Xingshan
2017-06-01
Is inferring readers' literacy skills possible by analyzing their eye movements during text reading? This study used Support Vector Machines (SVM) to analyze eye movement data from 61 undergraduate students who read a multiple-paragraph, multiple-topic expository text. Forward fixation time, first-pass rereading time, second-pass fixation time, and regression path reading time on different regions of the text were provided as features. The SVM classification algorithm assisted in distinguishing high-literacy-skilled readers from low-literacy-skilled readers with 80.3 % accuracy. Results demonstrate the effectiveness of combining eye tracking and machine learning techniques to detect readers with low literacy skills, and suggest that such approaches can be potentially used in predicting other cognitive abilities.
An implementation of support vector machine on sentiment classification of movie reviews
NASA Astrophysics Data System (ADS)
Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.
2018-03-01
With technological advances, a great deal of information about movies is available on the internet. If this information is processed properly, high-quality information can be obtained from it. This research proposes to classify the sentiment of movie review documents. It uses the Support Vector Machine (SVM) method because SVM can classify high-dimensional data, which matches the text data used in this research. SVM is a popular machine learning technique for text classification because it learns from a collection of previously classified documents and can provide good results. With respect to the dataset split, the 90-10 composition gives the best result, 85.6%. With respect to the SVM kernel, the linear kernel with constant 1 gives the best result, 84.9%.
Generalized SMO algorithm for SVM-based multitask learning.
Cai, Feng; Cherkassky, Vladimir
2012-06-01
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data", and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n^3) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.
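For context on what the brief generalizes: Platt's SMO solves the SVM dual by repeatedly optimizing two Lagrange multipliers at a time in closed form, which avoids a general-purpose QP solver. Below is a compact sketch of the well-known simplified SMO variant (the second multiplier is chosen at random instead of by Platt's heuristics) for the standard soft-margin SVM with a linear kernel. It is not the generalized SMO for SVM+MTL described in the abstract; C, tol, max_passes, the seed and the toy data are illustrative assumptions.

```python
import random


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Return (alpha, b) for a soft-margin SVM; labels y are +1 / -1."""
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    b = 0.0
    rng = random.Random(seed)

    def error(i):
        # E_i = f(x_i) - y_i with f(x) = sum_k alpha_k y_k K(x_k, x) + b
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = error(i)
            # Only update alpha_i if it violates the KKT conditions by more than tol
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                j += j >= i                      # uniform random j != i
                Ej = error(j)
                ai_old, aj_old = alpha[i], alpha[j]
                # Box [L, H] keeps 0 <= alpha <= C and sum_k alpha_k y_k unchanged
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                if L == H:
                    continue
                eta = 2.0 * K[i][j] - K[i][i] - K[j][j]
                if eta >= 0:
                    continue
                aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                alpha[i], alpha[j] = ai, aj
                if 0 < ai < C:
                    b = b1
                elif 0 < aj < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def smo_predict(x, X, y, alpha, b):
    s = sum(a * yk * linear_kernel(xk, x) for a, yk, xk in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1
```

Each two-variable step only needs kernel values and errors for the pair involved, which is the property that makes SMO-style decomposition much cheaper than solving the full O(n^3) QP at once.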
NASA Astrophysics Data System (ADS)
Calvin Frans Mariel, Wahyu; Mariyah, Siti; Pramana, Setia
2018-03-01
Deep learning is a new era of machine learning techniques that essentially imitate the structure and function of the human brain. It is a development of deeper Artificial Neural Networks (ANNs) that use more than one hidden layer. Deep Learning Neural Network has a great ability to recognize patterns in various data types such as images, audio and text. In this paper, the authors try to measure this algorithm's ability by applying it to text classification. The classification task considers the sentiment content of a text, which is also called sentiment analysis. Using several combinations of text preprocessing and feature extraction techniques, we compare the modelling results of Deep Learning Neural Network with two other commonly used algorithms, Naïve Bayes and Support Vector Machine (SVM). The comparison uses Indonesian text data with balanced and unbalanced sentiment composition. In the experimental simulation, Deep Learning Neural Network clearly outperforms Naïve Bayes and SVM, offering a better F1 score, and the feature extraction technique that best improves the modelling results is Bigram.
Extended robust support vector machine based on financial risk minimization.
Takeda, Akiko; Fujiwara, Shuhei; Kanamori, Takafumi
2014-11-01
Financial risk measures have recently been used in machine learning. For example, the ν-support vector machine (ν-SVM) minimizes the conditional value at risk (CVaR) of the margin distribution. The measure is popular in finance because of its subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between CVaR and value at risk (VaR), in the expectation that the resulting model will be less sensitive to outliers than ν-SVM. ER-SVM can be regarded as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments suggest that ER-SVM can achieve better prediction performance with proper parameter settings.
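To make the two risk measures concrete: for a sample of losses, VaR at level β is the smallest loss l such that at least a fraction β of the losses are at most l, and CVaR averages the tail beyond it. The sketch uses empirical versions (the Rockafellar-Uryasev form of CVaR for equally weighted samples) and illustrates the sensitivity the abstract mentions: one extreme outlier moves CVaR a lot but leaves VaR unchanged. It does not implement ER-SVM's intermediate risk measure, and the loss values are made up.

```python
import math


def empirical_var(losses, beta):
    """Smallest loss l such that at least a fraction beta of losses are <= l."""
    s = sorted(losses)
    k = math.ceil(beta * len(s))     # VaR is the k-th smallest loss
    return s[max(k, 1) - 1]


def empirical_cvar(losses, beta):
    """Rockafellar-Uryasev CVaR: VaR plus mean excess over VaR scaled by 1 / (1 - beta)."""
    var = empirical_var(losses, beta)
    excess = sum(max(l - var, 0.0) for l in losses)
    return var + excess / ((1.0 - beta) * len(losses))


if __name__ == "__main__":
    clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    for sample in (clean, with_outlier):
        print(empirical_var(sample, 0.8), empirical_cvar(sample, 0.8))
```

At β = 0.8 both samples have VaR 8, while CVaR rises from 9.5 (the mean of the two largest losses) to 54.5 once 10 is replaced by 100. A measure lying between the two, as ER-SVM uses, aims to keep some of CVaR's tail awareness without that outlier sensitivity.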
Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models
NASA Astrophysics Data System (ADS)
Mandal, Sukomal; Rao, Subba; N., Harish; Lokesha
2012-06-01
The damage analysis of coastal structures is very important because it involves many design parameters that must be considered for a better and safer design. In the present study, experimental data for a non-reshaped berm breakwater were collected from the Marine Structures Laboratory, Department of Applied Mechanics and Hydraulics, NITK, Surathkal, India. Soft computing models such as Artificial Neural Network (ANN), Support Vector Machine (SVM) and Adaptive Neuro Fuzzy Inference System (ANFIS) are constructed using the experimental data sets to predict the damage level of the non-reshaped berm breakwater. The experimental data are used to train the ANN, SVM and ANFIS models, and the results are evaluated in terms of statistical measures such as mean square error, root mean square error, correlation coefficient and scatter index. The results show that soft computing techniques, i.e., ANN, SVM and ANFIS, can be efficient tools for predicting the damage levels of non-reshaped berm breakwaters.
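The four statistical measures named in the abstract can be computed as below. The scatter index is taken here as RMSE divided by the mean of the observed values, a common definition in coastal engineering; the abstract does not state the exact formula used, and the function name and dictionary keys are illustrative.

```python
import math


def evaluation_measures(observed, predicted):
    """Return MSE, RMSE, Pearson correlation coefficient and scatter index."""
    n = len(observed)
    mse = sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(mse)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    r = cov / (so * sp)
    si = rmse / mo          # assumed definition: RMSE normalised by the observed mean
    return {"MSE": mse, "RMSE": rmse, "CC": r, "SI": si}
```

For example, observed damage levels [1, 2, 3, 4] against predictions [1, 2, 3, 5] give MSE 0.25, RMSE 0.5 and SI 0.2; because SI is dimensionless, it allows comparison across data sets with different damage scales.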
Hsiung, Chang; Pederson, Christopher G.; Zou, Peng; Smith, Valton; von Gunten, Marc; O’Brien, Nada A.
2016-01-01
Near-infrared spectroscopy as a rapid and non-destructive analytical technique offers great advantages for pharmaceutical raw material identification (RMID) to fulfill the quality and safety requirements in pharmaceutical industry. In this study, we demonstrated the use of portable miniature near-infrared (MicroNIR) spectrometers for NIR-based pharmaceutical RMID and solved two challenges in this area, model transferability and large-scale classification, with the aid of support vector machine (SVM) modeling. We used a set of 19 pharmaceutical compounds including various active pharmaceutical ingredients (APIs) and excipients and six MicroNIR spectrometers to test model transferability. For the test of large-scale classification, we used another set of 253 pharmaceutical compounds comprised of both chemically and physically different APIs and excipients. We compared SVM with conventional chemometric modeling techniques, including soft independent modeling of class analogy, partial least squares discriminant analysis, linear discriminant analysis, and quadratic discriminant analysis. Support vector machine modeling using a linear kernel, especially when combined with a hierarchical scheme, exhibited excellent performance in both model transferability and large-scale classification. Hence, ultra-compact, portable and robust MicroNIR spectrometers coupled with SVM modeling can make on-site and in situ pharmaceutical RMID for large-volume applications highly achievable. PMID:27029624
NASA Astrophysics Data System (ADS)
Gavrishchaka, V. V.; Ganguli, S. B.
2001-12-01
Reliable forecasting of rare events in a complex dynamical system is a challenging problem that is important for many practical applications. Due to the nature of rare events, the data set available for constructing a statistical and/or machine learning model is often very limited and incomplete. Therefore, many widely used approaches, including robust algorithms such as neural networks, can easily become inadequate for rare-event prediction. Moreover, in many practical cases models with high-dimensional inputs are required. This limits the application of existing rare-event modeling techniques (e.g., extreme value theory) that focus on univariate cases, since these approaches are not easily extended to multivariate cases. Support vector machine (SVM) is a machine learning system that can provide optimal generalization from very limited and incomplete training data sets and can efficiently handle high-dimensional data. These features may allow SVM to be used to model rare events in some applications. We have applied an SVM-based system to the problem of large-amplitude substorm prediction and extreme-event forecasting in stock and currency exchange markets. Encouraging preliminary results will be presented, and other possible applications of the system will be discussed.
NASA Astrophysics Data System (ADS)
Sehad, Mounir; Lazri, Mourad; Ameur, Soltane
2017-03-01
In this work, a new rainfall estimation technique based on the high spatial and temporal resolution of the Spinning Enhanced Visible and Infra Red Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) is presented. This work proposes an efficient rainfall estimation scheme based on two multiclass support vector machine (SVM) algorithms: SVM_D for daytime and SVM_N for nighttime rainfall estimation. Both SVM models are trained using relevant rainfall parameters based on optical, microphysical and textural cloud properties. The cloud parameters are derived from the spectral channels of the SEVIRI MSG radiometer. The 3-hourly and daily accumulated rainfall are derived from the 15-min rainfall estimates given by the SVM classifiers for each MSG observation image pixel. The SVMs were trained with ground meteorological radar precipitation scenes recorded from November 2006 to March 2007 over northern Algeria, in the Mediterranean region. The SVM_D and SVM_N models were then used to estimate 3-hourly and daily rainfall using a data set gathered from November 2010 to March 2011 over northern Algeria. The results were validated against collocated rainfall observed by a rain gauge network. The statistical scores (correlation coefficient, bias, root mean square error and mean absolute error) showed good accuracy of the rainfall estimates produced by the present technique. Moreover, the rainfall estimates of our technique were compared with two high-accuracy rainfall estimation methods based on MSG SEVIRI imagery: a random forests (RF) based approach and an artificial neural network (ANN) based technique. The present technique yields a higher correlation coefficient (3-hourly: 0.78; daily: 0.94) and lower mean absolute error and root mean square error values. The results show that the new technique estimates 3-hourly and daily rainfall with better accuracy than the ANN technique and the RF model.
NASA Astrophysics Data System (ADS)
Wu, Di; He, Yong
2007-11-01
The aim of this study is to investigate the potential of visible and near-infrared spectroscopy (Vis/NIRS) for non-destructive measurement of the soluble solids content (SSC) of grape juice beverage. 380 samples were studied. Savitzky-Golay smoothing and the standard normal variate (SNV) transformation were applied to pre-process the spectral data. Least-squares support vector machines (LS-SVM) with an RBF kernel function were applied to develop the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the LS-SVM model was 0.962 and the root mean square error of prediction (RMSEP) was 0.434137. It is concluded that Vis/NIRS can quantify the SSC of grape juice beverage quickly and non-destructively. The LS-SVM model was also compared with partial least squares (PLS) and back-propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to these conventional linear and non-linear methods in predicting the SSC of grape juice beverage. The generalization ability of the LS-SVM, PLS and BP-NN models was also investigated. It is concluded that LS-SVM regression is a promising technique for quantitative prediction in chemometrics.
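LS-SVM regression replaces the SVM quadratic program with a single linear system in the bias b and the dual weights α. A minimal NumPy sketch of the standard RBF-kernel LS-SVM is shown below; the hyperparameters `gamma` and `sigma` and the toy sine data are assumptions for illustration, and the Savitzky-Golay/SNV preprocessing used in the study is not reproduced.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix, K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=0.5):
    """Solve [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y] for b and alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # first row encodes sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=0.5):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy 1-D regression standing in for absorbance spectra -> SSC
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y = np.sin(X[:, 0])
b, alpha = lssvm_fit(X, y)
X_test = np.linspace(0.5, 5.5, 11)[:, None]
pred = lssvm_predict(X, b, alpha, X_test)
```

In practice `gamma` and `sigma` would be chosen by cross-validation on the calibration set.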
Feng, Zhichao; Rong, Pengfei; Cao, Peng; Zhou, Qingyu; Zhu, Wenwei; Yan, Zhimin; Liu, Qianyun; Wang, Wei
2018-04-01
This study aimed to evaluate the diagnostic performance of machine-learning-based quantitative texture analysis of CT images in differentiating small (≤ 4 cm) angiomyolipoma without visible fat (AMLwvf) from renal cell carcinoma (RCC). This single-institution retrospective study included 58 patients with pathologically proven small renal masses (17 in the AMLwvf group and 41 in the RCC group). Texture features were extracted from the largest possible tumorous regions of interest (ROIs) by manual segmentation of preoperative three-phase CT images. Interobserver reliability and the Mann-Whitney U test were applied for preliminary feature selection. Support vector machine with recursive feature elimination (SVM-RFE) and the synthetic minority oversampling technique (SMOTE) were then adopted to build discriminative classifiers, and the performance of the classifiers was assessed. Of the 42 extracted features, 16 candidate features showed significant intergroup differences (P < 0.05) and good interobserver agreement. An optimal subset of 11 features was then selected by SVM-RFE. The SVM-RFE+SMOTE classifier achieved the best performance in discriminating small AMLwvf from RCC, with the highest accuracy, sensitivity, specificity and AUC of 93.9 %, 87.8 %, 100 % and 0.955, respectively. Machine learning analysis of CT texture features can facilitate accurate differentiation of small AMLwvf from RCC.
• Although conventional CT is useful for diagnosing small renal masses (SRMs), it has limitations.
• Machine-learning-based CT texture analysis facilitates differentiation of small AMLwvf from RCC.
• The highest accuracy of the SVM-RFE+SMOTE classifier reached 93.9 %.
• Texture analysis combined with machine-learning methods might spare patients with AMLwvf unnecessary surgery.
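SMOTE counters class imbalance (here 17 AMLwvf versus 41 RCC) by interpolating new minority samples between existing ones. The sketch below is a generic SMOTE-style oversampler, not the study's exact configuration; `k`, the random seed and the 2-feature demo data are assumptions. Generating 41 - 17 = 24 synthetic cases is just one illustrative way to balance the classes. In a real evaluation, oversampling should be applied only within training folds so that synthetic points do not leak into the test data.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared distances within the minority class
    d2 = np.sum((X_min[:, None, :] - X_min[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(n)
        j = neighbours[i, rng.integers(k)]
        gap = rng.random()                  # uniform in [0, 1)
        synthetic[t] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic

# Hypothetical minority class of 17 cases with 2 texture features each
rng = np.random.default_rng(42)
X_minority = rng.normal(size=(17, 2))
X_synthetic = smote(X_minority, n_new=41 - 17)
```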
Relevance Vector Machine Learning for Neonate Pain Intensity Assessment Using Digital Imaging
Gholami, Behnood; Tannenbaum, Allen R.
2011-01-01
Pain assessment in patients who are unable to communicate verbally is a challenging problem. The fundamental limitations of pain assessment in neonates stem from subjective assessment criteria rather than quantifiable, measurable data. This often results in poor-quality, inconsistent pain management. Recent advances in pattern recognition, such as relevance vector machine (RVM) learning, can assist medical staff in assessing pain by continuously monitoring the patient and providing the clinician with quantifiable data for pain management. The RVM classification technique is a Bayesian extension of the support vector machine (SVM) algorithm that achieves performance comparable to SVM while providing posterior probabilities for class membership and a sparser model. If classes represent “pure” facial expressions (i.e., extreme expressions that an observer can identify with a high degree of confidence), then the posterior probability that some intermediate facial expression belongs to a class can provide an estimate of the intensity of that expression. In this paper, we use the RVM classification technique to distinguish pain from nonpain in neonates and to assess their pain intensity levels. We also correlate our results with the pain intensity assessed by expert and nonexpert human examiners. PMID:20172803
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm
Chao, Chih-Feng; Horng, Ming-Huwi
2015-01-01
The setting of parameters in support vector machines (SVMs) is very important for their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, the smoothness parameter and the Lagrange multipliers. The proposed method is called the firefly-based SVM (firefly-SVM). This tool does not consider feature selection, because an SVM combined with feature selection is not well suited to multiclass classification, especially the one-against-all multiclass SVM. Both binary and multiclass classification are explored in the experiments. For binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; in addition, firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared with that of the original LIBSVM method combined with grid search and with the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification with maximum accuracy. PMID:25802511
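The paper trains the penalty and smoothness parameters together with the Lagrange multipliers; the sketch below shows only the basic firefly search over two hyperparameters (log2 C, log2 γ of an RBF kernel), using a hypothetical surrogate function in place of cross-validated SVM error. The attractiveness β0·exp(-gamma·r²) follows the standard firefly algorithm (here `gamma` is the firefly absorption coefficient, not the kernel width); the normalized distance, the step decay of 0.97 and the search bounds are assumptions.

```python
import numpy as np

def firefly_minimize(f, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize f over a box with a basic firefly algorithm.

    Lower cost = brighter firefly. Each firefly moves towards every brighter one
    with attractiveness beta0 * exp(-gamma * r^2), plus a shrinking random step.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    span = hi - lo
    X = lo + span * rng.random((n_fireflies, len(lo)))
    cost = np.array([f(x) for x in X])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:                          # j is brighter than i
                    r2 = np.sum(((X[i] - X[j]) / span) ** 2)   # normalized distance
                    beta = beta0 * np.exp(-gamma * r2)
                    step = alpha * (rng.random(len(lo)) - 0.5) * span
                    X[i] = np.clip(X[i] + beta * (X[j] - X[i]) + step, lo, hi)
                    cost[i] = f(X[i])
        alpha *= 0.97                                          # shrink the random walk
    best = int(np.argmin(cost))
    return X[best], float(cost[best])

# Hypothetical stand-in for cross-validated SVM error over (log2 C, log2 gamma)
def toy_cv_error(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best_params, best_cost = firefly_minimize(toy_cv_error, [(-5, 15), (-15, 3)])
```

With a real objective, each call to `f` would train and validate an SVM, so the number of fireflies and iterations dominates the run time.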
NASA Astrophysics Data System (ADS)
Hao, Xuejun; An, Xaioran; Wu, Bo; He, Shaoping
2018-02-01
In a gas pipeline system, safe operation of the gas regulators determines the stability of the fuel gas supply, and the safety precaution system for medium-low pressure gas regulators in the Beijing Gas Group is currently imperfect; optimizing the safety precaution technique therefore has important social and economic significance. In this paper, based on the running status of medium-low pressure gas regulators recorded in the SCADA system, a new safety precaution method for gas regulators based on the support vector machine (SVM) is presented. The method takes the regulator outlet pressure data as the input variables of the SVM model and the fault category and degree as the output variables, which effectively enhances precaution accuracy and saves significant manpower and material resources.
Automatic system for radar echoes filtering based on textural features and artificial intelligence
NASA Astrophysics Data System (ADS)
Hedir, Mehdia; Haddad, Boualem
2017-10-01
Among the many popular artificial intelligence (AI) techniques, artificial neural networks (ANN) and support vector machines (SVM) were selected to process ground echoes (GE) in meteorological radar images from Setif (Algeria) and Bordeaux (France), two sites with different climates and topographies. To achieve this, the AI techniques were combined with textural approaches: the gray level co-occurrence matrix (GLCM) and the completed local binary pattern (CLBP), both widely used in image analysis. The results show that texture efficiently preserves precipitation echoes at both sites, with an accuracy of 98% at Bordeaux and 95% at Setif regardless of the AI technique used. With SVM, 98% of GE are suppressed, outperforming the ANN. CLBP combined with SVM eliminates 98% of GE and preserves precipitation better at Bordeaux than at Setif, while it is less accurate with the ANN. The SVM classifier is well suited to this application, since its average filtering rate is 95-98% with texture and 92-93% with CLBP. These approaches also remove anomalous propagation (AP) echoes, with the best accuracy of 97.15% obtained with texture and SVM. Overall, textural features combined with AI techniques are an efficient tool for incoherent radars to suppress spurious echoes.
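A GLCM counts how often gray level i is followed by gray level j at a fixed pixel offset, and scalar texture descriptors are computed from it. The sketch below builds a normalized, non-symmetric GLCM for one offset and derives three common descriptors; the offset, the number of gray levels and the choice of descriptors are assumptions, since the abstract does not state which GLCM features were fed to the classifiers. Radar reflectivity would first have to be quantized to `levels` integer values.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Normalized, non-symmetric gray-level co-occurrence matrix for offset (dy, dx)."""
    img = np.asarray(image, dtype=int)
    P = np.zeros((levels, levels), dtype=float)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:   # skip pairs falling outside the image
                P[img[r, c], img[r2, c2]] += 1
    return P / P.sum()

def glcm_features(P):
    """Contrast, energy and homogeneity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "energy": float(np.sum(P ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
    }

flat = glcm_features(glcm(np.full((4, 4), 3), levels=8))      # uniform patch
checker = glcm_features(glcm([[0, 1], [1, 0]], levels=2))     # alternating pixels
```

A uniform patch gives zero contrast and unit energy, while alternating pixels maximize contrast for two levels; spatially noisy clutter and smoother precipitation fields tend to fall at different points along such scales.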
NASA Astrophysics Data System (ADS)
Wong, Pak-kin; Vong, Chi-man; Wong, Hang-cheong; Li, Ke
2010-05-01
The power performance of modern automotive spark-ignition (SI) engines usually refers to output power and torque, both of which are significantly affected by the setup of control parameters in the engine management system (EMS). EMS calibration is done empirically through tests on a dynamometer (dyno) because no exact mathematical engine model is yet available. With least squares support vector machines (LS-SVM), an emerging nonlinear function estimation technique, an approximate power performance model of an SI engine can be determined by training on sample data acquired from the dyno. A novel incremental algorithm based on typical LS-SVM is also proposed, so that power performance models built with the incremental LS-SVM can be updated whenever new training data arrive, allowing model accuracy to improve continuously. The predictions of the incremental LS-SVM models are in good agreement with the actual test results, with almost the same average accuracy as retraining the models from scratch, but the incremental algorithm significantly shortens model construction time when new training data arrive.
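The abstract does not detail its incremental algorithm. One generic way to avoid retraining an LS-SVM from scratch is to grow the inverse of H = K + I/γ by a block-matrix (Schur complement) update when a sample arrives, then recover b = 1ᵀH⁻¹y / 1ᵀH⁻¹1 and α = H⁻¹(y - b·1). The sketch below shows that technique; it is an illustration under these assumptions, not necessarily the authors' method, and the toy 1-D samples are hypothetical stand-ins for dyno data.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel between two points."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.sum(d ** 2) / (2.0 * sigma ** 2)))

class IncrementalLSSVM:
    """LS-SVM regressor that grows H^-1 = (K + I/gamma)^-1 one sample at a time."""

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma
        self.X, self.y = [], []
        self.H_inv = np.zeros((0, 0))

    def add(self, x, y):
        k = np.array([rbf(x, xi, self.sigma) for xi in self.X])
        c = rbf(x, x, self.sigma) + 1.0 / self.gamma
        if self.X:
            Hk = self.H_inv @ k
            s = c - k @ Hk                             # Schur complement (a scalar)
            top = self.H_inv + np.outer(Hk, Hk) / s
            self.H_inv = np.block([[top, -Hk[:, None] / s],
                                   [-Hk[None, :] / s, np.array([[1.0 / s]])]])
        else:
            self.H_inv = np.array([[1.0 / c]])
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(y))
        self._solve()

    def _solve(self):
        y = np.array(self.y)
        ones = np.ones(len(y))
        Hy, H1 = self.H_inv @ y, self.H_inv @ ones
        self.b = (ones @ Hy) / (ones @ H1)             # from the constraint sum(alpha) = 0
        self.alpha = Hy - self.b * H1                  # = H^-1 (y - b * 1)

    def predict(self, x):
        return sum(a * rbf(x, xi, self.sigma) for a, xi in zip(self.alpha, self.X)) + self.b

model = IncrementalLSSVM(gamma=10.0, sigma=1.0)
for p in np.linspace(0.0, 3.0, 10):
    model.add([p], np.sin(p))                          # samples arriving one at a time
```

Each update costs O(n²) instead of the O(n³) of a fresh solve, which is the kind of saving the abstract reports when new data arrive.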
Zhang, Li; Zhou, WeiDa
2013-12-01
This paper deals with fast methods for training a 1-norm support vector machine (SVM). First, we define a specific class of linear programs with many sparse constraints, i.e., row-column sparse constraint linear programming (RCSC-LP). The 1-norm SVM is, in essence, an RCSC-LP. To construct and solve subproblems for RCSC-LP, a family of row-column generation (RCG) methods is introduced. RCG methods belong to the category of decomposition techniques and perform row and column generation in parallel. In particular, for the 1-norm SVM, the maximum size of the RCG subproblems equals the number of support vectors (SVs). We also introduce a semi-deleting rule for RCG methods and prove their convergence when this rule is used. Experimental results on toy data and real-world datasets show that RCG is efficient for training the 1-norm SVM, especially when the number of SVs is small. Copyright © 2013 Elsevier Ltd. All rights reserved.
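To see why the 1-norm SVM is a linear program with rows and columns to generate, it helps to write out the standard formulation. Splitting the weights as w = u - v with u, v ≥ 0 makes the objective linear. This is the common textbook form with penalty C; the paper's exact parameterization is an assumption here.

```latex
\begin{aligned}
\min_{u,\,v,\,b,\,\xi}\quad & \sum_{j=1}^{d} (u_j + v_j) + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i\big((u - v)^\top x_i + b\big) \ge 1 - \xi_i, \qquad i = 1,\dots,n, \\
& u \ge 0,\; v \ge 0,\; \xi \ge 0,\; b \in \mathbb{R}.
\end{aligned}
```

Each training sample contributes one constraint (a row), and each feature contributes two variables u_j and v_j (columns). At the optimum only a few constraints are active and only a few weights are nonzero, so a decomposition method can work on a small subproblem by adding rows and columns as needed.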
Pirooznia, Mehdi; Deng, Youping
2006-12-12
Graphical user interface (GUI) software promotes novelty by allowing users to extend functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of the support vector machine. We implemented the Java interface using the standard Swing libraries. We used sample data from a breast cancer study to test classification accuracy and achieved 100% accuracy in classifying the BRCA1-BRCA2 samples with the RBF kernel. We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based on expression data from DNA microarray hybridization experiments. Among the kernel functions examined, the SVM with a radial basis function kernel provides the best performance. SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
Optimization of Support Vector Machine (SVM) for Object Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin
2012-01-01
The Support Vector Machine (SVM) is a powerful algorithm for classifying data into classes. The SVMs implemented in this research were used as the final-stage classifiers in a multistage automatic target recognition (ATR) system. A single-kernel SVM known as SVMlight and a modified version, an SVM with K-means clustering, were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied and target orientation changed. The classifiers were then optimized to demonstrate their maximum potential. The results demonstrate the reliability of SVM as a classification method: from trial to trial, SVM produces consistent results.
An assessment of support vector machines for land cover classification
Huang, C.; Davis, L.S.; Townshend, J.R.G.
2002-01-01
The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It has been found to be competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper introduces the theoretical development of the SVM and experimentally evaluates its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared with three other popular classifiers: the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impact of kernel configuration on the performance of the SVM, and of the selection of training data and input variables on all four classifiers, was also evaluated.
An Auto-flag Method of Radio Visibility Data Based on Support Vector Machine
NASA Astrophysics Data System (ADS)
Dai, Hui-mei; Mei, Ying; Wang, Wei; Deng, Hui; Wang, Feng
2017-01-01
The Mingantu Ultrawide Spectral Radioheliograph (MUSER) has entered a test observation stage. Now that the data acquisition and storage system has been built, there is an urgent need to automatically flag and eliminate abnormal visibility data in order to improve imaging quality. In this paper, we create a credible visibility set from the observational records and use it to obtain a flag model for visibility data with the support vector machine (SVM) technique. The results show that the SVM is a robust approach for flagging MUSER visibility data, attaining an accuracy of about 86%. Moreover, the method is not affected by solar activity such as flare eruptions.
Classification of stellar spectra with SVM based on within-class scatter and between-class scatter
NASA Astrophysics Data System (ADS)
Liu, Zhong-bao; Zhou, Fang-xiao; Qin, Zhen-tao; Luo, Xue-gang; Zhang, Jing
2018-07-01
Support Vector Machine (SVM) is a popular data mining technique that has been widely applied to astronomical tasks, especially stellar spectra classification. However, because SVM does not take the data distribution into consideration, there is limited room to improve its classification efficiency. SVM also ignores internal information in the training dataset, such as the within-class and between-class structure. In view of this, we propose a new classification algorithm, SVM based on Within-Class Scatter and Between-Class Scatter (WBS-SVM). Like SVM, WBS-SVM seeks an optimal hyperplane separating two classes; the difference is that it incorporates the minimum within-class scatter and maximum between-class scatter criteria of Linear Discriminant Analysis (LDA) into SVM. These two scatters represent the distribution of the training dataset, and optimizing WBS-SVM keeps samples of the same class as close as possible and samples of different classes as far apart as possible. Experiments on K-, F- and G-type stellar spectra from Sloan Digital Sky Survey (SDSS) Data Release 8 show that the proposed WBS-SVM can greatly improve classification accuracy.
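The two quantities WBS-SVM borrows from LDA are the within-class scatter matrix S_w and the between-class scatter matrix S_b. The sketch below computes both with NumPy using the usual LDA definitions; how WBS-SVM weights them inside the SVM objective is not described in the abstract and is not reproduced. The random data and labels are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_w) and between-class (S_b) scatter matrices as used in LDA."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        centered = Xc - mu_c
        S_w += centered.T @ centered              # spread of class c around its own mean
        diff = (mu_c - mu)[:, None]
        S_b += len(Xc) * (diff @ diff.T)          # size-weighted spread of class means
    return S_w, S_b

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 4))                 # hypothetical spectral features
y_demo = np.repeat([0, 1, 2], 10)                 # hypothetical K/F/G labels
S_w, S_b = scatter_matrices(X_demo, y_demo)
```

With these definitions S_w + S_b equals the total scatter of the data, and a projection w that is good in the LDA sense makes wᵀS_b w large relative to wᵀS_w w.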
Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B
2017-12-01
Outcomes for cancer patients have previously been estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. For lung cancer in particular, it is not well understood which techniques yield the most predictive information, or which data attributes should be used to obtain it. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the methods. As a first step towards improving survival prediction, the prediction is treated as a continuous target rather than a classification into categories. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best-performing technique was the custom ensemble, with a Root Mean Square Error (RMSE) of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable because they produced too few discrete outputs. Among the five individual models generated, the most accurate was GBM, with an RMSE of 15.32. Although SVM underperformed with an RMSE of 15.82, statistical analysis singles out SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique.
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.
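The abstract compares models by RMSE and reports that an ensemble beat every individual model, but it does not say how the custom ensemble combines its members. The sketch below assumes a simple (optionally weighted) average and uses hypothetical survival times in months to show how averaging can lower RMSE when member errors partly cancel.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error between predictions and observed values."""
    pred, actual = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def mean_ensemble(*predictions, weights=None):
    """Combine member predictions by (optionally weighted) averaging."""
    return np.average(np.vstack(predictions), axis=0, weights=weights)

actual = [10, 20, 30, 40]                 # hypothetical survival times (months)
model_a = [12, 18, 33, 37]                # errors +2, -2, +3, -3
model_b = [8, 22, 27, 43]                 # errors -2, +2, -3, +3
ensemble = mean_ensemble(model_a, model_b)
```

Here each member has RMSE √6.5 ≈ 2.55 while the average is exact; real members have correlated errors, so the gain is usually much smaller, as in the 15.32 versus 15.05 reported above.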
Yu, Wei; Clyne, Melinda; Dolan, Siobhan M; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J; Gwinn, Marta
2008-01-01
Background Synthesis of data from published human genetic association studies is a critical step in translating human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies. Results The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced the number of abstracts requiring individual review by the database curator by about 90%. The tool also identified 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy. Conclusion GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. Its user-friendly graphical user interface makes it a practical, stand-alone application. The software can be downloaded at no charge. PMID:18430222
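A precision of 31.9% next to a specificity of 98.3% can look contradictory. It follows naturally when relevant abstracts are rare. The counts below are hypothetical, chosen only to reproduce the reported recall and specificity, and are not the study's confusion matrix.

```python
def screening_metrics(tp, fp, tn, fn):
    """Recall, specificity and precision from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),         # share of relevant abstracts retrieved
        "specificity": tn / (tn + fp),    # share of irrelevant abstracts rejected
        "precision": tp / (tp + fp),      # share of flagged abstracts that are relevant
    }

# Hypothetical: 200 relevant and 20,000 irrelevant abstracts
m = screening_metrics(tp=195, fp=340, tn=19660, fn=5)
```

Rejecting 98.3% of 20,000 irrelevant abstracts still lets 340 through, which outnumbers the 195 relevant hits, so precision is only about 0.36. The curator nevertheless reviews 535 abstracts instead of 20,200.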
Component Pin Recognition Using Algorithms Based on Machine Learning
NASA Astrophysics Data System (ADS)
Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang
2018-04-01
The purpose of machine vision in a plug-in machine is to improve the machine’s stability and accuracy, and recognition of component pins is an important part of that vision. This paper focuses on component pin recognition using three different techniques. The first is traditional image processing built around binary large object (BLOB) analysis. The second uses the histogram of oriented gradients (HOG) to experimentally compare support vector machine (SVM) and adaptive boosting (AdaBoost) classifiers. The third is a deep learning method, the convolutional neural network (CNN), which identifies pins by comparing a sample with what it learned from its training data. The main purpose of this research is to increase knowledge of the learning methods used in the plug-in machine industry in order to achieve better results.
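BLOB analysis, the first technique, groups foreground pixels of a binary image into connected regions and measures each one. The sketch below labels 4-connected blobs with a breadth-first search and reports area and centroid. The thresholding that produces the binary image, and any pin-specific filtering by area or shape, are assumptions not covered by the abstract.

```python
from collections import deque

def label_blobs(binary):
    """4-connected component labelling of a binary image (list of lists).

    Returns a list of (area, (row_centroid, col_centroid)) in scan order.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:                      # flood-fill one blob
                    pr, pc = queue.popleft()
                    pixels.append((pr, pc))
                    for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                area = len(pixels)
                centroid = (sum(p[0] for p in pixels) / area,
                            sum(p[1] for p in pixels) / area)
                blobs.append((area, centroid))
    return blobs

# Hypothetical thresholded patch with two bright regions (e.g. two pin tips)
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 0, 1]]
blobs = label_blobs(image)
```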
You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing
2013-01-01
Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, the PPI pairs obtained experimentally cover only a fraction of the complete PPI networks, and the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, developing automated computational methods that predict PPIs efficiently and accurately is both urgent and challenging. We present a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only protein sequence information. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors using four kinds of protein sequence information. For dimension reduction, PCA was then employed as an effective feature extraction method to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and aggregated into a consensus classifier by majority voting. Ensembling the extreme learning machines removes the dependence of the results on the initial random weights and improves prediction performance. On the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Under 5-fold cross-validation, PCA-EELM outperforms the SVM method, and it also runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time.
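An extreme learning machine fixes random input weights and fits only the output weights by least squares, so each member trains in one pseudoinverse step. Majority voting over members trained with different random weights reduces dependence on any single draw. The sketch below shows that structure; the hidden-layer size, tanh activation, number of members and the toy 2-D data are assumptions, and the PCA and sequence-encoding stages are omitted.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine for labels in {-1, +1}."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Random input weights and biases are drawn once and never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.sign(np.tanh(X @ self.W + self.b) @ self.beta)

def ensemble_predict(models, X):
    """Majority vote over the members' {-1, +1} predictions (ties go to +1)."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)

rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
                    rng.normal(2.0, 1.0, size=(50, 2))])
y_demo = np.concatenate([-np.ones(50), np.ones(50)])
models = [ELM(n_hidden=20, seed=s).fit(X_demo, y_demo) for s in range(5)]
accuracy = np.mean(ensemble_predict(models, X_demo) == y_demo)
```

Using an odd number of members avoids most voting ties.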
NASA Astrophysics Data System (ADS)
Brown, M. G. L.; He, T.; Liang, S.
2016-12-01
Satellite-derived estimates of incident photosynthetically active radiation (PAR) can be used to monitor global change, are required by most terrestrial ecosystem models, and can be used to estimate primary production according to the theory of light use efficiency. Compared with parametric approaches, non-parametric techniques, including an artificial neural network (ANN), support vector machine regression (SVM), an artificial bee colony (ABC), and a look-up table (LUT), do not require much ancillary data as input for estimating PAR from satellite data. In this study, a selection of machine learning methods for estimating PAR from MODIS top-of-atmosphere (TOA) radiances is compared with a LUT approach to determine which techniques best handle the nonlinear relationship between TOA radiance and incident PAR. The methods (ANN, SVM, and LUT) are evaluated against ground measurements at seven SURFRAD sites. By design, the ANN handles the nonlinear relationship between TOA radiance and PAR better than linear interpolation between LUT values; however, the ANN has to be trained on an angular-bin basis, which results in a LUT of ANNs. The SVM model may be better than the ANN at incorporating multiple viewing angles; however, both techniques require a large amount of training data, which may introduce a regional bias depending on where most training and validation data are available. Based on the literature, the ABC is a promising alternative to an ANN, SVM regression and a LUT, but further development for this application is required before concrete conclusions can be drawn. For now, the LUT method outperforms the machine-learning techniques, but future work should be directed at developing and testing the ABC method.
A simple, robust method to estimate direct and diffuse incident PAR, with minimal inputs and a priori knowledge, would be very useful for monitoring global change of primary production, particularly of pastures and rangeland, which have implications for livestock and food security. Future work will delve deeper into the utility of satellite-derived PAR estimation for monitoring primary production in pasture and rangelands.
Boxwala, Aziz A; Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila
2011-01-01
To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.
PMID:21672912
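The models are compared by the area under the ROC curve (0.91 for LR and 0.95 for SVM). AUC equals the probability that a randomly chosen suspicious event receives a higher score than a randomly chosen appropriate one, with ties counted as one half. A minimal pairwise sketch follows; the scores and labels in the usage lines are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical model scores; label 1 = suspicious access, 0 = appropriate access
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise loop is O(n·m); for much larger logs, a rank-based (Mann-Whitney U) computation gives the same value faster.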
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
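In PSO-based SVM tuning, each particle is a candidate parameter vector whose velocity blends inertia, attraction to its own best position and attraction to the swarm's best position. The sketch below is a standard global-best PSO over two hyperparameters (log2 C, log2 γ). The objective is a hypothetical surrogate for cross-validated SVM error, and the inertia and acceleration constants, bounds and swarm size are assumptions; the feature-selection part of the proposed classifier is not shown.

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=20, n_iter=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    x = lo + (hi - lo) * rng.random((n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull towards personal best + pull towards swarm best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        cost = np.array([f(p) for p in x])
        improved = cost < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return g, float(pbest_cost.min())

# Hypothetical stand-in for cross-validated SVM error over (log2 C, log2 gamma)
def toy_svm_error(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

best_params, best_cost = pso_minimize(toy_svm_error, [(-5, 15), (-15, 3)])
```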
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA sequencing (RNA-seq) technologies represents a paradigm shift: scientists can now investigate the complex biology of a heterogeneous population of cells, one cell at a time. However, to date, there has been no suitable computational methodology for analyzing this intricate deluge of data, in particular techniques that aid the identification of differences in transcriptomic profiles between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). Using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, thirty-eight key transcripts were identified that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) than commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, in which novel interactions could be further validated by wet-lab experimentation and serve as candidate targets for the treatment of neuronal developmental diseases. The reported approach identifies transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible to other single-cell RNA-seq expression profiles, such as those from studies of cancer progression and treatment within highly heterogeneous tumours.
Amaral, Jorge L M; Lopes, Agnaldo J; Jansen, José M; Faria, Alvaro C D; Melo, Pedro L
2013-12-01
The purpose of this study was to develop an automatic classifier to increase the accuracy of the forced oscillation technique (FOT) for diagnosing early respiratory abnormalities in smoking patients. The data consisted of FOT parameters obtained from 56 volunteers, 28 healthy and 28 smokers with low tobacco consumption. Many supervised learning techniques were investigated, including logistic linear classifiers, k-nearest neighbors (KNN), neural networks and support vector machines (SVM). To evaluate performance, the ROC curve of the most accurate parameter was established as the baseline. To determine the best input features and classifier parameters, we used genetic algorithms and 10-fold cross-validation, with the average area under the ROC curve (AUC) as the criterion. In the first experiment, the original FOT parameters were used as input. We observed a significant improvement in accuracy (KNN=0.89 and SVM=0.87) compared with the baseline (0.77). The second experiment performed feature selection on the original FOT parameters. This selection did not significantly improve accuracy, but it was useful in identifying more adequate FOT parameters. In the third experiment, we performed feature selection on the cross products of the FOT parameters. This selection further increased the AUC (KNN=SVM=0.91), allowing high diagnostic accuracy. In conclusion, machine learning classifiers can help identify early smoking-induced respiratory alterations. The use of FOT cross products and the search for the best features and classifier parameters can markedly improve the performance of machine learning classifiers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
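The 10-fold cross-validation used above to select features and classifier parameters can be illustrated with a minimal, dependency-free Python sketch. It is an assumption-laden illustration rather than the authors' procedure: indices are shuffled with a fixed seed and dealt into folds without stratification by class, and the function name `k_fold_splits` is hypothetical. In the study's setting, the AUC would be computed on each test fold and then averaged.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are shuffled once and dealt into k folds whose sizes differ by at
    most one; every sample appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 56 volunteers split into 10 folds: fold sizes differ by at most one.
print([len(test) for _, test in k_fold_splits(56, k=10)])
# [6, 6, 6, 6, 6, 6, 5, 5, 5, 5]
```

With only 56 volunteers, a stratified variant that keeps the 28/28 healthy/smoker ratio in each fold would be a reasonable refinement.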
Kawata, Yasuo; Arimura, Hidetaka; Ikushima, Koujirou; Jin, Ze; Morita, Kento; Tokunaga, Chiaki; Yabu-Uchi, Hidetake; Shioyama, Yoshiyuki; Sasaki, Tomonari; Honda, Hiroshi; Sasaki, Masayuki
2017-10-01
The aim of this study was to investigate the impact of pixel-based machine learning (ML) techniques, i.e., the fuzzy c-means (FCM) clustering method, the artificial neural network (ANN) and the support vector machine (SVM), on an automated framework for delineating gross tumor volume (GTV) regions of lung cancer for stereotactic body radiation therapy. The morphological and metabolic features for GTV regions, which were determined based on the knowledge of radiation oncologists, were fed on a pixel-by-pixel basis into the FCM, ANN and SVM techniques, respectively. The ML techniques were then incorporated into the automated GTV delineation framework, followed by an optimum contour selection (OCS) method that we proposed in a previous study. The three ML-based frameworks were evaluated on 16 lung cancer cases (six solid, four ground glass opacity (GGO), six part-solid GGO) with datasets of planning computed tomography (CT) and 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT images, using the three-dimensional Dice similarity coefficient (DSC). The DSC denotes the degree of region similarity between the GTVs contoured by radiation oncologists and those estimated by the automated framework. The FCM-based framework achieved the highest DSC, 0.79±0.06, whereas the DSCs of the ANN-based and SVM-based frameworks were 0.76±0.14 and 0.73±0.14, respectively. The FCM-based framework provided the highest segmentation accuracy and precision without a learning process (lowest calculation cost). Therefore, the FCM-based framework can be useful for delineating tumor regions in practical treatment planning. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
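The Dice similarity coefficient used above is DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the voxel sets of the manual and automated contours; 1 means perfect overlap and 0 means none. A minimal Python sketch, assuming each contour has already been rasterized into a set of integer voxel indices (the function name and example voxels are hypothetical):

```python
def dice_coefficient(a, b):
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) between two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxels as (x, y, z) indices.
manual = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
automatic = {(0, 1, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)}
print(dice_coefficient(manual, automatic))  # 0.75
```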
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadliest diseases: according to WHO data, cancer caused 8.8 million deaths in 2015, and this number will increase every year if the disease is not addressed early. Microarray analysis has become one of the most popular approaches to cancer identification in the health field, since microarray data show gene expression levels in cell samples, allowing thousands of genes to be analyzed simultaneously. Using data mining techniques, microarray samples can be classified as cancerous or not. In this paper we discuss research applying several data mining techniques to microarray data, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and we simulate the Random Forest algorithm with Relief-based dimensionality reduction. The paper reports the accuracy of each classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forest); the Random Forest algorithm achieved higher accuracy than the other classification algorithms. We hope this paper provides some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.
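The Relief method mentioned above scores features by comparing each sample with its nearest neighbour of the same class (nearest hit) and of the other class (nearest miss). The Python sketch below is a simplified two-class variant, written as an assumption rather than the paper's configuration: it visits every sample instead of a random subset, uses Manhattan distance, and expects features scaled to [0, 1]. The function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for a two-class problem.

    Features that differ more from the nearest miss than from the nearest hit
    gain weight; irrelevant features drift toward zero or below.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            raise ValueError("each class needs at least two samples")
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w


# Hypothetical scaled expression values: feature 0 separates the classes,
# feature 1 does not.
X = [(0.0, 0.1), (0.1, 0.9), (0.05, 0.5), (0.9, 0.2), (1.0, 0.8), (0.95, 0.4)]
y = [0, 0, 0, 1, 1, 1]
print([round(v, 3) for v in relief_weights(X, y)])  # [0.85, -0.233]
```

Features would then be ranked by weight and only the top-scoring ones passed to the classifier.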
Sakr, Sherif; Elshawi, Radwa; Ahmed, Amjad M; Qureshi, Waqas T; Brawner, Clinton A; Keteyian, Steven J; Blaha, Michael J; Al-Mallah, Mouaz H
2017-12-19
Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that assign data to predetermined categories. The aim of this study is to evaluate and compare how machine learning techniques can be applied to medical records of cardiorespiratory fitness and how the various techniques differ in their ability to predict medical outcomes (e.g. mortality). We use data from 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). To handle the imbalanced dataset, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. Two sets of experiments were conducted, with and without SMOTE sampling. Averaged over the different evaluation metrics, the SVM classifier showed the lowest performance, while other models such as BN, BC and DT performed better. The RF classifier showed the best performance (AUC = 0.97) among all models trained using SMOTE sampling. The results show that ML techniques can vary significantly in performance across evaluation metrics. A more complex ML model does not necessarily achieve higher prediction accuracy. The prediction performance of all models trained with SMOTE is much better than that of models trained without SMOTE. The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.
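SMOTE balances a dataset by creating synthetic minority-class samples on the line segments between a minority sample and one of its k nearest minority neighbours. The sketch below is a minimal, dependency-free illustration in that style, not the implementation used in the study; choosing the base sample at random, the function name `smote` and its parameters are assumptions. In practice, oversampling is normally applied only to training data so that synthetic points do not leak into the evaluation set.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic SMOTE-style samples from a list of minority-class points."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.randrange(len(minority))
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != base),
            key=lambda j: sq_dist(minority[base], minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(minority[base], other))
        )
    return synthetic


# Hypothetical minority-class records with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```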
NASA Astrophysics Data System (ADS)
Taha, Z.; Razman, M. A. M.; Adnan, F. A.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Sallehudin, M. F.; Mukai, Y.
2018-03-01
Fish hunger behaviour is an important element in determining the feeding routine, especially for farmed fish. Inaccurate feeding routines (under-feeding or over-feeding) can cause fish to die and thus reduce total fish production. Excess food that is not eaten dissolves in the water and reduces water quality (the amount of oxygen in the water decreases). This reduction in oxygen can kill fish and, in some cases, may lead to fish diseases. This study correlates Barramundi fish-school behaviour with hunger condition through hybrid data integration using image processing techniques. The behaviour is clustered with respect to the position of the centre of gravity of the fish school before, during and after feeding. The clustered fish behaviour is then classified by a machine learning technique, the support vector machine (SVM). The study shows that the Fine Gaussian variant of SVM provides a reasonably accurate classification of fish feeding behaviour, with a classification accuracy of 79.7%. The proposed integration technique may increase the usefulness of the captured data and thus better differentiate the various behaviours of farmed fish.
Support vector machine firefly algorithm based optimization of lens system.
Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah
2015-01-01
Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. Many tools can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram indicates the image of a point object. In this paper, the spot size radius is used as the optimization criterion. An intelligent soft computing scheme, support vector machines (SVMs) coupled with the firefly algorithm (FFA), is implemented. The performance of the proposed estimators is confirmed by simulation results. The results of the proposed SVM-FFA model are compared with those of support vector regression (SVR), artificial neural networks, and genetic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies. Therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.
The formation method of the feature space for the identification of fatigued bills
NASA Astrophysics Data System (ADS)
Kang, Dongshik; Oshiro, Ayumu; Ozawa, Kenji; Mitsui, Ikugo
2014-10-01
Fatigued bills cause trouble such as paper jams in bill-handling machines. When fatigued bills are discriminated using an acoustic signal, variation in the observed bill sound is considered one of the causes of misclassification. Therefore, a technique is needed to make the classification of fatigued bills more efficient. In this paper, we propose an algorithm that extracts features of the bill sound from the acoustic signal using the frequency difference, and we carry out discrimination experiments on fatigued bills using a Support Vector Machine (SVM). The frequency-difference feature can represent how the frequency components of the acoustic signal vary with the degree of bill fatigue. The generalization performance of SVM does not depend on the dimensionality of the feature space, even in a high-dimensional feature space such as that of bill acoustic signals. Furthermore, SVM can induce an optimal classifier that considers combinations of features by virtue of polynomial kernel functions.
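The claim that a polynomial kernel lets the SVM consider combinations of features can be checked directly: for 2-D inputs, the degree-2 kernel K(x, z) = (x · z)^2 equals the inner product of the explicit feature maps φ(x) = (x1^2, √2·x1·x2, x2^2), whose middle term is a feature interaction. The Python sketch below verifies this numerically on hypothetical vectors; the kernel degree and features actually used for the bill sounds are not specified here, and the function names are illustrative.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input whose inner product reproduces poly2_kernel."""
    x1, x2 = x
    # The middle coordinate is the pairwise feature combination x1 * x2.
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)  # hypothetical feature vectors
implicit = poly2_kernel(x, z)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
print(implicit, math.isclose(implicit, explicit))  # 1.0 True
```

The kernel evaluates the same quantity without ever building the expanded features, which is what keeps high-degree feature combinations affordable.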
NASA Astrophysics Data System (ADS)
Wang, Chun-mei; Zhang, Chong-ming; Zou, Jun-zhong; Zhang, Jian
2012-02-01
The diagnosis of several neurological disorders is based on the detection of typical pathological patterns in electroencephalograms (EEGs). This is a time-consuming task requiring significant training and experience. Much effort has been devoted to developing automatic detection techniques, which might help not only to accelerate this process but also to avoid disagreement among readers of the same record. In this work, Neyman-Pearson criteria and a support vector machine (SVM) are applied to detect epileptic EEGs. Decision making is performed in two stages: feature extraction, by computing the wavelet coefficients and the approximate entropy (ApEn), and detection, using Neyman-Pearson criteria and an SVM. The detection performance of the proposed method is then evaluated. Simulation results demonstrate that the wavelet coefficients and the ApEn are features that represent the EEG signals well. Compared with the Neyman-Pearson criteria, an SVM applied to these features achieved higher detection accuracy.
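Approximate entropy, one of the two feature types above, measures how often patterns of length m that are similar (within tolerance r) remain similar when extended to length m + 1; regular signals give values near 0 and irregular ones give larger values. A minimal Python sketch of Pincus's definition follows; m = 2 and an absolute tolerance r = 0.2 are illustrative assumptions (r is often chosen as a fraction of the signal's standard deviation), and the test signals are synthetic, not EEG.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) following Pincus; self-matches are counted, so log() is defined."""
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        count = len(templates)
        total = 0.0
        for t in templates:
            # number of templates within Chebyshev (max-abs) distance r of t
            similar = sum(
                1 for u in templates
                if max(abs(a - b) for a, b in zip(t, u)) <= r
            )
            total += math.log(similar / count)
        return total / count

    return phi(m) - phi(m + 1)


regular = [0.0, 1.0] * 50                       # perfectly periodic signal
rng = random.Random(0)
irregular = [rng.random() for _ in range(200)]  # uniform noise
print(round(approximate_entropy(regular), 3))   # 0.0
print(approximate_entropy(irregular) > approximate_entropy(regular))  # True
```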
Machine learning models in breast cancer survival prediction.
Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin
2016-01-01
Breast cancer is one of the most common cancers and has a high mortality rate among women. With early diagnosis, breast cancer survival increases from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model combines rules with different machine learning techniques. Machine learning models can help physicians reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for predicting different types of breast cancer survival. We use a dataset with eight attributes comprising the records of 900 patients, of whom 876 (97.3%) were female and 24 (2.7%) were male. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) techniques were used with the proposed model, with 10-fold cross-validation, to predict breast cancer survival. The performance of the machine learning techniques was evaluated with accuracy, precision, sensitivity, specificity, and area under the ROC curve. Of the 900 patients, 803 were alive and 97 were dead. In this study, the Trees Random Forest (TRF) technique showed better results than the other techniques (NB, 1NN, AD, SVM, RBFN and MLP). The accuracy, sensitivity and area under the ROC curve of TRF are 96%, 96% and 93%, respectively. By contrast, the 1NN technique performed poorly (accuracy 91%, sensitivity 91% and area under the ROC curve 78%).
This study demonstrates that the Trees Random Forest (TRF) model, a rule-based classification model, was the best model, with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.
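The metrics listed above all derive from the confusion matrix. Because only 97 of the 900 patients died, accuracy alone can be misleading, which the hypothetical "always alive" predictor in the Python sketch below illustrates using the class counts reported in the abstract; this predictor is not one of the study's models, and the function name is illustrative.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Class counts from the abstract; the constant predictor is hypothetical.
y_true = ["alive"] * 803 + ["dead"] * 97
always_alive = ["alive"] * 900
m = confusion_metrics(y_true, always_alive, positive="dead")
print(round(m["accuracy"], 3), m["sensitivity"])  # 0.892 0.0
```

Such a predictor scores about 89% accuracy while detecting no deaths, so sensitivity and the area under the ROC curve are needed alongside accuracy on this dataset.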
Financial Distress Prediction using Linear Discriminant Analysis and Support Vector Machine
NASA Astrophysics Data System (ADS)
Santoso, Noviyanti; Wibowo, Wahyu
2018-03-01
Financial distress is the early stage before bankruptcy. Bankruptcies caused by financial distress can be detected from a company's financial statements. The ability to predict financial distress has become an important research topic because it can provide early warning for the company. Predicting financial distress is also beneficial for investors and creditors. This research builds a financial distress prediction model for industrial companies in Indonesia by comparing the performance of Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM), each combined with a variable selection technique. The resulting prediction model, based on a hybrid Stepwise-SVM, achieves a better balance among fitting ability, generalization ability and model stability than the other models.
Data on Support Vector Machines (SVM) model to forecast photovoltaic power.
Malvoni, M; De Giorgi, M G; Congedo, P M
2016-12-01
The data concern photovoltaic (PV) power forecast by a hybrid model that considers weather variations and applies a technique to reduce the input data size, as presented in the paper entitled "Photovoltaic forecast based on hybrid pca-lssvm using dimensionality reducted data" (M. Malvoni, M.G. De Giorgi, P.M. Congedo, 2015) [1]. The quadratic Rényi entropy criteria, together with principal component analysis (PCA), are applied to the Least Squares Support Vector Machine (LS-SVM) to predict PV power in the day-ahead time frame. The data shared here are the results of the proposed approach. Hourly PV power predictions for 1, 3, 6, 12 and 24 hours ahead, and for different data reduction sizes, are provided in the Supplementary material.
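A key property of the LS-SVM is that training reduces to solving a single linear system instead of the quadratic program of a standard SVM. The Python sketch below shows the standard LS-SVM regression formulation with an RBF kernel; it is a minimal illustration under assumed hyperparameters (`gamma`, `sigma`), it omits the PCA and Rényi-entropy data reduction of the referenced hybrid model, and the function names and toy data are hypothetical.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Fit LS-SVM regression by solving the dual system
        [0  1^T          ] [b    ]   [0]
        [1  K + I / gamma] [alpha] = [y]
    and return a prediction function f(x) = sum_i alpha_i K(x, x_i) + b."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    b, alpha = solution[0], solution[1:]

    def predict(x):
        return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b

    return predict


# Hypothetical 1-D inputs (e.g. a scaled weather variable) and PV outputs.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 2.0, 4.0, 6.0]
f = lssvm_fit(X, y, gamma=1e6)
print(round(f((1.0,)), 3))  # 2.0
```

A large `gamma` makes the model nearly interpolate the training points, while a small `gamma` pulls predictions toward the bias term; in practice both hyperparameters would be tuned on held-out data.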
Cheng, Feixiong; Shen, Jie; Yu, Yue; Li, Weihua; Liu, Guixia; Lee, Philip W; Tang, Yun
2011-03-01
There is an increasing need for the rapid safety assessment of chemicals by both industries and regulatory agencies throughout the world. In silico techniques are practical alternatives in environmental hazard assessment, especially for addressing the persistence, bioaccumulation and toxicity potentials of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxic endpoint. In this study, 1571 diverse unique chemicals were collected from the literature, composing the largest diverse data set for T. pyriformis toxicity. Classification models predicting T. pyriformis toxicity were developed by substructure pattern recognition and different machine learning methods, including support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of 5-fold cross-validation showed that the SVM method performed better than the other algorithms. The overall predictive accuracies of the SVM classification model with a radial basis function kernel were 92.2% for the 5-fold cross-validation and 92.6% for the external validation set. Furthermore, several representative substructure patterns characterizing T. pyriformis toxicity were identified via information gain analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
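Information gain, used above to rank substructure patterns, is the reduction in class entropy obtained by knowing a feature: IG = H(Y) - H(Y | X). A minimal Python sketch for a binary "substructure present" feature follows; the function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(Y) - H(Y | X) for a discrete feature such as substructure presence."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical data: 1 = substructure present / compound toxic.
has_substructure = [1, 1, 1, 0, 0, 0, 1, 0]
toxic = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(information_gain(has_substructure, toxic), 3))  # 0.189
```

A pattern that perfectly separated toxic from non-toxic compounds in this balanced example would score 1 bit; one unrelated to toxicity would score 0.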
Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai
2017-12-26
Feature selection is an important topic in bioinformatics. Defining informative features from complex high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the SVM-RFE feature ranking. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of a feature subset can be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
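The SVM-RFE ranking that SVM-RFE-OA builds on can be sketched as follows: train a linear SVM, drop the feature with the smallest squared weight, and repeat on the remaining features. In this Python illustration the linear SVM is trained with Pegasos-style subgradient descent (a substitution chosen to keep the example dependency-free), the overlap-ratio criterion of SVM-RFE-OA is not implemented, and the function names and synthetic data, in which only feature 0 carries class information, are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM without bias, trained by Pegasos-style subgradient steps.

    Labels must be +1/-1. Returns the weight vector w.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regulariser shrinkage
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, **svm_kwargs):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        Xr = [[row[f] for f in remaining] for row in X]
        w = train_linear_svm(Xr, y, **svm_kwargs)
        # eliminate the surviving feature with the smallest squared weight
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Synthetic data: feature 0 follows the label, features 1 and 2 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(100)]
X = [[lab * (1.0 + 0.5 * rng.random()), rng.uniform(-1, 1), rng.uniform(-1, 1)]
     for lab in y]
print(svm_rfe(X, y)[0])  # 0
```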
Chen, Zhiru; Hong, Wenxue
2016-02-01
To address the low prediction accuracy on positive samples and the poor overall classification caused by the imbalanced sample data of microRNA (miRNA) targets, this paper proposes a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process to reduce the imbalance between positive and negative samples. Meanwhile, during adaptive adjustment of the sample weights, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism to avoid over-learning. Finally, the integrated miRNA target classifier makes predictions by combining multiple weak classifiers through a voting mechanism. Experiments on a collection of imbalanced datasets revealed that, compared with other algorithms, SVM-IUSW not only improves the accuracy on positive targets and the overall classification performance, but also enhances the generalization ability of the miRNA target classifier.
A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM
NASA Astrophysics Data System (ADS)
Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan
2018-03-01
To address the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on three DGA ratios and a particle swarm optimization (PSO)-optimized support vector machine (SVM) is proposed. The SVM is extended to a nonlinear multi-class SVM, PSO is used to optimize the multi-class SVM model, and transformer faults are diagnosed in combination with cross-validation. The fault diagnosis results show that the average accuracy of the proposed method is better than that of the standard SVM and the genetic-algorithm-optimized SVM, proving that the proposed method can effectively improve the accuracy of transformer fault diagnosis.
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.
Subasi, Abdulhamit
2013-06-01
Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model is proposed that hybridizes particle swarm optimization (PSO) and SVM to improve EMG signal classification accuracy. This optimization mechanism involves setting the kernel parameters in the SVM training procedure, which significantly influences classification accuracy. Experiments were conducted to classify EMG signals as normal, neurogenic or myopathic. In the proposed method, the EMG signals were decomposed into frequency sub-bands using the discrete wavelet transform (DWT), and a set of statistical features was extracted from these sub-bands to represent the distribution of wavelet coefficients. The results validate the superiority of the SVM method over conventional machine learning methods, and suggest that further significant enhancements in classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records, against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM was developed as an efficient tool so that various SVMs can conveniently be used as its core for the diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
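The PSO step above searches for kernel parameters that minimize classification error. The Python sketch below is a basic global-best PSO under assumed settings (inertia 0.7, acceleration coefficients 1.5, 30 particles); the objective `cv_error_stand_in` and its search over log2 C and log2 gamma are hypothetical placeholders for the cross-validated error of an RBF-kernel SVM trained on the DWT features, not the author's setup.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            val = objective(pos[i])
            if val < pbest_val[i]:  # personal best, then (asynchronous) global best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def cv_error_stand_in(params):
    """Hypothetical smooth error surface with its minimum at log2 C = 3, log2 gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 3) ** 2 + (log_gamma + 2) ** 2


best, err = pso_minimize(cv_error_stand_in, bounds=[(-5, 15), (-15, 3)])
print([round(v, 2) for v in best])  # expected to be close to [3.0, -2.0]
```

With a real SVM, each objective call would train and cross-validate a model, so the particle and iteration counts dominate the total cost.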
A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or at the institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers who wish to learn effectively from distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as the model learned from the combined data. We also provide a free web service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
Machine learning-based methods for prediction of linear B-cell epitopes.
Wang, Hsin-Wei; Pai, Tun-Wen
2014-01-01
B-cell epitope prediction helps immunologists design peptide-based vaccines and diagnostic tests, and supports disease prevention, treatment and antibody production. In comparison with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction is still unsatisfactory. Fortunately, with increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design improved prediction tools for biomedical researchers. Here, we review related epitope prediction papers, especially those on linear B-cell epitope prediction. Notably, combining selected propensity scales and statistics of epitope residues with machine learning-based tools has become a general way of constructing linear B-cell epitope prediction systems. Most of the comparison results also show that the kernel-based support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in addition to reviewing recently published papers, this chapter introduces the fundamentals of B-cell epitopes and SVM techniques. In addition, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.
Classification of Regional Ionospheric Disturbances Based on Support Vector Machines
NASA Astrophysics Data System (ADS)
Begüm Terzi, Merve; Arikan, Feza; Arikan, Orhan; Karatay, Secil
2016-07-01
The ionosphere is an anisotropic, inhomogeneous, time-varying and spatio-temporally dispersive medium whose parameters are almost always estimated from indirect measurements. Geomagnetic, gravitational, solar or seismic activities cause variations of the ionosphere at various spatial and temporal scales. This complex spatio-temporal variability is challenging to identify because of the wide range of scales in the period, duration, amplitude and frequency of disturbances. Since geomagnetic and solar indices such as Disturbance storm time (Dst), F10.7 solar flux, Sun Spot Number (SSN), Auroral Electrojet (AE), Kp and W-index provide information about variability on a global scale, the identification and classification of regional disturbances poses a challenge. The main aim of this study is to classify the regional effects of global geomagnetic storms according to their risk levels. For this purpose, Total Electron Content (TEC) estimated from GPS receivers, one of the major parameters of the ionosphere, will be used along with solar and geomagnetic indices to model the regional and local variability that differs from global activity. In this work, a classification technique based on the Support Vector Machine (SVM), a robust and widely used machine learning technique, is proposed for the automated classification of regional disturbances. SVM is a supervised learning model with an associated learning algorithm that analyzes data and recognizes patterns for classification. In addition to performing linear classification, SVM can efficiently perform nonlinear classification by embedding data into higher-dimensional feature spaces. The performance of the developed classification technique is demonstrated for the midlatitude ionosphere over Anatolia using TEC estimates generated from GPS data provided by the Turkish National Permanent GPS Network (TNPGN-Active) for the solar maximum year of 2011.
By applying the developed classification technique to Global Ionospheric Map (GIM) TEC data provided by the NASA Jet Propulsion Laboratory (JPL), it will be shown that SVM can be a suitable learning method for detecting anomalies in Total Electron Content (TEC) variations. This study is supported by TUBITAK project 114E541 as part of the Scientific and Technological Research Projects Funding Program (1001).
Semisupervised learning using Bayesian interpretation: application to LS-SVM.
Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain
2011-04-01
Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem, and we apply it to the successful kernel classifier, the support vector machine (SVM), and its variant, the least-squares SVM (LS-SVM). Taking advantage of the Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.
Prediction of toxic metals concentration using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Gholami, R.; Kamkar-Rouhani, A.; Doulati Ardejani, F.; Maleki, Sh.
2011-12-01
Groundwater and soil pollution are considered the worst environmental problems associated with the mining industry, because pyrite oxidation generates acid mine drainage and leads to the release and transport of toxic metals. The aim of this paper is to predict the concentrations of Ni and Fe using a robust algorithm named support vector machine (SVM). Comparison of the SVM results with those of a back-propagation neural network (BPNN) indicates that the SVM can be regarded as a suitable algorithm for predicting toxic metal concentrations, owing to its relatively high correlation coefficient and short running time. In fact, the SVM method provided better predictions of the toxic metals Fe and Ni and ran faster than the BPNN.
A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong
Least squares support vector machine (LS-SVM) is a revised version of the support vector machine (SVM) and has proved to be a useful tool for pattern recognition. LS-SVM has excellent generalization performance and low computational cost. In this paper, we propose a new method called the two-layer least squares support vector machine, which combines kernel principal component analysis (KPCA) with the linear programming form of the least squares support vector machine. This method achieves sparseness and robustness when handling high-dimensional, large-scale databases. A U.S. commercial credit card database is used to test the efficiency of our method, and the results were satisfactory.
Eddy current characterization of small cracks using least square support vector machine
NASA Astrophysics Data System (ADS)
Chelabi, M.; Hacib, T.; Le Bihan, Y.; Ikhlef, N.; Boughedda, H.; Mekideche, M. R.
2016-04-01
Eddy current (EC) sensors are used for non-destructive testing because they can probe conductive materials. Although EC testing is a conventional technique for defect detection and localization, its main weakness is that defect characterization remains an open question, that is, the exact determination of a defect's shape and dimensions. In this work, we demonstrate the capability of sizing small cracks using signals acquired from an EC sensor. We report our effort to develop a systematic approach to estimate the size (length and depth) of thin rectangular defects in a conductive plate. The approach is a novel combination of the finite element method (FEM) with a statistical learning method called least squares support vector machines (LS-SVM). First, we use the FEM to model the forward problem. Next, an algorithm is used to build an adaptive database. Finally, the LS-SVM is used to solve the inverse problem, creating polynomial functions able to approximate the correlation between the crack dimensions and the signal picked up by the EC sensor. Several methods can be used to find the parameters of the LS-SVM; in this study, particle swarm optimization (PSO) and a genetic algorithm (GA) are proposed for tuning the LS-SVM. The results of the design and the inversions were compared with both simulated and experimental data, and the accuracy was verified experimentally. These results demonstrate the applicability of the presented approach.
ChariDingari, Narahara; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj
2012-01-01
Despite its intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA) when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis. PMID:22292496
Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj
2012-03-20
Despite its intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA) when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.
Gradient Evolution-based Support Vector Machine Algorithm for Classification
NASA Astrophysics Data System (ADS)
Zulvia, Ferani E.; Kuo, R. J.
2018-03-01
This paper proposes a classification algorithm based on the support vector machine (SVM) and gradient evolution (GE) algorithms. The SVM algorithm has been widely used in classification; however, its results are strongly influenced by its parameters. This paper therefore aims to improve the SVM algorithm so that it finds the best SVM parameters automatically. The proposed algorithm employs a GE algorithm as a global optimizer that determines the parameters used by the SVM algorithm. The proposed GE-SVM algorithm is verified on several benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than the other algorithms tested in this paper.
NASA Astrophysics Data System (ADS)
Zhan, Liwei; Li, Chengwei
2017-02-01
A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The hybrid model combines a support vector machine (SVM) with the particle swarm optimization (PSO) technique. SVM has been applied successfully to regression problems, and its regression accuracy depends strongly on optimizing parameters such as the regularization constant C, the RBF kernel parameter γ and the epsilon parameter ε in the SVM training procedure. However, SVM-based prediction of the friction coefficient between aircraft tire and coating has yet to be explored. The experiment reveals that drop height and tire rotational speed are the factors affecting the friction coefficient. With this in mind, the friction coefficient can be predicted by the hybrid PSO-SVM-based model from the measured friction coefficient between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are also used to optimize the relevant parameters (C, γ and ε). Regression accuracy is measured by the coefficient of determination (R²). The results show that the hybrid PSO-RBF-SVM-based model is more accurate than the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experimental data confirms its good performance.
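The hybrid model uses PSO to search over (C, γ, ε). Below is a minimal global-best PSO sketch in pure Python, assuming the common constriction-style settings w = 0.729 and c1 = c2 = 1.49445 rather than the paper's settings. The objective `toy_error` is a hypothetical stand-in, a quadratic bowl with a known minimum. In the real method it would be the cross-validated SVR error for a candidate (C, γ, ε), usually searched on a log scale for C and γ.

```python
import random

def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Global-best PSO: minimize objective over the box given by bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical stand-in for validation error over (log10 C, log10 gamma, epsilon).
def toy_error(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2 + (p[2] - 0.1) ** 2

best, err = pso_minimize(toy_error, [(-2.0, 3.0), (-4.0, 1.0), (0.0, 0.5)])
print(best, err)
```

A grid search, as used for the GS comparison, evaluates the same objective on a fixed lattice instead. Its cost grows multiplicatively with each added parameter, which is one motivation for population-based searches such as PSO or GA.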
Kianmehr, Keivan; Alhajj, Reda
2008-09-01
In this study, we aim to build a classification framework, the CARSVM model, which integrates association rule mining and the support vector machine (SVM). The goal is to benefit from the advantages of both: the discriminative knowledge represented by class association rules and the classification power of the SVM algorithm. Together they yield an efficient and accurate classifier model that addresses the interpretability problem of SVM as a traditional machine learning technique and overcomes the efficiency issues of associative classification algorithms. In our proposed framework, the learning component of the SVM algorithm is presented not with the original training set but with a set of rule-based feature vectors, generated from the discriminative ability of class association rules over the training samples. We show that rule-based feature vectors are a high-quality source of discriminative knowledge that can substantially improve the prediction power of SVM and associative classification techniques. They also make the model easier for users to understand and interpret. We used four datasets from the UCI ML repository to evaluate the performance of the developed system in comparison with five well-known existing classification methods. Because gene expression analysis is an important and popular real-world application of classification models, we present an extension of CARSVM combined with feature selection for gene expression data. We then describe how this combination provides biologists with an efficient and understandable classifier model. The reported test results and their biological interpretation demonstrate the applicability, efficiency and effectiveness of the proposed model.
The results show that a considerable increase in classification accuracy can be obtained when rule-based feature vectors are integrated into the learning process of the SVM algorithm. Regarding applicability, the gene expression results suggest that the CARSVM system can be used in a variety of real-world applications with some adjustments.
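In CARSVM the SVM is trained on rule-based feature vectors instead of the raw samples. The abstract does not specify the encoding, so the sketch below assumes a simple one. Each class association rule contributes one feature, equal to 1 (or optionally the rule's confidence) when the sample satisfies every attribute-value pair in the rule's antecedent, and 0 otherwise. The rules and attribute names are hypothetical.

```python
from typing import Dict, List, Tuple

# A class association rule: (antecedent attribute-value pairs, predicted class, confidence).
Rule = Tuple[Dict[str, str], str, float]

def rule_features(sample: Dict[str, str], rules: List[Rule], weighted: bool = False) -> List[float]:
    """One feature per rule: 1 (or the rule confidence) if the sample
    satisfies the whole antecedent, else 0."""
    feats = []
    for antecedent, _cls, conf in rules:
        matches = all(sample.get(attr) == val for attr, val in antecedent.items())
        feats.append((conf if weighted else 1.0) if matches else 0.0)
    return feats

# Hypothetical rules mined from a categorical dataset.
rules: List[Rule] = [
    ({"outlook": "sunny", "humidity": "high"}, "no", 0.9),
    ({"outlook": "overcast"}, "yes", 1.0),
    ({"wind": "weak"}, "yes", 0.7),
]
sample = {"outlook": "sunny", "humidity": "high", "wind": "weak"}
print(rule_features(sample, rules))                 # [1.0, 0.0, 1.0]
print(rule_features(sample, rules, weighted=True))  # [0.9, 0.0, 0.7]
```

The resulting vectors can be passed to any SVM implementation. Each dimension corresponds to a human-readable rule, which is where the interpretability gain comes from.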
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations
Zhang, Yi; Ren, Jinchang; Jiang, Jianmin
2015-01-01
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating the parameters of a probabilistic model, whilst SVM is an optimization-based nonparametric method. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which yields probabilistic outputs for SVM and facilitates soft decision-making. In total, four groups of data are used for evaluation, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized as Gaussian or non-Gaussian distributed and as balanced or unbalanced, and these characteristics are then used for performance assessment when comparing the SVM and the combined SVM-MLC classifier. The reported results indicate how the combined classifier may work under various conditions. PMID:26089862
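The sketch below illustrates only the MLC half of the combination, under stated assumptions. It fits one multivariate Gaussian per class and turns class-conditional likelihoods, multiplied by class priors, into posterior probabilities. These posteriors are the kind of soft output the combined classifier aims to give SVM. Including the priors is an assumption: a pure MLC would drop them, which makes no difference for balanced classes. How the authors actually couple MLC with SVM is not reproduced, and the toy data are hypothetical.

```python
import numpy as np

class GaussianMLC:
    """Maximum likelihood classifier with one multivariate Gaussian per class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # A small ridge keeps the covariance invertible for tiny classes.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            prior = len(Xc) / len(X)
            self.params_.append((mean, cov, prior))
        return self

    def _log_joint(self, X):
        """log p(x | class) + log prior for every sample and class."""
        d = X.shape[1]
        cols = []
        for mean, cov, prior in self.params_:
            diff = X - mean
            inv = np.linalg.inv(cov)
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
            cols.append(-0.5 * (maha + logdet + d * np.log(2 * np.pi)) + np.log(prior))
        return np.column_stack(cols)

    def predict_proba(self, X):
        lj = self._log_joint(X)
        lj -= lj.max(axis=1, keepdims=True)  # stabilize before exponentiating
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self._log_joint(X), axis=1)]

# Hypothetical toy data: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
mlc = GaussianMLC().fit(X, y)
print(mlc.predict(np.array([[0.0, 0.0], [4.0, 4.0]])))
print(mlc.predict_proba(np.array([[2.0, 2.0]])))
```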
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on a miRNA expression profile dataset, a new data mining algorithm, tSVM-KNN (t statistic with support vector machine and k-nearest neighbor), is proposed. The idea of the algorithm is as follows. First, feature selection on the dataset is carried out by the unified measurement method. Second, the SVM-KNN algorithm, which combines the support vector machine (SVM) and k-nearest neighbor (KNN), is used as the classifier. Simulation results show that the SVM-KNN algorithm has better classification ability than SVM or KNN alone. In terms of the number of miRNA "tags" and recognition accuracy, the tSVM-KNN algorithm needs only 5 miRNAs to obtain 96.08% classification accuracy. Compared with similar algorithms, the tSVM-KNN algorithm has obvious advantages.
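The first stage of tSVM-KNN ranks miRNAs by a t statistic and keeps only the top few (five in this study). The abstract does not say which t statistic is used, so the sketch below assumes Welch's two-sample statistic computed per feature between two classes. The expression matrix is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances; sample variances)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def top_k_features(X, labels, k):
    """Rank the columns of X by |t| between class 0 and class 1; return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, labels) if lab == 0]
        b = [row[j] for row, lab in zip(X, labels) if lab == 1]
        scores.append((abs(welch_t(a, b)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Hypothetical expression matrix: rows are samples, columns are miRNAs.
X = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.2], [0.9, 4.9, 1.9],
     [1.1, 9.0, 2.1], [1.0, 9.2, 2.0], [1.3, 8.9, 2.3]]
labels = [0, 0, 0, 1, 1, 1]
print(top_k_features(X, labels, 1))  # [1]: the column whose level shifts between classes
```

The selected columns would then be the only inputs passed to the SVM-KNN classifier.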
NASA Astrophysics Data System (ADS)
Xian, Guangming
2018-03-01
This paper presents a method for predicting the optimal vibration field parameters using a least squares support vector machine (LS-SVM). One convenient and commonly used technique for characterizing the vibration flow field of polymer melt films is small angle light scattering (SALS) in a visualized slit die of the electromagnetism dynamic extruder. The optimal values of vibration frequency, vibration amplitude, and the maximum light intensity projection area can be predicted using LS-SVM. To illustrate the method and show its validity, polypropylene (PP) is used as the flowing material, and fifteen samples are tested at a screw rotation speed of 36 rpm. This paper first describes the SALS apparatus used to perform the experiments, then gives the theoretical basis of the new method, and details the experimental results for parameter prediction of the vibration flow field. It is demonstrated that SALS combined with LS-SVM can provide detailed information on the optimal parameters of the vibration flow field of PP melts.
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of Neural Networks in apple juice classification problems, this paper aims to implement a newly developed method in the field of machine learning, the Support Vector Machine (SVM). To this end, a hybrid model combining genetic algorithms and support vector machines is proposed, in which the SVM serves as the fitness function of the Genetic Algorithm (GA), so that the most representative variables for a specific classification problem can be selected. PMID:24453933
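In a GA-based variable selection like the one described, each chromosome is a bit mask over the variables, and the classifier's performance on the selected variables is the fitness. The sketch below uses tournament selection, uniform crossover, bit-flip mutation and elitism, which are common choices but assumptions here. Because training an SVM is out of scope, `toy_fitness` is a hypothetical stand-in that rewards a known "useful" subset and penalizes extra variables. In the described method it would be replaced by the SVM's cross-validated accuracy on the selected columns.

```python
import random

def ga_select(fitness, n_features, pop_size=30, n_gen=40, p_mut=None, seed=0):
    """Genetic algorithm over bit masks; returns the best mask and its fitness."""
    rng = random.Random(seed)
    p_mut = p_mut if p_mut is not None else 1.0 / n_features
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(n_gen):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]        # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)

USEFUL = {0, 3, 5}  # hypothetical informative variables

def toy_fitness(mask):
    chosen = {i for i, g in enumerate(mask) if g}
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)

mask, score = ga_select(toy_fitness, n_features=8)
print(mask, score)
```

Because of elitism, the best fitness never decreases from one generation to the next. With an SVM fitness, each evaluation is a full training run, so caching scores per mask is worthwhile in practice.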
A Wavelet Support Vector Machine Combination Model for Singapore Tourist Arrival to Malaysia
NASA Astrophysics Data System (ADS)
Rafidah, A.; Shabri, Ani; Nurulhuda, A.; Suhaila, Y.
2017-08-01
In this study, a wavelet support vector machine (WSVM) model is proposed and applied to forecasting the monthly time series of Singapore tourist arrivals. The WSVM model combines wavelet analysis with a support vector machine (SVM). The study has two parts: the first compares kernel functions, and the second compares the developed model with the single SVM model. The results show that the linear kernel performed better than the RBF kernel, and that WSVM outperformed the single SVM model in forecasting monthly Singapore tourist arrivals to Malaysia.
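A WSVM typically decomposes the series into approximation (trend) and detail (fluctuation) components that become SVM inputs. The abstract names neither the wavelet family nor the decomposition level, so this sketch assumes a one-level orthonormal Haar transform. It also checks perfect reconstruction. The series is hypothetical.

```python
import math

def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length series.
    Returns (approximation, detail) coefficients, each half the input length."""
    if len(x) % 2:
        raise ValueError("series length must be even")
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction)."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out

# Hypothetical monthly arrivals (thousands); not the study's data.
series = [100.0, 104.0, 98.0, 110.0, 120.0, 118.0, 95.0, 101.0]
a, d = haar_dwt(series)
print([round(v, 3) for v in a])
print([round(v, 3) for v in d])
```

When the components are used for forecasting, they should be computed only from observations available at forecast time. Otherwise future values leak into the inputs and inflate the apparent accuracy.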
Jongin Kim; Boreom Lee
2017-07-01
The classification of neuroimaging data for the diagnosis of Alzheimer's Disease (AD) is one of the main research goals in neuroscience and clinical fields. In this study, we used an extreme learning machine (ELM) classifier to discriminate AD and mild cognitive impairment (MCI) from normal controls (NC). We compared the performance of ELM with that of a linear-kernel support vector machine (SVM) on 718 structural MRI images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data consisted of normal controls, MCI converters (MCI-C), MCI non-converters (MCI-NC), and AD. We employed the SVM-based recursive feature elimination (RFE-SVM) algorithm to find the optimal subset of features. We found that RFE-SVM feature selection combined with ELM achieves higher classification accuracy than the linear-kernel SVM on structural T1 MRI data.
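An extreme learning machine differs from a backpropagation-trained network in one key way. Its hidden-layer weights are drawn at random and never trained, so fitting reduces to a single least-squares solve for the output weights. The numpy sketch below handles binary labels in {−1, +1}. The hidden-layer size, tanh activation and toy data are assumptions, and the RFE-SVM feature selection used in the study is not included.

```python
import numpy as np

class ELM:
    """Extreme learning machine: random fixed hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are sampled once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: minimum-norm least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)

# Hypothetical toy data: two separated clusters labelled -1 and +1.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)
model = ELM().fit(X, y)
print((model.predict(X) == y).mean())
```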
A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters from a centralized repository of data. However, in biomedicine, patient data are sometimes stored in local repositories at the institutions where they were collected and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers who wish to learn effectively from distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as the model learned from the combined data. We also provide a free web service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
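The abstract's key guarantee is that the jointly learned model is identical to one trained on pooled data. The abstract does not describe the DPP-SVM protocol, so the sketch below is a simpler, hypothetical illustration of why such exactness is possible. For a linear SVM objective (L2 penalty plus hinge loss), the subgradient over pooled data is the sum of per-site subgradients. A server that sees only each site's aggregate can therefore take exactly the same steps as centralized training. Sharing per-site gradients is not by itself a privacy guarantee; the sketch demonstrates exactness only.

```python
from typing import List, Tuple

Site = Tuple[List[List[float]], List[int]]  # (feature rows, labels in {-1, +1})

def local_subgradient(w, b, X, y):
    """Sum of hinge-loss subgradients over one site's data (no regularizer)."""
    gw = [0.0] * len(w)
    gb = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        if margin < 1:  # the point is inside the margin, so it contributes
            for j, xj in enumerate(xi):
                gw[j] -= yi * xj
            gb -= yi
    return gw, gb

def federated_step(w, b, sites: List[Site], lam=0.01, lr=0.1):
    """One subgradient step on lam/2 ||w||^2 + (1/n) sum of hinge losses; the
    server only receives each site's aggregated subgradient, never raw rows."""
    n = sum(len(y) for _, y in sites)
    gw_total = [0.0] * len(w)
    gb_total = 0.0
    for X, y in sites:
        gw, gb = local_subgradient(w, b, X, y)
        gw_total = [a + c for a, c in zip(gw_total, gw)]
        gb_total += gb
    w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, gw_total)]
    b = b - lr * gb_total / n
    return w, b

# Hypothetical data held by two institutions.
siteA: Site = ([[1.0, 2.0], [2.0, 1.0]], [1, 1])
siteB: Site = ([[-1.0, -1.5], [-2.0, -0.5]], [-1, -1])
pooled: List[Site] = [(siteA[0] + siteB[0], siteA[1] + siteB[1])]

w1, b1 = [0.0, 0.0], 0.0
w2, b2 = [0.0, 0.0], 0.0
for _ in range(50):
    w1, b1 = federated_step(w1, b1, [siteA, siteB])
    w2, b2 = federated_step(w2, b2, pooled)
print(w1, b1)
print(w2, b2)  # matches the distributed run up to floating-point summation order
```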
Vidić, Igor; Egnell, Liv; Jerome, Neil P; Teruel, Jose R; Sjøbakk, Torill E; Østlie, Agnes; Fjøsne, Hans E; Bathen, Tone F; Goa, Pål Erik
2018-05-01
Diffusion-weighted MRI (DWI) is currently one of the fastest developing MRI-based techniques in oncology. Histogram properties from model fitting of DWI are useful features for differentiation of lesions, and classification can potentially be improved by machine learning. The purpose of this study was to evaluate classification of malignant and benign tumors and breast cancer subtypes using a support vector machine (SVM). The study was prospective. Fifty-one patients with benign (n = 23) and malignant (n = 28) breast tumors (26 ER+, of which six were HER2+) were included. Patients were imaged with DW-MRI (3T) using twice-refocused spin-echo echo-planar imaging with repetition time/echo time (TR/TE) = 9000/86 msec, 90 × 90 matrix size, 2 × 2 mm in-plane resolution, 2.5 mm slice thickness, and 13 b-values. Apparent diffusion coefficient (ADC), relative enhanced diffusivity (RED), and the intravoxel incoherent motion (IVIM) parameters diffusivity (D), pseudo-diffusivity (D*), and perfusion fraction (f) were calculated. The histogram properties (median, mean, standard deviation, skewness, kurtosis) were used as features in SVM (10-fold cross-validation) for differentiation of lesions and subtyping. Accuracies of the SVM classifications were calculated to find the combination of features with the highest prediction accuracy. Mann-Whitney tests were performed for univariate comparisons. For benign versus malignant tumors, univariate analysis found 11 histogram properties to be significant differentiators. Using SVM, the highest accuracy (0.96) was achieved from a single feature (mean of RED), or from three feature combinations of IVIM or ADC. Combining features from all models gave perfect classification. No single feature predicted HER2 status of ER+ tumors (univariate or SVM), although high accuracy (0.90) was achieved with SVM combining several features. Importantly, these features had to include higher-order statistics (kurtosis and skewness), indicating the importance of accounting for heterogeneity.
Our findings suggest that SVM, using features from a combination of diffusion models, improves prediction accuracy for differentiation of benign versus malignant breast tumors, and may further assist in subtyping of breast cancer. Level of Evidence: 3. Technical Efficacy: Stage 3. J. Magn. Reson. Imaging 2018;47:1205-1216. © 2017 International Society for Magnetic Resonance in Medicine.
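The study used five histogram properties per parameter map as SVM features. The sketch below computes them for the voxel values of one lesion. The estimators are not stated in the abstract, so it assumes population (biased) moments and excess kurtosis, which is 0 for a Gaussian. The example values are hypothetical.

```python
import math
import statistics

def histogram_features(values):
    """Median, mean, SD, skewness and excess kurtosis of voxel values in a lesion ROI,
    using population moments (an assumption; the study's estimators are not stated)."""
    n = len(values)
    mu = statistics.fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant input: skewness and kurtosis are undefined")
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return {
        "median": statistics.median(values),
        "mean": mu,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
    }

# Hypothetical ADC values (x 10^-3 mm^2/s) from one lesion.
print(histogram_features([1.1, 1.3, 0.9, 1.6, 1.2, 2.4]))
```

Skewness and kurtosis summarize the shape of the value distribution within the lesion. That shape is how the higher-order statistics mentioned above capture tumor heterogeneity.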
NASA Astrophysics Data System (ADS)
Cui, Ying; Dy, Jennifer G.; Alexander, Brian; Jiang, Steve B.
2008-08-01
Various problems with current state-of-the-art techniques for gated radiotherapy have prevented this new treatment modality from being widely implemented in clinical routine. These problems are caused mainly by the use of various external respiratory surrogates, because deriving the tumor position from external respiratory surrogates can involve large uncertainties. While tracking implanted fiducial markers is sufficiently accurate, this procedure may not be widely accepted due to the risk of pneumothorax. Previously, we developed a technique to generate gating signals from fluoroscopic images without implanted fiducial markers using template matching methods (Berbeco et al 2005 Phys. Med. Biol. 50 4481-90, Cui et al 2007b Phys. Med. Biol. 52 741-55). In this note, our main contribution is a completely different view of the gating problem, recasting it as a classification problem. We then solve this classification problem with a well-studied, powerful classification method, the support vector machine (SVM). The goal of an automated gating tool is to decide when to turn the beam ON or OFF, so we treat ON and OFF as the two classes in our classification problem. We create our labeled training data during the patient setup session using the reference gating signal, manually determined by a radiation oncologist. We then pre-process these labeled training images and build our SVM prediction model. During treatment delivery, fluoroscopic images are continuously acquired, pre-processed and sent as input to the SVM, which outputs the predicted labels as gating signals. We test the proposed technique on five sequences of fluoroscopic images from five lung cancer patients, using the reference gating signal as ground truth. We compare the performance of the SVM with our previous template matching method (Cui et al 2007b Phys. Med. Biol. 52 741-55).
We find that, when delivering the target dose, the SVM is slightly more accurate on average (1-3%) than the template matching method, and the average duty cycle is 4-6% longer. Given the very limited patient dataset, we cannot conclude that the SVM is more accurate and efficient than the template matching method. However, our preliminary results show that the SVM is a potentially precise and efficient algorithm for generating gating signals for radiotherapy. This work demonstrates that the gating problem can be treated as a classification problem and solved accordingly.
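With gating cast as per-frame ON/OFF classification, the predicted labels can be scored against the reference gating signal. The abstract does not define its accuracy and duty-cycle measures precisely, so the sketch below assumes simple definitions:

- frame-wise agreement with the reference;
- duty cycle, the fraction of frames with the beam ON;
- ON precision, the fraction of beam-ON frames that fall inside the reference gate.

The label sequences are hypothetical.

```python
from typing import Dict, Sequence

def gating_metrics(predicted: Sequence[int], reference: Sequence[int]) -> Dict[str, float]:
    """Compare predicted beam-ON (1) / beam-OFF (0) frames with a reference gating signal."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    n = len(reference)
    agree = sum(p == r for p, r in zip(predicted, reference))
    on = sum(predicted)
    # Beam-ON frames that the reference also marks as inside the gate.
    on_correct = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    return {
        "agreement": agree / n,
        "duty_cycle": on / n,
        "on_precision": on_correct / on if on else float("nan"),
    }

reference = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1]
predicted = [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(gating_metrics(predicted, reference))
# {'agreement': 0.8, 'duty_cycle': 0.5, 'on_precision': 0.8}
```

A longer duty cycle shortens treatment time. ON precision reflects how much of the delivered beam time targets the intended tumor position, so the two measures should be read together.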
Face recognition using total margin-based adaptive fuzzy support vector machines.
Liu, Yi-Hung; Chen, Yen-Ting
2007-01-01
This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur when support vector machines (SVMs) are applied to face recognition. The proposed TAF-SVM not only addresses the overfitting caused by outliers by fuzzifying the penalty, but also corrects the skew of the optimal separating hyperplane caused by highly imbalanced data sets by using a different-cost algorithm. In addition, replacing the conventional soft margin algorithm with the total margin algorithm yields a lower generalization error bound. These three mechanisms are incorporated into the traditional SVM to form the TAF-SVM, which is formulated for both the linear and nonlinear cases. Experiments used two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, with the kernel Fisher's discriminant analysis (KFDA) algorithm used to extract discriminating face features. The results show that the proposed TAF-SVM is superior to SVM in face-recognition accuracy. They also indicate that the proposed TAF-SVM achieves smaller error variances than SVM over a number of tests, and hence better recognition stability.
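TAF-SVM fuzzifies the penalty so that outliers carry less weight, and it uses different costs for imbalanced classes. The sketch below shows one way such per-sample penalties Cᵢ = C · cost(yᵢ) · sᵢ could be formed. It assumes a common distance-to-class-centre membership from the fuzzy SVM literature, sᵢ = 1 − dᵢ/(r + δ), where r is the class radius. It also assumes inverse-class-frequency costs. The actual TAF-SVM membership function and its total-margin formulation are not reproduced, and the data are hypothetical.

```python
import math
from collections import Counter

def fuzzy_penalties(X, y, C=1.0, delta=1e-3):
    """Per-sample penalties C_i = C * class_cost(y_i) * s_i (assumed scheme).

    s_i = 1 - d_i / (r + delta): d_i is the distance from x_i to its class centre
    and r is that class's largest such distance, so distant points (likely
    outliers) get small memberships. class_cost = n / (k * n_class) up-weights
    the minority class."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    centres, radii = {}, {}
    for c in counts:
        pts = [x for x, lab in zip(X, y) if lab == c]
        centres[c] = [sum(col) / len(pts) for col in zip(*pts)]
        radii[c] = max(math.dist(p, centres[c]) for p in pts)
    penalties = []
    for x, lab in zip(X, y):
        s = 1.0 - math.dist(x, centres[lab]) / (radii[lab] + delta)
        penalties.append(C * (n / (k * counts[lab])) * s)
    return penalties

# Hypothetical data: class +1 has an outlier at (4, 4); class -1 is the minority.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [10.0, 10.0], [11.0, 10.0]]
y = [1, 1, 1, 1, -1, -1]
print([round(p, 4) for p in fuzzy_penalties(X, y)])
```

In a soft-margin SVM dual, Cᵢ would replace the single C as the upper bound on each αᵢ. With this simple membership function, the farthest point of every class always gets a weight near zero, which is a known limitation. In practice δ and the membership definition are tuned.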
CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.
Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming
2014-11-30
Prediction of gene regulatory networks (GRN) from expression data is a challenging task. Many methods, ranging from supervised to unsupervised, have been developed to address this challenge. The most promising methods are based on the support vector machine (SVM). A comprehensive analysis is needed of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods in detail on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of the inference method depends on the nature of the experimental condition and the size of the network. For networks with fewer than 200 nodes, and on average over all network sizes, the SVM with a Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends on the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .
Bascil, M Serdar; Tesneli, Ahmet Y; Temurtas, Feyzullah
2016-09-01
Brain computer interface (BCI) is a new way of communication between humans and machines. It identifies mental task patterns stored in the electroencephalogram (EEG): it extracts brain electrical activity recorded by EEG and transforms it into machine control commands. The main goal of BCI is to provide assistive devices, such as computers, for paralyzed people and to make their lives easier. This study deals with feature extraction and mental task pattern recognition for 2-D cursor control from EEG as an offline analysis approach. The hemispherical power density changes in the alpha-beta frequency bands are computed and compared while the subject only imagines the cursor movements. First, power spectral density (PSD) features of the EEG signals are extracted, and the high-dimensional data are reduced by principal component analysis (PCA) and independent component analysis (ICA), which are statistical algorithms. In the last stage, all features are classified with two types of support vector machine (SVM) and three different artificial neural network (ANN) structures. The SVMs are linear and least squares (LS-SVM); the ANNs are learning vector quantization (LVQ), multilayer neural network (MLNN) and probabilistic neural network (PNN). Mental task patterns are successfully identified via the k-fold cross-validation technique.
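The k-fold cross-validation used to evaluate these classifiers can be sketched as an index splitter. It shuffles once and splits into k nearly equal folds, and each fold serves once as the test set while the rest is used for training. Leave-one-out is the special case k = n. This is a plain (non-stratified) variant, an assumption here, so it does not preserve class proportions within folds.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n samples."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample, so fold sizes differ by at most 1.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])  # [4, 3, 3]
```

Any preprocessing fitted on the data, such as the PCA/ICA projections mentioned above, should be fitted inside each training fold only, so that the test folds stay unseen.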
Prediction and analysis of beta-turns in proteins by support vector machine.
Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao
2003-01-01
Tight turns have long been recognized as one of the three important features of proteins, after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns. Analysis and prediction of beta-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of beta-turns. We investigated two aspects of applying SVM to this problem. First, we developed a new SVM method, called BTSVM, which predicts the beta-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains using sevenfold cross-validation showed that our method is superior to previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than those of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.
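Predicting beta-turns from sequence is usually cast as classifying each residue from a window of its neighbours. The abstract does not give BTSVM's encoding, so the sketch below assumes a common one. It takes a window of half-width 3 centred on each residue, one-hot encoded over the 20 standard amino acids plus a padding symbol for positions beyond the chain ends. Each resulting vector, labelled turn or non-turn from structural annotations, would be one SVM sample.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # marks positions beyond either end of the chain
ALPHABET = AMINO_ACIDS + PAD

def window_features(sequence, i, half_width=3):
    """One-hot encode the residues in a window centred on position i.
    Returns a flat list of length (2 * half_width + 1) * 21."""
    feats = []
    for j in range(i - half_width, i + half_width + 1):
        residue = sequence[j] if 0 <= j < len(sequence) else PAD
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(residue)] = 1  # raises ValueError for non-standard codes
        feats.extend(one_hot)
    return feats

def encode_chain(sequence, half_width=3):
    """One feature vector per residue of the chain."""
    return [window_features(sequence, i, half_width) for i in range(len(sequence))]

vectors = encode_chain("ACDGNPG")
print(len(vectors), len(vectors[0]))  # 7 147
```

For a linear SVM trained on such vectors, each weight is tied to one residue type at one window position. That is what makes the positional support/prevent analysis described in the abstract possible.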
Fraley, Stephanie I.; Athamanolap, Pornpat; Masek, Billie J.; Hardick, Justin; Carroll, Karen C.; Hsieh, Yu-Hsiang; Rothman, Richard E.; Gaydos, Charlotte A.; Wang, Tza-Huei; Yang, Samuel
2016-01-01
High Resolution Melt (HRM) is a versatile and rapid post-PCR DNA analysis technique primarily used to differentiate sequence variants among only a few short amplicons. We recently developed a one-vs-one support vector machine algorithm (OVO SVM) that enables the use of HRM for identifying numerous short amplicon sequences automatically and reliably. Herein, we set out to maximize the discriminating power of HRM + SVM for a single genetic locus by testing longer amplicons harboring significantly more sequence information. Using universal primers that amplify the hypervariable bacterial 16S rRNA gene as a model system, we found that long amplicons yield more complex HRM curve shapes. We developed a novel nested OVO SVM approach to take advantage of this feature and achieved 100% accuracy in the identification of 37 clinically relevant bacteria in leave-one-out cross-validation. A subset of organisms was independently tested. Those from pure culture were identified with high accuracy, while those tested directly from clinical blood bottles displayed more technical variability and reduced accuracy. Our findings demonstrate that long sequences can be accurately and automatically profiled by HRM with a novel nested SVM approach and suggest that clinical sample testing is feasible with further optimization. PMID:26778280
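A one-vs-one SVM trains one binary classifier per pair of classes, K(K − 1)/2 in total (37 × 36 / 2 = 666 for 37 bacteria), and predicts by majority vote. The sketch below shows only the voting step. The pairwise classifiers are hypothetical callables standing in for trained binary SVMs, and the authors' nested variant is not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]

def ovo_predict(x, classes: Sequence[Hashable],
                pairwise: Dict[Pair, Callable[[object], Hashable]]) -> Hashable:
    """Majority vote over all pairwise classifiers.
    pairwise[(a, b)](x) must return a or b; ties go to the earliest class in `classes`."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)

# Hypothetical 1-D stand-ins for trained binary SVMs: pick the nearer class centre.
centres = {"A": 0.0, "B": 5.0, "C": 10.0}

def make_clf(a, b):
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b

pairwise = {(a, b): make_clf(a, b) for a, b in combinations(centres, 2)}
print(ovo_predict(6.0, list(centres), pairwise))  # B
print(ovo_predict(9.0, list(centres), pairwise))  # C
```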
A Real-Time Interference Monitoring Technique for GNSS Based on a Twin Support Vector Machine Method
Li, Wutao; Huang, Zhigang; Lang, Rongling; Qin, Honglei; Zhou, Kai; Cao, Yongbin
2016-03-04
Interference can severely degrade the performance of Global Navigation Satellite System (GNSS) receivers. As the first step of any GNSS anti-interference measure, interference monitoring is essential. Since interference monitoring can be treated as a classification problem, this paper proposes a real-time interference monitoring technique based on the Twin Support Vector Machine (TWSVM). A TWSVM model is established and solved with the Least Squares Twin Support Vector Machine (LSTWSVM) algorithm. The interference monitoring indicators are analyzed to extract features from the interfered GNSS signals, and the experimental results show that the chosen observations can be used as interference monitoring indicators. The interference monitoring performance of the proposed method is verified using the GPS L1 C/A code signal and compared with that of a standard SVM. The experimental results indicate that TWSVM-based interference monitoring is much faster than the conventional SVM: the training time of TWSVM is on the millisecond (ms) level and the monitoring time is on the microsecond (μs) level, which makes the proposed approach usable in practical interference monitoring applications. PMID:26959020
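The speed claim is easier to see from the least squares twin SVM: instead of one quadratic program, the linear LSTWSVM obtains each of its two non-parallel planes from a small linear system of size (number of features + 1). The sketch below follows the commonly used linear LSTWSVM closed form, as assumed here; the kernel version is omitted, the GNSS monitoring indicators are replaced by hypothetical 2-D data, and a hand-written Gaussian elimination keeps it dependency-free.

```python
import math


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def gram(P):
    """P^T P for a row-major matrix P."""
    d = len(P[0])
    return [[sum(row[i] * row[j] for row in P) for j in range(d)] for i in range(d)]


def train_lstwsvm(A, B, c1=1.0, c2=1.0):
    """Linear least squares twin SVM: one plane hugs class A, the other hugs class B."""
    E = [list(a) + [1.0] for a in A]  # augmented class-A matrix [A e]
    F = [list(b) + [1.0] for b in B]  # augmented class-B matrix [B e]
    EtE, FtF = gram(E), gram(F)
    d = len(E[0])
    # z1 = -(F'F + E'E / c1)^-1 F'e : plane near A, class-B values pushed toward -1.
    M1 = [[FtF[i][j] + EtE[i][j] / c1 for j in range(d)] for i in range(d)]
    z1 = solve(M1, [-sum(row[i] for row in F) for i in range(d)])
    # z2 = (E'E + F'F / c2)^-1 E'e : plane near B, class-A values pushed toward +1.
    M2 = [[EtE[i][j] + FtF[i][j] / c2 for j in range(d)] for i in range(d)]
    z2 = solve(M2, [sum(row[i] for row in E) for i in range(d)])
    return z1, z2


def plane_distance(z, x):
    w, b = z[:-1], z[-1]
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))


def predict(z1, z2, x):
    """+1 (class A) if x is nearer plane 1, else -1 (class B)."""
    return 1 if plane_distance(z1, x) <= plane_distance(z2, x) else -1


# Hypothetical features: class A = interference present, class B = clean signal.
A = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
B = [[0.0, 3.0], [1.0, 3.0], [2.0, 3.0]]
z1, z2 = train_lstwsvm(A, B)
print(predict(z1, z2, [1.0, 0.5]), predict(z1, z2, [1.0, 2.8]))  # 1 -1
```

For this toy data the two planes come out as y = 0 and y = 3, one through each class.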
Fraccaro, Paolo; Nicolo, Massimo; Bonetto, Monica; Giacomini, Mauro; Weller, Peter; Traverso, Carlo Enrico; Prosperi, Mattia; O'Sullivan, Dympna
2015-01-27
To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD). Data from healthy subjects and patients diagnosed with AMD or other retinal diseases were collected during routine visits via an Electronic Health Record (EHR) system. Patients' attributes included demographics and, for each eye, presence/absence of major AMD-related clinical signs (soft drusen, retinal pigment epithelium defects/pigment mottling, depigmentation area, subretinal haemorrhage, subretinal fluid, macula thickness, macular scar, subretinal fibrosis). Interpretable techniques known as white-box methods, including logistic regression and decision trees, as well as less interpretable techniques known as black-box methods, such as support vector machines (SVM), random forests and AdaBoost, were used to develop models (trained and validated on unseen data) to diagnose AMD. The gold standard was confirmed diagnosis of AMD by physicians. Sensitivity, specificity and area under the receiver operating characteristic curve (AUC) were used to assess performance. The study population included 487 patients (912 eyes). In terms of AUC, random forests, logistic regression and AdaBoost showed a mean performance of 0.92, followed by SVM and decision trees (0.90). All machine learning models identified soft drusen and age as the most discriminating variables in clinicians' decision pathways to diagnose AMD. Both black-box and white-box methods performed well in identifying diagnoses of AMD and their decision pathways. Machine learning models developed through the proposed approach, relying on clinical signs identified by retinal specialists, could be embedded into EHR systems to provide physicians with real-time (interpretable) support.
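The AUC used above can be computed without drawing the ROC curve: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch with hypothetical scores:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for AMD eyes (positives) and healthy eyes (negatives).
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))  # 8/9, about 0.889
```

The pairwise loop is quadratic in the number of cases, which is fine at the scale of a few hundred eyes; a rank-based formula gives the same value faster.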
2013-01-01
Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks; further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information in protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. Ensembling the extreme learning machines removes the dependence of results on the initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments were performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. Experimental results under 5-fold cross-validation demonstrate that PCA-EELM outperforms the SVM method. Besides, PCA-EELM is faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time. PMID:23815620
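The ensembling step described above can be sketched as follows. This is an illustrative, assumption-laden version rather than PCA-EELM: the PCA stage and the protein-sequence encoding are omitted, the ELMs use tanh hidden units with uniformly random weights, output weights come from ridge-regularised least squares (the constant C is an assumption), and the 1-D training data are invented.

```python
import math
import random


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def hidden_layer(W, b, x):
    """Random, untrained tanh features: the defining trait of an ELM."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def train_elm(X, t, n_hidden=10, C=1e3, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden_layer(W, b, x) for x in X]
    # Output weights by regularised least squares: (H'H + I/C) beta = H't.
    HtH = [[sum(h[i] * h[j] for h in H) + (1.0 / C if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    Ht_t = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(n_hidden)]
    return W, b, solve(HtH, Ht_t)


def elm_predict(model, x):
    W, b, beta = model
    score = sum(bj * hj for bj, hj in zip(beta, hidden_layer(W, b, x)))
    return 1 if score >= 0 else -1


def ensemble_predict(models, x):
    """Majority vote over an odd number of ELMs, so ties cannot occur."""
    return 1 if sum(elm_predict(m, x) for m in models) > 0 else -1


# Hypothetical 1-D data: interacting pairs (+1) vs non-interacting pairs (-1).
X = [[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0]]
t = [-1, -1, -1, 1, 1, 1]
models = [train_elm(X, t, seed=s) for s in range(5)]  # five differently seeded ELMs
print(ensemble_predict(models, [-2.2]), ensemble_predict(models, [2.7]))
```

Each seed gives a different random hidden layer; the vote smooths out seed-to-seed variation, which is the dependence on initial weights the abstract refers to.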
Testing of the Support Vector Machine for Binary-Class Classification
NASA Technical Reports Server (NTRS)
Scholten, Matthew
2011-01-01
The Support Vector Machine is a powerful algorithm, useful in classifying data into classes. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential. Results demonstrate the reliability of SVM as a method for classification: from trial to trial, SVM produces consistent results.
Tripathy, Rajesh Kumar; Dandapat, Samarendra
2017-04-01
The complex wavelet sub-band bi-spectrum (CWSB) features are proposed for detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual-tree complex wavelet (CW) transform of the 12-lead ECG produces CW coefficients at different sub-bands. Higher-order CW analysis is used to evaluate the CWSB. The mean of the absolute value of the CWSB and, from the phase of the CWSB, the numbers of negative and positive phase angles are evaluated as features of the 12-lead ECG. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of the CWSB features. Experimental results show that the proposed CWSB features of 12-lead ECG and the SVM classifier are successful for classification of various heart pathologies. Individual accuracies of 98.37%, 97.39% and 96.40% are obtained for the MI, HMD and BBB classes, respectively, using the SVM classifier with a radial basis function (RBF) kernel. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques.
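For reference, the radial basis function kernel mentioned above is K(x, z) = exp(-γ||x - z||²), and a trained kernel SVM scores a new sample with f(x) = Σ α_i y_i K(x_i, x) + b. The sketch below evaluates that decision function for a hypothetical two-support-vector model; γ = 0.5 and the coefficients are invented, not values from the study.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def svm_decision(x, support_vectors, signed_alphas, bias, gamma=0.5):
    """Kernel SVM score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    signed_alphas holds the products alpha_i * y_i learned during training.
    """
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, signed_alphas)) + bias


# Hypothetical model on 2-D feature vectors (e.g. two CWSB-derived features).
svs, coefs = [[0.0, 0.0], [3.0, 3.0]], [1.0, -1.0]
print(svm_decision([0.2, 0.1], svs, coefs, bias=0.0) > 0)  # True: nearer the positive support vector
```

A multi-class problem such as MI/HMD/BBB would combine several such binary scores, for example with one-vs-one voting.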
EEG-based driver fatigue detection using hybrid deep generic model.
Phyo Phyo San; Sai Ho Ling; Rifai Chai; Tran, Yvonne; Craig, Ashley; Hung Nguyen
2016-08-01
Classification of electroencephalography (EEG) signals is an important process in biomedical engineering. Driver fatigue is a major cause of traffic accidents worldwide and has been considered a significant problem in recent decades. In this paper, a hybrid deep generic model (DGM)-based support vector machine is proposed for accurate detection of driver fatigue. Traditionally, a probabilistic DGM with a deep architecture is quite good at learning invariant features, but it is not always optimal for classification because its trainable parameters are in the middle layers. Conversely, a Support Vector Machine (SVM) by itself is unable to learn complicated invariances, but produces a good decision surface when applied to well-behaved features. Combining unsupervised high-level feature extraction by the DGM with SVM classification makes the integrated framework stronger, with feature extraction and classification enhancing each other. The experimental results showed that the proposed DBN-based driver fatigue monitoring system achieves a better testing accuracy of 73.29%, with 91.10% sensitivity and 55.48% specificity. In short, the proposed hybrid DGM-based SVM is an effective method for the detection of driver fatigue from EEG.
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit and each external node represents the classified output type. Such an SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experimental results show that the proposed SVM tree achieves good performance in sports video classification.
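The routing logic of such an SVM tree can be sketched as a binary tree whose internal nodes each hold one binary decision function. In the sketch below the decision functions are hypothetical linear stand-ins for trained SVMs, and the features (motion energy, green ratio) and labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TreeNode:
    """Internal nodes hold a binary decision; external (leaf) nodes hold a label."""
    label: Optional[str] = None
    decide: Optional[Callable[[Sequence[float]], float]] = None
    left: Optional["TreeNode"] = None   # followed when decide(x) < 0
    right: Optional["TreeNode"] = None  # followed when decide(x) >= 0


def classify(node: TreeNode, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, asking one binary classifier per level."""
    while node.label is None:
        node = node.right if node.decide(x) >= 0 else node.left
    return node.label


# Hypothetical features x = (motion_energy, green_ratio); the lambdas stand in
# for trained binary SVM decision functions.
tree = TreeNode(
    decide=lambda x: x[1] - 0.5,
    right=TreeNode(label="soccer"),
    left=TreeNode(
        decide=lambda x: x[0] - 0.5,
        right=TreeNode(label="basketball"),
        left=TreeNode(label="tennis"),
    ),
)
print(classify(tree, (0.8, 0.1)))  # basketball
```

Classifying one video costs one binary decision per tree level, which is one way to read the low-computing-cost claim.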
Agricultural mapping using Support Vector Machine-Based Endmember Extraction (SVM-BEE)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archibald, Richard K; Filippi, Anthony M; Bhaduri, Budhendra L
Extracting endmembers from remotely sensed images of vegetated areas can present difficulties. In this research, we applied a recently developed endmember-extraction algorithm based on Support Vector Machines (SVMs) to the problem of semi-autonomous estimation of vegetation endmembers from a hyperspectral image. This algorithm, referred to as Support Vector Machine-Based Endmember Extraction (SVM-BEE), accurately and rapidly yields a computed representation of hyperspectral data that can accommodate multiple distributions. The number of distributions is identified without prior knowledge, based upon this representation. Prior work established that SVM-BEE is robustly noise-tolerant and can semi-automatically and effectively estimate endmembers; synthetic data and a geologic scene were previously analyzed. Here we compared the efficacies of the SVM-BEE and N-FINDR algorithms in extracting endmembers from a predominantly agricultural scene. SVM-BEE was able to estimate vegetation and other endmembers for all classes in the image, which N-FINDR failed to do. Classifications based on SVM-BEE endmembers were markedly more accurate compared with those based on N-FINDR endmembers.
A Fast Reduced Kernel Extreme Learning Machine.
Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua
2016-04-01
In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on the Support Vector Machine (SVM) or Least Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established on a rigorous proof of universal learning involving a reduced-kernel-based single-hidden-layer feedforward network (SLFN). In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support vector sufficiency. Experimental results on a wide variety of real-world applications with small and large instance sizes, covering binary classification, multi-class problems and regression, show that RKELM can achieve generalization performance competitive with SVM/LS-SVM at only a fraction of the computational effort. Copyright © 2015 Elsevier Ltd. All rights reserved.
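Here is a rough sketch of the RKELM idea described above. It picks m training samples at random as "support vectors", builds the n × m kernel matrix against them, and solves one regularised least-squares problem for the output weights, with no iterative support-vector search. The RBF kernel, γ = 0.25, C, the added bias column and the toy data are all assumptions of this sketch, not details from the paper.

```python
import math
import random


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def rbf(x, s, gamma):
    return math.exp(-gamma * sum((xi - si) ** 2 for xi, si in zip(x, s)))


def train_rkelm(X, t, m=3, gamma=0.25, C=1e4, seed=0):
    """Reduced-kernel fit: random 'support vectors', then one ridge solve.

    The constant bias column is an assumption added for this sketch.
    """
    rng = random.Random(seed)
    S = rng.sample(X, m)  # random subset; no iterative support-vector search
    K = [[rbf(x, s, gamma) for s in S] + [1.0] for x in X]  # n x (m + 1)
    d = m + 1
    KtK = [[sum(k[i] * k[j] for k in K) + (1.0 / C if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Kt_t = [sum(k[i] * ti for k, ti in zip(K, t)) for i in range(d)]
    return S, solve(KtK, Kt_t), gamma


def rkelm_predict(model, x):
    S, beta, gamma = model
    k = [rbf(x, s, gamma) for s in S] + [1.0]
    return 1 if sum(bi * ki for bi, ki in zip(beta, k)) >= 0 else -1


# Hypothetical 1-D two-class data.
X = [[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0]]
t = [-1, -1, -1, 1, 1, 1]
model = train_rkelm(X, t)
print([rkelm_predict(model, x) for x in ([-2.2], [2.7])])
```

The only expensive step is a single (m + 1) × (m + 1) solve, so the cost grows with the chosen subset size rather than with an iterative optimisation over all samples.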
Multi-Sectional Views Textural Based SVM for MS Lesion Segmentation in Multi-Channels MRIs
Abdullah, Bassem A; Younis, Akmal A; John, Nigel M
2012-01-01
In this paper, a new technique is proposed for automatic segmentation of multiple sclerosis (MS) lesions from brain magnetic resonance imaging (MRI) data. The technique uses a trained support vector machine (SVM) to discriminate between blocks in regions of MS lesions and blocks in non-lesion regions, mainly on the basis of textural features aided by other features. The classification is done on each of the axial, sagittal and coronal sectional brain views independently, and the resulting segmentations are aggregated to provide a more accurate output segmentation. The main contribution of the proposed technique described in this paper is the use of textural features to detect MS lesions in a fully automated approach that does not rely on manually delineating the MS lesions. In addition, the technique introduces the concept of multi-sectional view segmentation to produce verified segmentation. The proposed textural-based SVM technique was evaluated using three simulated datasets and more than fifty real MRI datasets. The results were compared with state-of-the-art methods. The obtained results indicate that the proposed method would be viable for use in clinical practice for the detection of MS lesions in MRI. PMID:22741026
Spatially Regularized Machine Learning for Task and Resting-state fMRI
Song, Xiaomu; Panych, Lawrence P.; Chen, Nan-kuei
2015-01-01
Background Reliable mapping of brain function across sessions and/or subjects in task- and resting-state has been a critical challenge for quantitative fMRI studies, although it has been intensively addressed over the past decades. New Method A spatially regularized support vector machine (SVM) technique was developed for reliable brain mapping in task- and resting-state. Unlike most existing SVM-based brain mapping techniques, which implement supervised classifications of specific brain functional states or disorders, the proposed method performs a semi-supervised classification for general brain function mapping, in which the spatial correlation of fMRI is integrated into the SVM learning. The method can adapt to intra- and inter-subject variations induced by fMRI nonstationarity, and identify a true boundary between active and inactive voxels, or between functionally connected and unconnected voxels, in a feature space. Results The method was evaluated using synthetic and experimental data at the individual and group level. Multiple features were evaluated in terms of their contributions to the spatially regularized SVM learning. Reliable mapping results in both task- and resting-state were obtained from individual subjects and at the group level. Comparison with Existing Methods A comparison study was performed with independent component analysis, general linear model, and correlation analysis methods. Experimental results indicate that the proposed method can provide better or comparable mapping performance at the individual and group level. Conclusions The proposed method can provide accurate and reliable mapping of brain function in task- and resting-state, and is applicable to a variety of quantitative fMRI studies. PMID:26470627
NASA Astrophysics Data System (ADS)
Zhao, Shouwei; Zhang, Yong; Zhou, Bin; Ma, Dongxi
2014-09-01
Interaction is one of the key techniques of an augmented reality (AR) maintenance guiding system. Because of the complexity of the image background in the maintenance guiding system and the high dimensionality of gesture characteristics, the whole process of gesture recognition can be divided into three stages: gesture segmentation, gesture characteristic feature modeling, and trick recognition. In the segmentation stage, to avoid misrecognizing skin-like regions, a segmentation algorithm combining a background model with skin color is adopted to exclude some skin-like regions. In the stage of modeling gesture characteristic features from image attributes, many characteristic features are analyzed and acquired, such as structural characteristics, Hu invariant moment features and Fourier descriptors. In the trick recognition stage, a classifier based on the Support Vector Machine (SVM) is introduced into the augmented reality maintenance guiding process. SVM is a learning method based on statistical learning theory; it has a solid theoretical foundation and excellent learning ability, has been applied to many problems in machine learning, and has particular advantages in dealing with small samples and nonlinear, high-dimensional pattern recognition. Gesture recognition in the augmented reality maintenance guiding system is realized by SVM after granulation of all the characteristic features. The experimental results of simulated number-gesture recognition and of its application in the augmented reality maintenance guiding system show that the real-time performance and robustness of gesture recognition in the AR maintenance guiding system can be greatly enhanced by the improved SVM.
Park, Eunjeong; Chang, Hyuk-Jae; Nam, Hyo Suk
2017-04-18
The pronator drift test (PDT), a neurological examination, is widely used in clinics to measure motor weakness of stroke patients. The aim of this study was to develop a PDT tool with machine learning classifiers to detect stroke symptoms based on quantification of proximal arm weakness using inertial sensors and signal processing. We extracted features of drift and pronation from accelerometer signals of wearable devices on the inner wrists of 16 stroke patients and 10 healthy controls. Signal processing and feature selection approaches were applied to discriminate PDT features used to classify stroke patients. A series of machine learning techniques, namely support vector machine (SVM), radial basis function network (RBFN), and random forest (RF), were implemented to discriminate stroke patients from controls with leave-one-out cross-validation. Signal processing by the PDT tool extracted a total of 12 PDT features from the sensors. Feature selection abstracted the major attributes from the 12 PDT features to elucidate the dominant characteristics of proximal weakness of stroke patients using machine learning classification. Our proposed PDT classifiers had an area under the receiver operating characteristic curve (AUC) of .806 (SVM), .769 (RBFN), and .900 (RF) without feature selection, and feature selection improved the AUCs to .913 (SVM), .956 (RBFN), and .975 (RF), representing an average performance enhancement of 15.3%. Sensors and machine learning methods can reliably detect stroke signs and quantify proximal arm weakness. Our proposed solution will facilitate pervasive monitoring of stroke patients. ©Eunjeong Park, Hyuk-Jae Chang, Hyo Suk Nam. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 18.04.2017.
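Leave-one-out cross-validation, as used above, trains each classifier n times, each time holding out one participant (26 fits for 16 patients plus 10 controls). The sketch below shows the loop with a pluggable fit/predict pair; the nearest-centroid classifier is a hypothetical stand-in for SVM, RBFN or RF, and the 2-D features are invented.

```python
def nearest_centroid_fit(X, y):
    """Hypothetical stand-in classifier (not SVM/RBFN/RF): one centroid per class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(c) / len(c) for c in zip(*pts)] for label, pts in groups.items()}


def nearest_centroid_predict(centroids, x):
    return min(centroids,
               key=lambda lab: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[lab])))


def leave_one_out_accuracy(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += int(predict(model, X[i]) == y[i])
    return correct / len(X)


# Hypothetical 2-D PDT features; the last "patient" sits inside the control cluster.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0.5, 0.5]]
y = ["control"] * 3 + ["stroke"] * 4
print(leave_one_out_accuracy(X, y, nearest_centroid_fit, nearest_centroid_predict))  # 6/7
```

Only the atypical patient is misclassified when held out, which is the kind of case leave-one-out exposes on small cohorts.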
Ecological footprint model using the support vector machine technique.
Ma, Haibo; Chang, Wenjuan; Cui, Guangbai
2012-01-01
The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize relevant previous literature and present five factors that influence per capita EF. These factors are: national gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by the percentage of exports to total GDP), and service intensity (measured by the percentage of services to total GDP). A new ecological footprint model based on a support vector machine (SVM), a machine-learning method based on the structural risk minimization principle from statistical learning theory, was used to calculate the per capita EF of 24 nations using data from 123 nations. The calculation accuracy was measured by the average absolute error and the average relative error, which were 0.004883 and 0.351078%, respectively. Our results demonstrate that the EF model based on SVM has good calculation performance.
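The abstract does not define the two error measures. Under the common definitions (an assumption here), the average absolute error is the mean of |actual - predicted| and the average relative error is the mean of |actual - predicted| / |actual|, expressed as a percentage. A minimal sketch with made-up footprint values:

```python
def average_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def average_relative_error_percent(actual, predicted):
    """Mean of |actual - predicted| / |actual|, in percent (actual must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical per capita EF values (actual vs model output) for two nations.
actual, predicted = [2.0, 4.0], [2.1, 3.9]
print(average_absolute_error(actual, predicted))          # about 0.1
print(average_relative_error_percent(actual, predicted))  # about 3.75
```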
Improving detection sensitivity for partial discharge monitoring of high voltage equipment
NASA Astrophysics Data System (ADS)
Hao, L.; Lewin, P. L.; Swingler, S. G.
2008-05-01
Partial discharge (PD) measurements are an important technique for assessing the health of power apparatus. Previously published research by the authors has shown that an electro-optic system can be used for PD measurement of oil-filled power transformers. A PD signal generated within an oil-filled power transformer may reach a winding and then travel along the winding to the bushing core bar. The bushing, acting like a capacitor, can transfer the high frequency components of the partial discharge signal to its earthed tap point. Therefore, an effective PD current measurement can be implemented at the bushing tap by using a radio frequency current transducer around the bushing-tap earth connection. In addition, the use of an optical transmission technique not only improves the electrical noise immunity and provides the possibility of remote measurement but also realizes electrical isolation and enhances safety for operators. However, the bushing core bar can act as an aerial, and in addition, noise induced by the electro-optic modulation system may influence overall measurement sensitivity. This paper reports on a machine learning technique, namely the use of a support vector machine (SVM), to improve the detection sensitivity of the system. The signal extraction performance of the SVM technique has been compared with that of a passive hardware filter. The results obtained from the laboratory-based experiment have been analysed and indicate that the SVM approach provides better performance than the passive hardware filter and can reliably detect discharge signals with apparent charge greater than 30 pC.
Application of machine learning on brain cancer multiclass classification
NASA Astrophysics Data System (ADS)
Panca, V.; Rustam, Z.
2017-07-01
Classification of brain cancer is a multiclass classification problem. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. The application of machine learning to a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features in a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is applied to each subset. Then, the selected features of each subset are fed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. The twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimal hyperplane, the main objective of the twin SVM is to find two non-parallel optimal hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could correctly classify 71.4% of the overall test data, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
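Plain SVM-RFE, on which the multiple multiclass variant builds, repeatedly trains a linear SVM and discards the feature with the smallest squared weight. The sketch below shows only that binary loop, assuming a Pegasos-style linear SVM solver and invented ±1-coded "genes"; the multiclass extension, the feature subsets and the TWSVM classifier are not reproduced.

```python
import itertools
import random


def train_linear_svm(X, y, lam=0.01, iterations=20000, seed=0):
    """Pegasos-style linear SVM; returns feature weights followed by a bias weight."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(Xb))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
        if margin < 1.0:  # hinge-loss subgradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: retrain, drop the feature with smallest w_j^2, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[x[j] for j in remaining] for x in X]
        w = train_linear_svm(X_sub, y)[:-1]  # ignore the bias weight
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[weakest]
    return remaining


# Hypothetical data: three +/-1 "genes"; only gene 0 determines the class.
X = [list(p) for p in itertools.product([-1.0, 1.0], repeat=3)]
y = [int(x[0]) for x in X]
print(svm_rfe(X, y, n_keep=1))  # expected [0]
```

Removing one feature per round is the simplest schedule; with thousands of genes, implementations usually drop a fraction of features per round to save retraining time.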
Predicting complications of percutaneous coronary intervention using a novel support vector method
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229
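Two of the ingredients above can be illustrated independently of OP-SVM, which is not reproduced here. The sketch computes the mean cross-entropy error used to judge calibration, plus one common cost-sensitive weighting (inverse class frequency, n / (2 n_c)). Whether the study used this particular weighting is an assumption, and the inputs are hypothetical.

```python
import math


def mean_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average of -[y log p + (1 - y) log(1 - p)] over patients; y is 0/1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def balanced_class_weights(y_true):
    """One common cost-sensitive scheme: weight each class by n / (2 * n_class)."""
    n = len(y_true)
    positives = sum(y_true)
    return {1: n / (2.0 * positives), 0: n / (2.0 * (n - positives))}


# Hypothetical: one complication among ten procedures.
print(balanced_class_weights([1] + [0] * 9))  # the rare complication class gets the larger weight
```

With rare complications, such weights make each missed complication cost more during training than each false alarm.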
Support vector machine for day ahead electricity price forecasting
NASA Astrophysics Data System (ADS)
Razak, Intan Azmira binti Wan Abdul; Abidin, Izham bin Zainal; Siah, Yap Keem; Rahman, Titik Khawa binti Abdul; Lada, M. Y.; Ramani, Anis Niza binti; Nasir, M. N. M.; Ahmad, Arfah binti
2015-05-01
Electricity price forecasting has become an important part of power system operation and planning. In a pool-based electric energy market, producers submit selling bids consisting of energy blocks and their corresponding minimum selling prices to the market operator. Meanwhile, consumers submit buying bids consisting of energy blocks and their corresponding maximum buying prices to the market operator. Hence, both producers and consumers use day-ahead price forecasts to derive their respective bidding strategies for the electricity market and to reduce the cost of electricity. However, forecasting electricity prices is a complex task because the price series is non-stationary and highly volatile. Many factors cause price spikes, such as volatility in load and fuel prices, as well as power imports to and exports from outside the market through long-term contracts. This paper introduces a machine learning approach to day-ahead electricity price forecasting using the Least Squares Support Vector Machine (LS-SVM). Previous-day data of the Hourly Ontario Electricity Price (HOEP), generation price and demand from the Ontario power market are used as inputs for training. The simulation is carried out using LSSVMlab in Matlab with training and testing data from 2004. SVM, which is widely used for classification and regression, has great generalization ability because it follows the structural risk minimization principle rather than empirical risk minimization. Moreover, a trained SVM with the same parameter settings always gives the same results, which shortens the simulation process compared with other techniques such as neural networks and time series models. The mean absolute percentage error (MAPE) of the proposed model shows that SVM performs well compared with a neural network.
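For intuition about LS-SVM regression: training reduces to a single linear system, [[0, 1^T], [1, Ω + I/γ]] [b; α] = [0; y], with Ω_ij = K(x_i, x_j) and γ the regularisation parameter, and predictions are f(x) = Σ α_i K(x_i, x) + b. The sketch below solves that system with a linear kernel and scores forecasts with MAPE. It does not use LSSVMlab, and the kernel, γ and the data are hypothetical stand-ins for the HOEP inputs.

```python
def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z


def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_lssvm(X, y, gamma=1e6, kernel=linear_kernel):
    """LS-SVM regression: solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    M = [[0.0] + [1.0] * n]
    for i in range(n):
        M.append([1.0] + [kernel(X[i], X[j]) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    solution = solve(M, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, dual weights alpha


def lssvm_predict(X_train, b, alpha, x, kernel=linear_kernel):
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X_train)) + b


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical single-input series that happens to be exactly linear.
X_hist = [[0.0], [1.0], [2.0], [3.0]]
y_hist = [1.0, 3.0, 5.0, 7.0]
b, alpha = train_lssvm(X_hist, y_hist)
print(round(lssvm_predict(X_hist, b, alpha, [5.0]), 3))  # close to 11
```

Because training is one deterministic linear solve, the same inputs and settings always give the same model, which matches the reproducibility point made above.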
An evaluation of open set recognition for FLIR images
NASA Astrophysics Data System (ADS)
Scherreik, Matthew; Rigling, Brian
2015-05-01
Typical supervised classification algorithms label inputs according to what was learned in a training phase. Thus, test inputs from classes that were not seen in training are always given incorrect labels. Open set recognition algorithms address this issue by accounting for inputs that are not present in training and providing the classifier with an option to "reject" unknown samples. A number of such techniques have been developed in the literature, many of which are based on support vector machines (SVMs). One approach, the 1-vs-set machine, constructs a "slab" in feature space using the SVM hyperplane. Inputs falling on one side of the slab or within the slab belong to a training class, while inputs falling on the far side of the slab are rejected. We note that rejection of unknown inputs can be achieved by thresholding class posterior probabilities. Another recently developed approach, the Probabilistic Open Set SVM (POS-SVM), empirically determines good probability thresholds. We apply the 1-vs-set machine, POS-SVM, and closed set SVMs to FLIR images taken from the Comanche SIG dataset. Vehicles in the dataset are divided into three general classes: wheeled, armored personnel carrier (APC), and tank. For each class, a coarse pose estimate (front, rear, left, right) is taken. In a closed set sense, we analyze these algorithms for prediction of vehicle class and pose. To test open set performance, one or more vehicle classes are held out from training. By considering closed and open set performance separately, we may closely analyze both inter-class discrimination and threshold effectiveness.
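The thresholding idea noted above amounts to rejecting a sample when even its most probable known class is not probable enough. A minimal sketch, where the posterior values and the fixed threshold of 0.5 are hypothetical (POS-SVM instead determines thresholds empirically):

```python
def open_set_predict(posteriors, threshold=0.5):
    """Return the most probable known class, or 'unknown' if even that class is unconvincing."""
    label, prob = max(posteriors.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else "unknown"


print(open_set_predict({"wheeled": 0.7, "APC": 0.2, "tank": 0.1}))    # wheeled
print(open_set_predict({"wheeled": 0.4, "APC": 0.35, "tank": 0.25}))  # unknown
```

Raising the threshold rejects more held-out vehicles but also more genuine ones, which is the trade-off the threshold-effectiveness analysis examines.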
Majid, Abdul; Ali, Safdar; Iqbal, Mubashar; Kausar, Nabeela
2014-03-01
This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: the preprocessor and the predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is employed to increase the number of minority class samples, thereby balancing the dataset. In the predictor stage, the machine learning approaches K-nearest neighbor (KNN) and support vector machines (SVM) are used to develop hybrid MTD-SVM and MTD-KNN prediction models. The MTD-SVM model provided the best values of accuracy, G-mean and Matthews correlation coefficient, of 96.71%, 96.70% and 71.98% for the cancer/non-cancer dataset, the breast/non-breast cancer dataset and the colon/non-colon cancer dataset, respectively. We found that the hybrid MTD-SVM model is the best with respect to prediction performance and computational cost. The MTD-KNN model achieved moderately better prediction than the hybrid MTD-NB (Naïve Bayes) model, but at a higher computing cost. The MTD-KNN model is faster than MTD-RF (random forest), but its prediction is not better than that of MTD-RF. To the best of our knowledge, the reported results are the best so far for these datasets. The proposed scheme indicates that the developed models can be used as a tool for the prediction of cancer. This scheme may be useful for the study of any sequential information, such as protein or nucleic acid sequences.
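G-mean and the Matthews correlation coefficient (MCC) are reported above because plain accuracy can look good on imbalanced data. The standard definitions are sketched below from binary confusion-matrix counts; the counts in the usage lines are made-up numbers, not results from the study, and MTD oversampling itself is not shown.

```python
import math

def gmean_mcc(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity, and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return g_mean, mcc

g, m = gmean_mcc(tp=90, fn=10, fp=5, tn=95)   # hypothetical counts
```

Here sensitivity is 0.9 and specificity 0.95, giving a G-mean of about 0.925 and an MCC of about 0.851.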
Application of the support vector machine to predict subclinical mastitis in dairy cattle.
Mammadova, Nazira; Keskin, Ismail
2013-01-01
This study presented a potentially useful alternative approach to ascertain the presence of subclinical and clinical mastitis in dairy cows using support vector machine (SVM) techniques. The proposed method detected mastitis in a cross-sectional representative sample of Holstein dairy cattle milked using an automatic milking system. The study used such suspected indicators of mastitis as lactation rank, milk yield, electrical conductivity, average milking duration, and control season as input data. The output variable was somatic cell counts obtained from milk samples collected monthly throughout the 15 months of the control period. Cattle were judged to be healthy or infected based on those somatic cell counts. This study undertook a detailed scrutiny of the SVM methodology, constructing and examining a model which showed 89% sensitivity, 92% specificity, and 50% error in mastitis detection.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known for its strong generalization capability. SVMs can be used for both classification and regression, with either a linear or a nonlinear kernel. However, a weakness of SVM is that the optimal parameter values are difficult to determine. An SVM computes the best linear separator in the input feature space according to the training data. To classify data that are not linearly separable, SVM uses the kernel trick to map the data into a higher-dimensional feature space in which they become linearly separable. The kernel trick can use various kernel functions, such as the linear, polynomial, radial basis function (RBF) and sigmoid kernels. Each function has parameters that affect the accuracy of SVM classification. To address this problem, a genetic algorithm is proposed as the search algorithm for the optimal parameter values, thereby improving the classification accuracy of the SVM. The data are taken from the UCI Machine Learning Repository: Australian Credit Approval. The results show that the combination of SVM and a genetic algorithm is effective in improving classification accuracy. The genetic algorithm proved effective at systematically finding optimal kernel parameters for the SVM, instead of selecting kernel parameters at random. The best accuracies obtained on the data were: linear kernel, 85.12%; polynomial, 81.76%; RBF, 77.22%; sigmoid, 78.70%. However, for larger datasets this method is not practical because it is very time-consuming.
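A genetic search over SVM hyperparameters can be sketched with a small real-coded GA. Everything here is an assumption for illustration: `toy_cv_accuracy` is a hypothetical smooth stand-in for the cross-validated accuracy that a real run would compute by training an SVM for each candidate, and the search box (log2 C in [-5, 15], log2 gamma in [-15, 3]) is a commonly used range, not one taken from the paper.

```python
import random

def toy_cv_accuracy(log2_c, log2_gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy; peaks at C = 2**3, gamma = 2**-5."""
    return 1.0 - 0.01 * ((log2_c - 3) ** 2 + (log2_gamma + 5) ** 2)

def ga_search(fitness, bounds, pop_size=20, generations=40, mut_sd=0.5, seed=0):
    """Real-coded GA: truncation selection, uniform crossover, clipped Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(*ind), reverse=True)
        elite = ranked[: pop_size // 2]                 # the fitter half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]     # uniform crossover
            child = [min(hi, max(lo, g + rng.gauss(0.0, mut_sd)))  # mutate, stay in bounds
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda ind: fitness(*ind))

best = ga_search(toy_cv_accuracy, bounds=[(-5, 15), (-15, 3)])
```

Each fitness call in a real setting is a full cross-validation, which is why the abstract notes that the approach becomes slow for larger datasets.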
NASA Astrophysics Data System (ADS)
Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning
2017-03-01
Microaneurysms are the earliest clinical signs of diabetic retinopathy, and many algorithms have been developed for the automatic classification of this specific pathology. However, the imbalanced class distribution of the dataset usually causes the classification accuracy for true microaneurysms to be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of minority samples in the neighborhood of the borderline, and 2) the removal of redundant training samples to improve the efficiency of data utilization. Moreover, a modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. Experiments on a public microaneurysm database show that the proposed algorithms achieve better classification performance in terms of both the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
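The Tomek-link cleaning step mentioned above can be sketched as follows: a Tomek link is a pair of samples with different labels that are each other's nearest neighbour, and a common cleaning rule drops the majority-class member of each link. The toy points are hypothetical, and the BSMOTE oversampling stage is not shown.

```python
def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours but carry different labels."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = [min((j for j in range(len(X)) if j != i), key=lambda j: sq_dist(X[i], X[j]))
               for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and y[i] != y[j]]

def drop_majority_in_links(X, y, majority_label):
    """Remove the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

X = [[0.0], [0.1], [1.0], [1.05], [5.0]]   # hypothetical 1-D candidates
y = [0, 0, 1, 0, 1]                         # 0 = false, 1 = true microaneurysm
```

Only samples 2 and 3 are mutual nearest neighbours with different labels, so sample 3 (the majority class) is removed, which sharpens the class boundary before SVM training.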
sw-SVM: sensor weighting support vector machines for EEG-based brain-computer interfaces.
Jrad, N; Congedo, M; Phlypo, R; Rousseau, S; Flamary, R; Yger, F; Rakotomamonjy, A
2011-10-01
In many machine learning applications, such as brain-computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated, and the signal-to-noise ratio is not homogeneously spread across sensors. Thus, the collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach that integrates sensor weighting into the classification framework. Sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, the generalization error. Experimental studies on two data sets are presented: a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM performs equivalently to the ensemble SVM strategy that won the competition. For the ErrP data set, for which only a small number of trials is available, the sw-SVM shows superior performance compared with three state-of-the-art approaches. The results suggest that the sw-SVM is promising for event-related potential classification, even with a small number of training trials.
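One simple way sensor weights can enter an SVM is through the kernel, scaling each sensor's contribution to the distance. The sketch below assumes one scalar feature per sensor and an RBF kernel; it is not necessarily the sw-SVM formulation, and it does not show how the weights are learned via the margin criterion.

```python
import math

def sensor_weighted_rbf(x, z, weights, gamma=1.0):
    """RBF kernel where each sensor's squared difference is scaled by a non-negative weight."""
    return math.exp(-gamma * sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, z)))

# A zero weight switches a (noisy) sensor off entirely.
k_ignored = sensor_weighted_rbf([0.0, 5.0], [0.0, -5.0], weights=[1.0, 0.0])
# Unit weights recover the ordinary RBF kernel.
k_plain = sensor_weighted_rbf([1.0, 0.0], [0.0, 0.0], weights=[1.0, 1.0])
```

Driving the weight of a low-SNR sensor toward zero removes its influence on the decision function, which is the intuition behind treating weights as hyper-parameters.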
Interpreting support vector machine models for multivariate group wise analysis in neuroimaging
Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos
2015-01-01
Machine learning based classification algorithms such as support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing the imaging-based patterns that contribute significantly to classifier decisions remains an open problem. This issue is of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier's decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. Most published work addresses statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is much less conservative than weight-based permutation tests, yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification. PMID:26210913
Cheng, Shu-Xi; Xie, Chuan-Qi; Wang, Qiao-Nan; He, Yong; Shao, Yong-Ni
2014-05-01
Identification of early blight on tomato leaves using a hyperspectral imaging technique based on different effective wavelength selection methods (successive projections algorithm, SPA; x-loading weights, x-LW; Gram-Schmidt orthogonalization, GSO) was studied in the present paper. Hyperspectral images of seventy healthy and seventy infected tomato leaves were obtained by a hyperspectral imaging system across the wavelength range of 380-1023 nm. The reflectance of all pixels in the region of interest (ROI) was extracted with ENVI 4.7 software. A least squares support vector machine (LS-SVM) model was established based on the full spectral wavelengths. It obtained an excellent result, with the highest identification accuracy (100%) in both the calibration and prediction sets. Then, EW-LS-SVM and EW-LDA models were established based on the effective wavelengths (EW) suggested by SPA, x-LW and GSO, respectively. The results showed that all of the EW-LS-SVM and EW-LDA models performed well, with an identification accuracy of 100% for the EW-LS-SVM models and 100%, 100% and 97.83% for the EW-LDA models, respectively. Moreover, the numbers of input wavelengths of the SPA-LS-SVM, x-LW-LS-SVM and GSO-LS-SVM models were four (492, 550, 633 and 680 nm), three (631, 719 and 747 nm) and two (533 and 657 nm), respectively. Fewer input variables are beneficial for the development of an identification instrument. The results demonstrate that it is feasible to identify early blight on tomato leaves using hyperspectral imaging, and that SPA, x-LW and GSO are effective wavelength selection methods.
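The successive projections algorithm (SPA) named above selects wavelengths with minimal collinearity: starting from one wavelength, it repeatedly projects the remaining spectral columns onto the orthogonal complement of the last selected column and picks the column with the largest remaining norm. The sketch below is a minimal version with a fixed starting wavelength and hypothetical toy columns; full SPA implementations usually try several starting points and choose the subset by validation error.

```python
def spa(columns, n_select, start=0):
    """Successive projections algorithm: pick columns (wavelengths) with minimal collinearity.

    columns[j] holds the reflectance of wavelength j across all samples.
    """
    cols = [list(c) for c in columns]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        for j in range(len(cols)):
            if j in selected:
                continue
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]  # remove component along ref
        remaining = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(remaining, key=lambda j: sum(v * v for v in cols[j])))
    return selected

# Hypothetical 3-sample spectra; column 1 is almost a copy of column 0.
toy_columns = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
chosen = spa(toy_columns, n_select=3)
```

The nearly redundant column 1 is skipped, which is how SPA shrinks the full spectrum to a handful of informative wavelengths.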
NASA Astrophysics Data System (ADS)
Yang, Dong; Lu, Anxiang; Ren, Dong; Wang, Jihua
2017-11-01
This study explored the feasibility of rapidly detecting biogenic amines (BAs) in cooked beef during storage using a hyperspectral imaging technique combined with a sparse representation (SR) algorithm. Hyperspectral images of the samples were collected separately in two spectral ranges, 400-1000 nm and 1000-1800 nm. The dimensionality of the spectral data was reduced by the SR and principal component analysis (PCA) algorithms, and the reduced data were then combined with a least squares support vector machine (LS-SVM) to build SR-LS-SVM and PC-LS-SVM models for predicting BA values in cooked beef. The results showed that the SR-LS-SVM model had the best predictive ability, with a coefficient of determination (R²P) of 0.943 and a root mean square error (RMSEP) of 1.206 for the prediction set in the 400-1000 nm range. The SR and PCA algorithms were further combined to establish the best model, SR-PC-LS-SVM, for BA prediction, which had a high R²P of 0.969 and a low RMSEP of 1.039 in the 400-1000 nm region. A visual map of the BAs was generated from the best SR-PC-LS-SVM model with image processing algorithms, which could be used to observe the changes of BAs in cooked beef more intuitively. The study demonstrated that hyperspectral imaging combined with sparse representation can effectively detect BA values in cooked beef during storage, and that the SR-PC-LS-SVM model has potential for rapid and accurate determination of freshness indexes in other meat and meat products.
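The two prediction-set statistics reported above can be computed as follows. This sketch assumes R²P is defined as 1 - SS_res/SS_tot on the prediction set (some chemometrics papers instead use the squared correlation between measured and predicted values), and the numbers are toy values.

```python
import math

def prediction_metrics(y_true, y_pred):
    """Return (R2_P, RMSEP) for a prediction set."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)

r2_p, rmsep = prediction_metrics([1, 2, 3, 4], [2, 3, 4, 5])   # hypothetical values
```

A constant offset of 1 gives an RMSEP of exactly 1.0 but drags R²P down to 0.2, showing why both statistics are reported together.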
Ansari, Mozafar; Othman, Faridah; Abunama, Taher; El-Shafie, Ahmed
2018-04-01
The function of a sewage treatment plant is to treat sewage to acceptable standards before it is discharged into receiving waters. To design and operate such plants, it is necessary to measure and predict the influent flow rate. In this research, the influent flow rate of a sewage treatment plant (STP) was modelled and predicted by autoregressive integrated moving average (ARIMA), nonlinear autoregressive network (NAR) and support vector machine (SVM) regression time series algorithms. To evaluate the models' accuracy, the root mean square error (RMSE) and coefficient of determination (R²) were calculated as initial assessment measures, while relative error (RE), peak flow criterion (PFC) and low flow criterion (LFC) were calculated as final evaluation measures to demonstrate the detailed accuracy of the selected models. An integrated model was developed based on the individual models' prediction ability for low, average and peak flow. An initial assessment of the results showed that the ARIMA model was the least accurate and the NAR model was the most accurate. The RE results also show that the SVM model produced errors above 10% or below -10% more frequently than the NAR model. The influent was also forecast up to 44 weeks ahead by both models. The graphical results indicate that the NAR model made better predictions than the SVM model. The final evaluation of NAR and SVM demonstrated that SVM made better predictions at peak flow, while NAR fit well for low and average inflow ranges. The integrated model therefore uses the NAR model for low and average influent and the SVM model for peak inflow.
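The relative-error check and the NAR/SVM combination described above can be sketched as below. The ±10% band comes from the abstract; the switching rule in `integrated_forecast`, which uses the NAR forecast and a `peak_threshold` to decide when to trust the SVM, is a hypothetical illustration, since the abstract does not state how the flow regime is identified.

```python
def relative_errors(observed, predicted):
    """Relative error in percent for each time step."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]

def share_outside(observed, predicted, band=10.0):
    """Fraction of predictions whose relative error lies above +band% or below -band%."""
    re = relative_errors(observed, predicted)
    return sum(1 for e in re if abs(e) > band) / len(re)

def integrated_forecast(nar_pred, svm_pred, peak_threshold):
    """Hypothetical switching rule: SVM for peak flows, NAR for low and average flows."""
    return [s if n >= peak_threshold else n for n, s in zip(nar_pred, svm_pred)]

outside = share_outside([100, 200, 50], [115, 190, 50])   # toy flows
```

Counting how often errors leave the ±10% band gives a regime-sensitive view that a single RMSE can hide.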
Ma, Yi; Zhang, Jie; Cui, Ting-wei
2006-12-01
Airborne hyperspectral identification of the dominant species of red tide organisms can provide a technique for distinguishing red tides and their toxins, and provide support for assessing the scale of the disaster. Based on the support vector machine (SVM), the present paper provides an identification model for dominant red tide species. Using this model, the authors carried out three identification experiments with hyperspectral data obtained on 16 July and on 19 and 25 August 2001. The identification results show that the model has high precision and is not restricted by the high dimensionality of the hyperspectral data.
Optimal structural design of the midship of a VLCC based on the strategy integrating SVM and GA
NASA Astrophysics Data System (ADS)
Sun, Li; Wang, Deyu
2012-03-01
In this paper a hybrid process of modeling and optimization, which integrates a support vector machine (SVM) and genetic algorithm (GA), was introduced to reduce the high time cost in structural optimization of ships. SVM, which is rooted in statistical learning theory and an approximate implementation of the method of structural risk minimization, can provide a good generalization performance in metamodeling the input-output relationship of real problems and consequently cuts down on high time cost in the analysis of real problems, such as FEM analysis. The GA, as a powerful optimization technique, possesses remarkable advantages for the problems that can hardly be optimized with common gradient-based optimization methods, which makes it suitable for optimizing models built by SVM. Based on the SVM-GA strategy, optimization of structural scantlings in the midship of a very large crude carrier (VLCC) ship was carried out according to the direct strength assessment method in common structural rules (CSR), which eventually demonstrates the high efficiency of SVM-GA in optimizing the ship structural scantlings under heavy computational complexity. The time cost of this optimization with SVM-GA has been sharply reduced, many more loops have been processed within a small amount of time and the design has been improved remarkably.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data mining has important application value in today's industry and academia, and text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification: KNN, NB, AB, SVM, decision trees and other classification methods all show good classification performance. The support vector machine (SVM) is a good classifier in machine learning research. This paper studies the classification performance of the SVM method on Chinese text data, applies it to classify Chinese text, and aims to combine academic research with practical application.
Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo
2018-03-21
Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property that must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict BBB permeability. In particular, the support vector machine (SVM), a kernel-based machine learning method, has been widely used in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of an SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy of BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared with the currently available models for BBB permeability prediction.
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at high temporal and spatial resolution. Over the past year, as part of the GeoSocial project, we have been analyzing geological hazards and phenomena such as earthquakes, volcanic eruptions, landslides, floods and the aurora in real time by geo-locating keyword-filtered tweets on a web map. However, not all of the filtered tweets are related to hazard or phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, stop words were removed, and only tweets written in English were kept. To classify the remaining tweets into "aurora-event" or "no-aurora-event" categories, we compared two state-of-the-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Network (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on a labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets into the two categories. In its general form, an SVM separates two classes by a function (kernel). Using our training dataset to build the SVM classifier, we compared the performance of four different options, which we refer to as kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayes and Stochastic Gradient Descent), provided by the Scikit-Learn library. The results showed that Logistic Regression (LR) achieves the best accuracy (87%). We therefore selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework. Later, we developed a CNN classifier, in which the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer.
The CNN's accuracy (83%) is lower than that of SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used the TensorFlow framework to apply the CNN classifier to the same collection of tweets. In the future, we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real time.
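The keyword filtering and stop-word removal described above can be sketched in plain Python. The keyword set and the tiny stop-word list are hypothetical placeholders, the English-language filter is omitted, and the vectorisation and SVM classification steps that would follow are not shown.

```python
import re

KEYWORDS = {"aurora", "northernlights"}                      # hypothetical aurora keywords
STOP_WORDS = {"the", "a", "an", "is", "at", "of", "and", "to", "in"}  # tiny illustrative list

def preprocess(tweet):
    """Return the tweet's content tokens, or None if it matches no aurora keyword."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    if not KEYWORDS & set(tokens):
        return None
    return [t for t in tokens if t not in STOP_WORDS]

kept = preprocess("The Aurora is amazing!")
dropped = preprocess("Traffic in town")
```

Tweets that survive this step still need the trained classifier, because a keyword match alone does not mean an aurora event is occurring.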
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klement, Rainer J., E-mail: rainer_klement@gmx.de; Department of Radiotherapy and Radiation Oncology, Leopoldina Hospital, Schweinfurt; Allgäuer, Michael
2014-03-01
Background: Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. Methods and Materials: We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM, we selected the most important ones on the basis of forward feature selection, evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Results: Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. Conclusions: These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT.
Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.
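The feature-selection protocol above (greedy forward selection, repeated 10 times, keeping features chosen at least 5 times) can be sketched as two small functions. The `score` argument is a hypothetical stand-in for the cross-validated AUC of an SVM trained on a feature subset, and the toy gains and run outcomes are invented for illustration.

```python
from collections import Counter

def forward_select(features, score):
    """Greedily add the feature that most improves score; stop when nothing improves it."""
    selected, best = [], float("-inf")
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        f_best = max(candidates, key=lambda f: score(selected + [f]))
        s = score(selected + [f_best])
        if s <= best:
            break
        selected.append(f_best)
        best = s
    return selected

def consensus_features(selections, min_count=5):
    """Keep features chosen in at least min_count of the repeated selection runs."""
    counts = Counter(f for run in selections for f in set(run))
    return sorted(f for f, c in counts.items() if c >= min_count)

# Hypothetical score: additive gains minus a small per-feature penalty.
gains = {"a": 0.5, "b": 0.3, "c": 0.05}
picked = forward_select(list(gains), lambda fs: sum(gains[f] for f in fs) - 0.1 * len(fs))

# Hypothetical outcomes of 10 selection runs with different CV splits.
runs = [["BED_ISO", "FEV1"]] * 6 + [["BED_ISO", "age"]] * 4
final = consensus_features(runs)
```

Requiring a feature to recur across differently split runs guards against keeping features that only helped on one particular partition of the data.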
Klement, Rainer J; Allgäuer, Michael; Appold, Steffen; Dieckmann, Karin; Ernst, Iris; Ganswindt, Ute; Holy, Richard; Nestle, Ursula; Nevinny-Stickel, Meinhard; Semrau, Sabine; Sterzing, Florian; Wittig, Andrea; Andratschke, Nicolaus; Guckenberger, Matthias
2014-03-01
Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. 
Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-01-01
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale within a study area. To provide better predictions, a geographically weighted regression (GWR) technique is first used to segment the study area into a series of prediction regions of appropriate size. A support vector machine (SVM) classifier is then applied in each prediction region for landslide susceptibility mapping. To further improve prediction performance, the particle swarm optimization (PSO) algorithm is used in each prediction region to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are compared on a study area in the Wanzhou district of the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and a visual qualitative evaluation, indicate that our model achieves better prediction accuracy and is more effective for landslide susceptibility mapping. For instance, our model achieves an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than that of the traditional SVM-based models. In addition, the landslide susceptibility map obtained by our model shows a strong correlation between the classified very-high-susceptibility zone and the previously investigated landslides. PMID:27187430
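The per-region parameter search described above can be sketched with a basic global-best particle swarm optimizer. The inertia and acceleration constants are common textbook choices rather than values from the paper, and `toy_objective` is a hypothetical stand-in for the validation accuracy of an SVM with parameters (log2 C, log2 gamma); the GWR segmentation step is not shown.

```python
import random

def pso(objective, bounds, n_particles=15, iters=60, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Global-best PSO that maximises objective(position) inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_objective(p):
    """Hypothetical stand-in for SVM validation accuracy; peaks at log2 C = 5, log2 gamma = -3."""
    return 1.0 - 0.01 * ((p[0] - 5) ** 2 + (p[1] + 3) ** 2)

best_params, best_score = pso(toy_objective, bounds=[(-5, 15), (-15, 3)])
```

In the coupled model this search would run once per GWR prediction region, so each region gets its own SVM parameters.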
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-05-11
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale within a study area. To provide better predictions, a geographically weighted regression (GWR) technique is first used to segment the study area into a series of prediction regions of appropriate size. A support vector machine (SVM) classifier is then applied in each prediction region for landslide susceptibility mapping. To further improve prediction performance, the particle swarm optimization (PSO) algorithm is used in each prediction region to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are compared on a study area in the Wanzhou district of the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and a visual qualitative evaluation, indicate that our model achieves better prediction accuracy and is more effective for landslide susceptibility mapping. For instance, our model achieves an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than that of the traditional SVM-based models. In addition, the landslide susceptibility map obtained by our model shows a strong correlation between the classified very-high-susceptibility zone and the previously investigated landslides.
NASA Astrophysics Data System (ADS)
Bai, Ting; Sun, Kaimin; Deng, Shiquan; Chen, Yan
2018-03-01
High-resolution image change detection is one of the key technologies in remote sensing applications and is of great significance for resource surveys, environmental monitoring, precision agriculture, military mapping and battlefield environment detection. In this paper, Random Forest (RF), Support Vector Machine (SVM), Deep Belief Network (DBN) and AdaBoost models were established for high-resolution satellite imagery to verify the feasibility of different machine learning methods for change detection. To compare the detection accuracy of the four machine learning methods, we applied them to two high-resolution images. The results show that, with small samples, SVM has higher overall accuracy than RF, AdaBoost and DBN for both binary and from-to change detection. As the number of samples increases, RF achieves higher overall accuracy than AdaBoost, SVM and DBN.
Novel Hybrid of LS-SVM and Kalman Filter for GPS/INS Integration
NASA Astrophysics Data System (ADS)
Xu, Zhenkai; Li, Yong; Rizos, Chris; Xu, Xiaosu
Integration of Global Positioning System (GPS) and Inertial Navigation System (INS) technologies can overcome the drawbacks of the individual systems. One advantage is that the integrated solution can provide continuous navigation capability even during GPS outages. However, bridging GPS outages is still a challenge when Micro-Electro-Mechanical System (MEMS) inertial sensors are used. Methods currently being explored by the research community include applying vehicle motion constraints, optimal smoothing, and artificial intelligence (AI) techniques. Within AI, the neural network (NN) approach has been extensively utilised to date. In an NN-based integrated system, a Kalman filter (KF) estimates position, velocity and attitude errors, as well as the inertial sensor errors, to output navigation solutions while GPS signals are available. At the same time, an NN is trained to map the vehicle dynamics to the corresponding KF states, and to correct INS measurements when GPS measurements are unavailable. To achieve good performance, it is critical to select training samples of suitable quality and in optimal number for the NN. This requirement is sometimes too rigorous, which limits the real-world application of NN-based methods. The support vector machine (SVM) approach is based on the structural risk minimisation principle, instead of the empirical error minimisation principle commonly implemented in an NN. The SVM can avoid the local-minimum and over-fitting problems of an NN, and therefore can potentially achieve a higher level of global performance. This paper focuses on the least squares support vector machine (LS-SVM), which can solve highly nonlinear and noisy black-box modelling problems. This paper explores the application of the LS-SVM to aid the GPS/INS integrated system, especially during GPS outages. The paper describes the principles of the LS-SVM and of the hybrid KF method, and introduces the LS-SVM regression algorithm.
Field test data is processed to evaluate the performance of the proposed approach.
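The predict/update cycle of the Kalman filter, and what happens when GPS drops out, can be illustrated with a scalar random-walk filter. This is a deliberately minimal sketch: the real GPS/INS filter estimates many coupled error states, and the noise values `q` and `r` are assumptions. During an outage (`None`), the filter can only predict; that is the gap the NN or LS-SVM correction is meant to fill.

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter; None marks a GPS outage (predict only)."""
    x, p = x0, p0
    out = []
    for z in measurements:
        p = p + q                      # predict: state unchanged, uncertainty grows
        if z is None:                  # outage: no measurement update is possible
            out.append(x)
            continue
        k = p / (p + r)                # Kalman gain
        x = x + k * (z - x)            # correct with the innovation
        p = (1 - k) * p
        out.append(x)
    return out

track = kalman_1d([5.0] * 50)          # hypothetical constant true value of 5.0
```

With only prediction available, the estimate freezes while its variance keeps growing, which is why a learned model of the error dynamics is useful during outages.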
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between the highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra combined with multivariate statistical methods, including principal component analysis (PCA), k-nearest neighbors (kNN) classification, and support vector machine (SVM) classification. The performance of the discrimination models, in terms of accuracy, sensitivity and specificity, is evaluated using 10-fold cross-validation. The classification models are optimized to achieve the best discrimination performance. The fat, skin and muscle tissues can be clearly discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performance. The best performance is achieved by the SVM classifier with a Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that LIBS assisted by multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of an animal body. This technique could be used to discriminate tissues with minor clinical changes, and thus may advance the diagnosis of early lesions and abnormalities.
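The 10-fold cross-validation used above to evaluate the classifiers can be sketched as an index splitter. This version shuffles once and deals samples into folds round-robin; it is not stratified by tissue class, which a careful evaluation might prefer, and the seed is an arbitrary assumption.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # deal shuffled indices into k folds
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

splits = kfold_indices(25, k=10)                   # e.g. 25 hypothetical LIBS spectra
```

Every spectrum is tested exactly once, so accuracy, sensitivity and specificity can be averaged over the 10 held-out folds.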
Support vector machine for automatic pain recognition
NASA Astrophysics Data System (ADS)
Monwar, Md Maruf; Rezaei, Siamak
2009-02-01
Facial expressions are a key index of emotion, and the interpretation of such expressions is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognizing a specific expression, pain, from human faces. We employ an automatic face detector that detects faces in stored video frames using a skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier improves the performance of the automatic pain recognition system.
NASA Astrophysics Data System (ADS)
Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah
2014-05-01
Flood is one of the most devastating natural disasters, and it occurs frequently in Terengganu, Malaysia. Recently, ensemble-based techniques have become extremely popular in flood modeling. In this paper, the weights-of-evidence (WoE) model was first used to assess the impact of the classes of each conditioning factor on flooding through bivariate statistical analysis (BSA). These factors were then reclassified using the acquired weights and entered into the support vector machine (SVM) model to evaluate the correlation between flood occurrence and each conditioning factor. Through this integration, the weak point of WoE can be addressed and the performance of the SVM enhanced. The spatial database included flood inventory, slope, stream power index (SPI), topographic wetness index (TWI), altitude, curvature, distance from the river, geology, rainfall, land use/cover (LULC), and soil type. Four SVM kernel types (linear kernel (LN), polynomial kernel (PL), radial basis function kernel (RBF), and sigmoid kernel (SIG)) were used to investigate the performance of each kernel type. The efficiency of the new ensemble WoE and SVM method was tested using the area under the curve (AUC), which measured the prediction and success rates. The validation results demonstrated the strength and efficiency of the ensemble method over the individual methods. The RBF kernel gave the best results among the kernel types. The success rate and prediction rate for the ensemble WoE and RBF-SVM method were 96.48% and 95.67%, respectively. The proposed ensemble flood susceptibility mapping method could assist researchers and local governments in flood mitigation strategies.
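The WoE step assigns each class of a conditioning factor a positive weight W+ and a negative weight W- from pixel counts. The sketch below uses the common textbook definitions (W+ = ln[P(B|F)/P(B|not F)], W- = ln[P(not B|F)/P(not B|not F)], contrast C = W+ - W-). It is an assumption that the paper uses exactly these definitions, and the function name and counts are hypothetical. Zero counts would break the logarithm, so practical implementations usually add a small smoothing term.

```python
import math

def woe_weights(n1, n2, n3, n4):
    """Weights of evidence for one factor class from pixel counts.

    n1: class present & flooded      n2: class present & not flooded
    n3: class absent  & flooded      n4: class absent  & not flooded
    Returns (W+, W-, contrast C = W+ - W-).
    """
    flooded = n1 + n3
    dry = n2 + n4
    w_plus = math.log((n1 / flooded) / (n2 / dry))
    w_minus = math.log((n3 / flooded) / (n4 / dry))
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical class holding 80% of flood pixels but only 20% of dry pixels:
w_plus, w_minus, contrast = woe_weights(80, 20, 20, 80)
# W+ = ln 4 (about 1.386), W- = -ln 4, C = 2 ln 4
```

A positive contrast marks a class that is positively associated with flooding. In the ensemble, the factor layers are reclassified with these weights before they are passed to the SVM.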
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on the empirical risk minimisation principle and generalise poorly when fault samples are few. Support vector machines (SVM) are a new general machine-learning tool based on the structural risk minimisation principle, and they generalise well even when fault samples are few. Fault diagnosis based on SVM is discussed. Since the basic SVM is designed for two-class classification, while most fault diagnosis problems are multi-class, a new multi-class SVM classification method named the 'one to others' algorithm is presented to solve multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority. It is simple, requires little repeated training, and speeds up both training and recognition. The effectiveness of the method is verified by its application to fault diagnosis for a turbo-pump rotor.
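One plausible reading of the 'one to others' binary tree is sketched below. Stage i separates the i-th priority class from the classes of lower priority only, so each later stage trains on fewer samples. At recognition time, the first stage that fires claims the sample. The training-set construction is an interpretation of the abstract, not the authors' code. The fault names and the threshold functions that stand in for trained two-class SVMs are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, str]  # (feature value, fault label); a scalar feature for brevity

def stage_training_sets(samples: Sequence[Sample], priority: Sequence[str]):
    """Stage i separates class priority[i] (+1) from the lower-priority classes (-1)."""
    stages = []
    for i, label in enumerate(priority[:-1]):
        remaining = set(priority[i:])
        subset = [(x, 1 if y == label else -1) for x, y in samples if y in remaining]
        stages.append((label, subset))
    return stages

def one_to_others_predict(x: float,
                          stages: List[Tuple[str, Callable[[float], float]]],
                          last_label: str) -> str:
    """Walk the tree: the first stage with a positive decision value claims the sample."""
    for label, decide in stages:
        if decide(x) > 0:
            return label
    return last_label

# Hypothetical fault classes, ordered by priority.
priority = ["imbalance", "misalignment", "rub"]
samples = [(6.0, "imbalance"), (7.0, "imbalance"), (3.0, "misalignment"),
           (4.0, "misalignment"), (0.0, "rub"), (1.0, "rub")]
training_sets = stage_training_sets(samples, priority)

# Placeholder decision functions standing in for trained two-class SVMs.
stages = [("imbalance", lambda x: x - 5.0), ("misalignment", lambda x: x - 2.0)]
```

With N fault classes, only N - 1 two-class classifiers are needed, and the training sets shrink stage by stage. This is consistent with the abstract's claim of little repeated training.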
A support vector machine approach for classification of welding defects from ultrasonic signals
NASA Astrophysics Data System (ADS)
Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming
2014-07-01
Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients in different frequency channels is used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
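The wavelet-packet energy features can be sketched with an orthonormal Haar filter pair. The Haar choice, the decomposition depth of 3 and the function names are assumptions, because the abstract does not name the wavelet or the number of levels. Note that the terminal nodes come out in natural (Paley) order rather than strictly increasing frequency order.

```python
import math

def haar_split(signal):
    """One orthonormal Haar step: (approximation, detail) at half the length."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_packet_energies(signal, levels=3):
    """Energy of each of the 2**levels terminal nodes of a full Haar packet tree."""
    nodes = [list(signal)]
    for _ in range(levels):
        children = []
        for node in nodes:
            children.extend(haar_split(node))  # both approximation and detail are split again
        nodes = children
    return [sum(c * c for c in node) for node in nodes]

def energy_features(signal, levels=3):
    """Normalised energy feature vector (sums to 1); assumes a non-zero signal."""
    energies = wavelet_packet_energies(signal, levels)
    total = sum(energies)
    return [e / total for e in energies]

# Hypothetical echo segment whose length is divisible by 2**levels.
echo = [math.sin(0.7 * n) for n in range(16)]
features = energy_features(echo)
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the original signal. Normalising them therefore gives a scale-independent feature vector that a feature-selection step (the bees algorithm in the paper) can then prune.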
Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach
NASA Astrophysics Data System (ADS)
Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios
A software package, the Unresolved Galaxy Classifier (UGC), is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide automated taxonomic classification and the estimation of specific parameters by analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for training SVM models. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of synthetic spectra of galaxy models, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show very good classification performance of the UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.
Fiot, Jean-Baptiste; Cohen, Laurent D; Raniga, Parnesh; Fripp, Jurgen
2013-09-01
Support vector machines (SVM) are machine learning techniques that have been used for segmentation and classification of medical images, including segmentation of white matter hyper-intensities (WMH). Current approaches using SVM for WMH segmentation extract features from the brain and classify them, followed by complex post-processing steps to remove false positives. The method presented in this paper combines advanced pre-processing, tissue-based feature selection and SVM classification to obtain efficient and accurate WMH segmentation. Features from 125 patients generated from up to four MR modalities [T1-w, T2-w, proton-density and fluid attenuated inversion recovery (FLAIR)], different neighbourhood sizes, and the use of multi-scale features were compared. We found that although using all four modalities gave the best overall classification (average Dice scores of 0.54 ± 0.12, 0.72 ± 0.06 and 0.82 ± 0.06, respectively, for small, moderate and severe lesion loads), this was not significantly different (p = 0.50) from using just the T1-w and FLAIR sequences (Dice scores of 0.52 ± 0.13, 0.71 ± 0.08 and 0.81 ± 0.07). Furthermore, there was a negligible difference between using 5 × 5 × 5 and 3 × 3 × 3 features (p = 0.93). Finally, we show that careful consideration of features and pre-processing techniques not only saves storage space and computation time but also leads to more efficient classification, which outperforms classification based on all features with post-processing. Copyright © 2013 John Wiley & Sons, Ltd.
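The Dice scores quoted above measure the overlap between a predicted lesion mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). Below is a minimal sketch on flattened binary masks. Treating two empty masks as a perfect match (1.0) is a convention assumed here, and the function name is illustrative.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two flattened binary masks of equal length."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    return 1.0 if total == 0 else 2.0 * overlap / total

print(dice_score([1, 1, 0, 0], [0, 1, 1, 0]))  # 0.5
```

Dice penalises small misplaced regions heavily relative to their size. This helps explain why the reported scores are lowest for small lesion loads.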
Cerasa, Antonio; Castiglioni, Isabella; Salvatore, Christian; Funaro, Angela; Martino, Iolanda; Alfano, Stefania; Donzuso, Giulia; Perrotta, Paolo; Gioia, Maria Cecilia; Gilardi, Maria Carla; Quattrone, Aldo
2015-01-01
Presently, there are no valid biomarkers to identify individuals with eating disorders (ED). The aim of this work was to assess the feasibility of a machine learning method for extracting reliable neuroimaging features that allow individual categorization of patients with ED. The Support Vector Machine (SVM) technique, combined with a pattern recognition method, was applied to structural magnetic resonance images. Seventeen females with ED (six diagnosed with anorexia nervosa and 11 with bulimia nervosa) were compared against 17 body mass index-matched healthy controls (HC). Machine learning allowed individual diagnosis of ED versus HC with an Accuracy ≥ 0.80. Voxel-based pattern recognition analysis demonstrated that the voxels influencing classification Accuracy involved the occipital cortex, the posterior cerebellar lobule, the precuneus, the sensorimotor/premotor cortices, and the medial prefrontal cortex, all critical regions known to be strongly involved in the pathophysiological mechanisms of ED. Although these findings should be considered preliminary given the small sample size investigated, SVM analysis highlights the role of well-known brain regions as possible biomarkers to distinguish ED from HC at an individual level, thus encouraging the translational implementation of this new multivariate approach in clinical practice. PMID:26648660
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE
2017-01-01
Machine learning with maximization (support) of the separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advances in high-throughput technologies lead to the production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding their use in cancer genomics, leading to the discovery of new biomarkers and new drug targets, and to a better understanding of cancer driver genes. Herein we review the recent progress of SVMs in cancer genomic studies. We aim to assess the strengths of SVM learning and its future prospects in cancer genomic applications. PMID:29275361
NASA Astrophysics Data System (ADS)
Zhou, Xin; Jun, Sun; Zhang, Bing; Jun, Wu
2017-07-01
In order to improve the reliability of the spectral features extracted by wavelet transform, this paper proposes a method (WT-BCC-SVM) that combines the wavelet transform (WT) with a bacterial colony chemotaxis algorithm and support vector machine (BCC-SVM). In addition, we aimed to identify different kinds of pesticide residues on lettuce leaves in a novel, rapid and non-destructive way using fluorescence spectroscopy. Fluorescence spectral data of 150 lettuce leaf samples carrying five different kinds of pesticide residues on the leaf surface were obtained using a Cary Eclipse fluorescence spectrometer. Standard normal variate detrending (SNV detrending) and Savitzky-Golay (SG) filtering coupled with SNV detrending (SG-SNV detrending) were each used to preprocess the raw spectra. Bacterial colony chemotaxis combined with support vector machine (BCC-SVM) and support vector machine (SVM) classification models were established based on the full spectra (FS) and on wavelet transform characteristics (WTC), respectively. The WTC were selected by WT. The results showed that the accuracies on the training set, calibration set and prediction set of the best classification model (SG-SNV detrending-WT-BCC-SVM) were 100%, 98% and 93.33%, respectively. In addition, the results indicated that it is feasible to use WT-BCC-SVM to establish a diagnostic model for different kinds of pesticide residues on lettuce leaves.
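The SNV-detrending preprocessing can be sketched per spectrum. SNV is commonly defined as centring each spectrum on its own mean and dividing by its own standard deviation. Detrending then subtracts a low-order polynomial fitted against the wavelength channel. For brevity this sketch uses a first-order (straight-line) fit, a simplification that may differ from the authors' preprocessing. The Savitzky-Golay step is not shown, and the function names are illustrative.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre a spectrum and scale it to unit standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in spectrum]

def linear_detrend(values):
    """Subtract the least-squares straight line fitted against the channel index."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = statistics.fmean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in enumerate(values)]

def snv_detrend(spectrum):
    """SNV followed by detrending, applied to one spectrum at a time."""
    return linear_detrend(snv(spectrum))
```

Because both steps operate on each spectrum independently, they remove sample-specific scatter and baseline effects without using information from other samples. This matters when the same preprocessing is later applied to unseen prediction-set spectra.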
Dandapat, Samarendra
2017-01-01
Complex wavelet sub-band bi-spectrum (CWSB) features are proposed for the detection and classification of myocardial infarction (MI), heart muscle disease (HMD) and bundle branch block (BBB) from 12-lead ECG. The dual-tree complex wavelet (CW) transform of the 12-lead ECG produces CW coefficients at different sub-bands. Higher-order CW analysis is used to evaluate the CWSB. Three features are evaluated from the CWSB of the 12-lead ECG: the mean of its absolute value, and the numbers of negative and of positive phase angles in its phase. Extreme learning machine and support vector machine (SVM) classifiers are used to evaluate the performance of the CWSB features. Experimental results show that the proposed CWSB features of the 12-lead ECG and the SVM classifier are successful for the classification of various heart pathologies. Using the SVM classifier with a radial basis function kernel, the individual accuracies for the MI, HMD and BBB classes are 98.37%, 97.39% and 96.40%, respectively. A comparison has also been made with existing 12-lead ECG-based cardiac disease detection techniques. PMID:28894589
Efficient and Privacy-Preserving Online Medical Prediagnosis Framework Using Nonlinear SVM.
Zhu, Hui; Liu, Xiaoxia; Lu, Rongxing; Li, Hui
2017-05-01
With advances in machine learning algorithms and the pervasiveness of network terminals, online medical prediagnosis systems, which can provide diagnoses from healthcare providers anywhere and anytime, have recently attracted considerable interest. However, the growth of online medical prediagnosis systems still faces many challenges, including information security and privacy preservation. In this paper, we propose an efficient and privacy-preserving online medical prediagnosis framework, called eDiag, using a nonlinear kernel support vector machine (SVM). With eDiag, sensitive personal health information can be processed without privacy disclosure during the online prediagnosis service. Specifically, based on an improved expression for the nonlinear SVM, an efficient and privacy-preserving classification scheme is introduced with lightweight multiparty random masking and polynomial aggregation techniques. The encrypted user query is operated on directly at the service provider without decryption, and the diagnosis result can only be decrypted by the user. Through extensive analysis, we show that eDiag keeps both the users' health information and the healthcare provider's prediction model confidential, and that it incurs significantly less computation and communication overhead than existing schemes. In addition, performance evaluations of eDiag implemented on a smartphone and a computer demonstrate its effectiveness in a real online environment.
Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction
Cruz-Cano, Raul; Chew, David S.H.; Kwok-Pui, Choi; Ming-Ying, Leung
2010-01-01
Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach using least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses from three viral families in the order Caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, through recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications. PMID:20729987
Machine learning algorithms for mode-of-action classification in toxicity assessment.
Zhang, Yile; Wong, Yau Shu; Deng, Jian; Anton, Cristina; Gabos, Stephan; Zhang, Weiping; Huang, Dorothy Yu; Jin, Can
2016-01-01
Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combined with different test concentrations, the resulting profiles have potential for probing the mode of action (MOA) of the test substances. In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural networks (ANN) and support vector machines (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning from given TCRCs with known MOA information and then classifying the MOA of substances of unknown toxicity. A novel data processing step based on the wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, the time interval leading to a higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The proposed method is validated by applying the supervised learning algorithm to the exposure data of the HepG2 cell line to 63 chemicals with 11 concentrations in each test case. Classification success rates in the range of 85 to 95% are obtained using SVM for MOA classification, for cases ranging from two clusters up to four clusters. The wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme incorporating the wavelet transform has great potential for large-scale MOA classification and high-throughput chemical screening.
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed. Among them, the support vector machine (SVM) might be the best classifier. However, SVM is inherently a two-class classifier, and when it is used for the many classes in OPCCR, its computation is time-consuming. We therefore propose a neighbor classes based SVM (NC-SVM) to reduce the computational cost of SVM. Experiments applying NC-SVM classification to OPCCR have been conducted, and the results show that the proposed NC-SVM can effectively reduce the computation time in OPCCR.
An Automated and Intelligent Medical Decision Support System for Brain MRI Scans Classification.
Siddiqui, Muhammad Faisal; Reza, Ahmed Wasif; Kanesan, Jeevan
2015-01-01
Wide interest has been observed in medical health care applications that interpret neuroimaging scans with machine learning systems. This research proposes an intelligent, automatic, accurate, and robust technique to classify human brain magnetic resonance images (MRI) as normal or abnormal, in order to reduce human error in identifying diseases in brain MRIs. In this study, the fast discrete wavelet transform (DWT), principal component analysis (PCA), and the least squares support vector machine (LS-SVM) are used as basic components. First, fast DWT is employed to extract the salient features of brain MRI, followed by PCA, which reduces the dimensions of the features. These reduced feature vectors also shrink memory storage consumption by 99.5%. Finally, an advanced classification technique based on LS-SVM is applied to brain MR image classification using the reduced features. To improve efficiency, LS-SVM is used with a non-linear radial basis function (RBF) kernel. The proposed algorithm intelligently determines the optimized values of the RBF kernel hyper-parameters and also applies stratified k-fold cross validation to enhance the generalization of the system. The method was tested on benchmark datasets of T1-weighted and T2-weighted scans from 340 patients. The analysis of experimental results and performance comparisons shows that the proposed medical decision support system outperformed all other modern classifiers and achieved a 100% accuracy rate (specificity/sensitivity 100%/100%). Furthermore, in terms of computation time, the proposed technique is significantly faster than recent well-known methods, improving efficiency by 71%, 3%, and 4% in the feature extraction, feature reduction, and classification stages, respectively.
These results indicate that the proposed well-trained machine learning system has the potential to make accurate predictions about brain abnormalities in individual subjects, and therefore it can be used as a significant tool in clinical practice.
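Stratified k-fold cross validation keeps the normal/abnormal proportions roughly the same in every fold, so that no test fold is dominated by one class. Below is a minimal sketch of the fold assignment. It is not the authors' procedure: the round-robin assignment, the function names and the toy labels are assumptions, and no shuffling is done. In practice the indices within each class would normally be shuffled first.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Return k lists of sample indices whose class proportions mirror the full set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for j, idx in enumerate(by_class[label]):
            folds[(offset + j) % k].append(idx)
        offset += len(by_class[label])
    return folds

def train_test_splits(labels, k=10):
    """Yield (train_indices, test_indices) for each stratified fold."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical labels: 6 normal scans and 4 abnormal scans, split into 2 folds.
labels = ["normal"] * 6 + ["abnormal"] * 4
folds = stratified_folds(labels, k=2)
```

Each split would then be used to train the RBF-kernel LS-SVM with a candidate pair of hyper-parameters and score it on the held-out fold. The hyper-parameter search itself is not shown here.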
HYBRID NEURAL NETWORK AND SUPPORT VECTOR MACHINE METHOD FOR OPTIMIZATION
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor)
2005-01-01
System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.
Hybrid Neural Network and Support Vector Machine Method for Optimization
NASA Technical Reports Server (NTRS)
Rai, Man Mohan (Inventor)
2007-01-01
System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.
Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.
2010-01-01
Purpose: To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods: Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation = −0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation = −5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set, and areas under the receiver operating characteristic curve (AUROCs) were generated for the classification of eyes in the test set. AUROCs from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. The AUROC for the GDx VCC software-generated nerve fiber indicator (NFI) was also reported. Results: The AUROCs for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced-dimensional data sets). AUROCs for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC for the NFI was 0.87. Conclusions: Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes.
RVM may be preferable to SVM because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898
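The AUROC values above can be computed directly from classifier scores using the equivalence between the area under the ROC curve and the Mann-Whitney statistic. The AUROC is the probability that a randomly chosen glaucomatous eye scores higher than a randomly chosen healthy eye, with ties counting one half. The pairwise O(n·m) sketch below is illustrative only and is not the software used in the study. The scores in the usage lines are hypothetical.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier outputs for glaucomatous (positive) and healthy (negative) eyes.
print(auroc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

Because the AUROC depends only on the ranking of the scores, it can compare an SVM decision value, an RVM posterior probability and a single SLP parameter on the same footing.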
NASA Astrophysics Data System (ADS)
Wei, ZHANG; Tongyu, WU; Bowen, ZHENG; Shiping, LI; Yipo, ZHANG; Zejie, YIN
2018-04-01
A new neutron-gamma discriminator based on the support vector machine (SVM) method is proposed to improve the performance of a time-of-flight neutron spectrometer. The neutron detector is an EJ-299-33 plastic scintillator with pulse-shape discrimination (PSD) capability. The SVM algorithm is implemented in a field programmable gate array (FPGA) to carry out real-time sifting of neutrons in neutron-gamma mixed radiation fields. This study compares the performance of the pulse gradient analysis method and the SVM method. The results show that the SVM discriminator provides a better discrimination accuracy of 99.1%. The accuracy and performance of the FPGA-based SVM discriminator have been evaluated in experiments, and it achieves a figure of merit of 1.30.
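In pulse-shape discrimination, the figure of merit (FOM) is conventionally defined as the separation between the gamma and neutron peak centroids divided by the sum of their full widths at half maximum. The abstract does not state how the FOM was computed for the SVM output, so the sketch below assumes the conventional definition applied to two Gaussian-shaped peaks. The function name and the peak parameters in the usage lines are hypothetical.

```python
import math

# For a Gaussian peak, FWHM = 2 * sqrt(2 ln 2) * sigma (about 2.355 sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def psd_figure_of_merit(mu_gamma, sigma_gamma, mu_neutron, sigma_neutron):
    """FOM = peak separation / (FWHM_gamma + FWHM_neutron) for Gaussian-shaped peaks."""
    separation = abs(mu_neutron - mu_gamma)
    return separation / (FWHM_PER_SIGMA * (sigma_gamma + sigma_neutron))

# Hypothetical peaks one unit apart, each with an FWHM of 0.5, give FOM = 1.0.
sigma = 0.5 / FWHM_PER_SIGMA
fom = psd_figure_of_merit(0.0, sigma, 1.0, sigma)
```

A larger FOM means better-separated peaks. Under this Gaussian convention, a value of about 1.3 corresponds to peaks whose centroids are 1.3 times the sum of their FWHMs apart.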
Detection of Splice Sites Using Support Vector Machine
NASA Astrophysics Data System (ADS)
Varadwaj, Pritish; Purohit, Neetesh; Arora, Bhumika
Automatic identification and annotation of the exon and intron regions of genes from DNA sequences has been an important research area in computational biology. Several approaches, viz. Hidden Markov Models (HMM), Artificial Intelligence (AI) based machine learning, and Digital Signal Processing (DSP) techniques, have been used extensively and independently by various researchers to address this challenging task. In this work, we propose a Support Vector Machine based kernel learning approach for the detection of splice sites (the exon-intron boundaries) in a gene. Electron-Ion Interaction Potential (EIIP) values of nucleotides have been used to map character sequences to corresponding numeric sequences. An SVM with a Radial Basis Function (RBF) kernel is trained using the EIIP numeric sequences. The trained model was then tested on a test gene dataset, detecting splice sites by shifting a window of 12 residues. The window size and various important SVM kernel parameters have been optimized for better accuracy. Receiver Operating Characteristic (ROC) curves have been used to display the sensitivity of the classifier, and the results showed 94.82% accuracy for splice site detection on the test dataset.
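The EIIP encoding and the sliding window can be sketched as follows. The EIIP values below are those commonly quoted in the genomic signal processing literature (in Rydbergs). The one-residue shift between windows, the function names and the toy sequence are assumptions. Each window vector would then be passed to the RBF-kernel SVM, which is not shown.

```python
# EIIP values (in Rydbergs) commonly quoted for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_encode(sequence):
    """Map a DNA string to its numeric EIIP sequence."""
    return [EIIP[base] for base in sequence.upper()]

def sliding_windows(sequence, width=12):
    """All contiguous EIIP windows of `width` residues, shifted one residue at a time."""
    values = eiip_encode(sequence)
    return [values[i:i + width] for i in range(len(values) - width + 1)]

# Hypothetical 20-residue sequence: yields 20 - 12 + 1 = 9 windows of 12 values each.
windows = sliding_windows("ACGT" * 5)
```

Mapping bases to physically motivated numbers (rather than arbitrary one-hot codes) gives each window a fixed-length numeric vector that a standard kernel can compare directly.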
Probabilistic Open Set Recognition
NASA Astrophysics Data System (ADS)
Jain, Lalit Prithviraj
Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds that existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize that the cause is weak ad hoc assumptions combined with the closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. To this end, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms.
Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary support vector machines. Building on the success of statistical EVT based recognition methods such as PI-SVM and W-SVM on the open set problem, we present a new general supervised learning algorithm for multi-class classification and multi-class open set recognition called the Extreme Value Local Basis (EVLB). The design of this algorithm is motivated by the observation that extrema from known negative class distributions are the closest negative points to any positive sample during training, and thus should be used to define the parameters of a probabilistic decision model. In the EVLB, the kernel distribution for each positive training sample is estimated via an EVT distribution fit over the distances to the separating hyperplane between the positive training sample and the closest negative samples, with a subset of the overall positive training data retained to form a probabilistic decision boundary. Using this subset as a frame of reference, the probability of a sample at test time decreases as it moves away from the positive class. Because of this property, the EVLB is well-suited to open set recognition problems where samples from unknown or novel classes are encountered at test time. Our experimental evaluation shows that the EVLB provides a substantial improvement in scalability compared to standard radial basis function kernel machines, as well as PI-SVM and W-SVM, with improved accuracy in many cases. We evaluate our algorithm on open set variations of the standard visual learning benchmarks, as well as with an open subset of classes from Caltech 256 and ImageNet. Our experiments show that PI-SVM, W-SVM and EVLB provide significant advances over the previous state-of-the-art solutions for the same tasks.
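The CAP property (a class-membership probability that abates with distance from known data) can be illustrated with a Weibull survival function of the distance to retained positive samples. This is only a sketch. Here the Weibull `scale` and `shape` are fixed hypothetical values, whereas in the W-SVM and EVLB they would be estimated by EVT fitting. The rejection `threshold`, the function names `abating_probability` and `open_set_predict`, and the toy points are also illustrative.

```python
import math

def abating_probability(distance, scale, shape):
    """Weibull survival function: 1 at distance 0, decaying toward 0 with distance."""
    return math.exp(-((distance / scale) ** shape))

def open_set_predict(x, class_samples, scale=1.0, shape=2.0, threshold=0.5):
    """Return (label, probability), or ("unknown", probability) if no class is probable enough.

    class_samples maps each known label to a list of retained positive training points;
    a class's probability is the largest abating probability over its retained points.
    """
    best_label, best_p = None, 0.0
    for label, points in class_samples.items():
        p = max(abating_probability(math.dist(x, pt), scale, shape) for pt in points)
        if p > best_p:
            best_label, best_p = label, p
    if best_p < threshold:
        return "unknown", best_p
    return best_label, best_p

# Hypothetical retained points for two known classes.
known = {"a": [(0.0, 0.0), (0.5, 0.0)], "b": [(5.0, 5.0)]}
```

Unlike a closed-set classifier, which always assigns some known label, this decision rule lets a query far from all retained samples fall below the threshold and be rejected as unknown.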
Lin, Yi; Cai, Fu-Ying; Zhang, Guang-Ya
2007-01-01
A quantitative structure-property relationship (QSPR) model relating amino acid composition to the activity of Bacillus thuringiensis insecticidal crystal proteins was established. Support vector machine (SVM) is a novel general machine-learning tool based on the structural risk minimization principle that exhibits good generalization when fault samples are few; it is especially suitable for classification, forecasting, and estimation in cases where small numbers of samples are involved, such as fault diagnosis. However, some SVM parameters are selected based on the experience of the operator, which has reduced the efficiency of SVM in practical applications. The uniform design (UD) method was applied to optimize the running parameters of SVM. The average accuracy rate approached 73% when the penalty factor was 0.01, the epsilon 0.2, the gamma 0.05, and the range 0.5. The results indicated that UD might be used as an effective method to optimize the parameters of SVM, and that SVM could be used as an alternative powerful modeling tool for QSPR studies of the activity of Bacillus thuringiensis (Bt) insecticidal crystal proteins. The authors therefore propose a novel method for predicting the insecticidal activity of Bt insecticidal crystal proteins.
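Uniform design spreads a small number of runs evenly over the parameter space instead of testing a full grid. One standard way to build such a table is the good lattice point (glp) method, sketched below; whether Lin et al. used a glp table or a published optimized UD table is not stated, so this is an assumption, and the helper names are hypothetical.

```python
from math import gcd


def glp_design(n, generators):
    """Good-lattice-point uniform design: run i, factor j gets level (i * h_j) mod n, with 0 -> n."""
    for h in generators:
        if gcd(h, n) != 1:
            raise ValueError(f"generator {h} is not coprime with n = {n}")
    return [[(i * h) % n or n for h in generators] for i in range(1, n + 1)]


def levels_to_values(design, ranges):
    """Map levels 1..n linearly onto each factor's (low, high) range."""
    n = len(design)
    return [[low + (level - 1) * (high - low) / (n - 1)
             for level, (low, high) in zip(row, ranges)]
            for row in design]
```

With four coprime generators, for example `glp_design(7, [1, 2, 3, 5])`, each column is a permutation of the levels 1..7, so every factor (penalty, epsilon, gamma, range) is tried at seven distinct values in only seven SVM trainings. The generator vector here is arbitrary; in practice it is chosen to minimize a discrepancy criterion, and the value ranges passed to `levels_to_values` would be picked by the user.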
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the selected feature subset are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but they have been shown to affect each other. We designed and implemented a genetic algorithm (GA) to optimize the kernel parameters and feature subset selection for SVM regression simultaneously, and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity may enhance BBB penetration, while all the others are negatively correlated with it. PMID:26504797
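The joint optimization can be sketched as a GA whose chromosome carries both a feature mask and the SVM hyperparameters, so that one fitness evaluation scores a (subset, C, gamma) combination together. This is a minimal sketch under stated assumptions: the log2 search ranges, one-point and arithmetic crossover, tournament selection and elitism are illustrative choices, not the paper's operators, and `fitness` is a hypothetical callback that would, in practice, return a cross-validated score of an SVM regression trained on the selected descriptors.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Jointly evolve a feature mask and (log2 C, log2 gamma); fitness(mask, C, gamma), higher is better."""
    rng = random.Random(seed)
    c_range, g_range = (-5.0, 15.0), (-15.0, 3.0)  # assumed log2 search ranges

    def random_individual():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        return mask, rng.uniform(*c_range), rng.uniform(*g_range)

    def evaluate(ind):
        mask, log_c, log_g = ind
        return fitness(mask, 2.0 ** log_c, 2.0 ** log_g)

    def crossover(a, b):
        cut = rng.randrange(1, n_features)  # one-point crossover on the mask (needs n_features >= 2)
        w = rng.random()                    # arithmetic crossover on the real-valued genes
        return (a[0][:cut] + b[0][cut:],
                w * a[1] + (1 - w) * b[1],
                w * a[2] + (1 - w) * b[2])

    def mutate(ind):
        mask = [(not bit) if rng.random() < p_mut else bit for bit in ind[0]]
        log_c, log_g = ind[1], ind[2]
        if rng.random() < p_mut:
            log_c = min(c_range[1], max(c_range[0], log_c + rng.gauss(0.0, 1.0)))
        if rng.random() < p_mut:
            log_g = min(g_range[1], max(g_range[0], log_g + rng.gauss(0.0, 1.0)))
        return mask, log_c, log_g

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        scored = sorted(((evaluate(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])

        def tournament():
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        children = [scored[0][1]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            children.append(mutate(crossover(tournament(), tournament())))
        population = children
    best = max(population, key=evaluate)
    return best, history
```

Because the elite individual is copied forward, the best fitness never decreases from one generation to the next, which makes the search history easy to monitor.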
Computer-aided assessment of pulmonary disease in novel swine-origin H1N1 influenza on CT
NASA Astrophysics Data System (ADS)
Yao, Jianhua; Dwyer, Andrew J.; Summers, Ronald M.; Mollura, Daniel J.
2011-03-01
The 2009 pandemic was a global outbreak of novel H1N1 influenza. Radiologic images can be used to assess the presence and severity of pulmonary infection. We developed a computer-aided assessment system to analyze CT images from Swine-Origin Influenza A virus (S-OIV) novel H1N1 cases. The technique is based on the analysis of lung texture patterns and classification using a support vector machine (SVM); pixel-wise tissue classification is computed from the SVM output value. The method was validated on four H1N1 cases and ten normal cases. We demonstrated that the technique can detect regions of pulmonary abnormality in novel H1N1 patients and differentiate these regions from visually normal lung (area under the ROC curve of 0.993). This technique can also be applied to differentiate regions infected by different pulmonary diseases.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study; however, it is still not trivial to accurately distinguish lncRNA transcripts (LNCTs) from protein-coding transcripts (PCTs). As previous studies have accumulated a variety of information and data about lncRNAs, it is appealing to develop novel methods that identify lncRNAs more accurately. Our method, lncRScan-SVM, aims at classifying PCTs and LNCTs using a support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed from the GENCODE gene annotations of human and mouse, respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches as evaluated by several criteria, including sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting lncRNAs and is useful for current lncRNA research.
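The threshold-based criteria listed above all derive from the binary confusion matrix. A small sketch, assuming LNCTs are treated as the positive class (the abstract does not say which class is positive):

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient from counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when a margin is empty
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}
```

MCC is often preferred over accuracy when one class dominates, because it stays near 0 for a classifier that simply predicts the majority class.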
NASA Astrophysics Data System (ADS)
Su, Lihong
In remote sensing communities, support vector machine (SVM) learning has recently received increasing attention. SVM learning usually requires large amounts of memory and computation time on large training sets. In SVM algorithms, the classification decision function is fully determined by the support vectors, which form a subset of the training set. One way to optimize SVM learning is therefore to reduce the training set efficiently. In this paper, a data reduction method based on agglomerative hierarchical clustering is proposed to obtain smaller training sets for SVM learning. Using a multiple-angle remote sensing dataset of a semi-arid region, the effectiveness of the proposed method is evaluated by classification experiments with a series of reduced training sets. The experiments show that SVM accuracy is not lost when the original training set is reduced to 34% of its size using the proposed approach. Maximum likelihood classification (MLC) is also applied to the reduced training sets, and the results show that MLC also maintains its classification accuracy. This implies that the approach retains the most informative data instances.
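A minimal sketch of clustering-based reduction, assuming centroid linkage, per-class clustering, and replacement of each final cluster by its centroid. The paper's linkage criterion and its rule for choosing which instances represent a cluster are not given, so all three choices are assumptions; the function names are hypothetical.

```python
import math


def reduce_class(points, n_keep):
    """Centroid-linkage agglomerative clustering down to n_keep clusters; returns centroids."""
    clusters = [(list(p), 1) for p in points]  # (coordinate sums, member count)
    while len(clusters) > n_keep:
        best = None
        for i in range(len(clusters)):
            ci = [s / clusters[i][1] for s in clusters[i][0]]
            for j in range(i + 1, len(clusters)):
                cj = [s / clusters[j][1] for s in clusters[j][0]]
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ([a + b for a, b in zip(clusters[i][0], clusters[j][0])],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [[s / n for s in sums] for sums, n in clusters]


def reduce_training_set(samples, labels, fraction):
    """Cluster each class separately so that every class keeps about `fraction` of its samples."""
    reduced_x, reduced_y = [], []
    for label in sorted(set(labels)):
        pts = [x for x, y in zip(samples, labels) if y == label]
        keep = max(1, round(fraction * len(pts)))
        for centroid in reduce_class(pts, keep):
            reduced_x.append(centroid)
            reduced_y.append(label)
    return reduced_x, reduced_y
```

This naive version is cubic in the number of samples and only suits small sets; clustering per class keeps the reduced set from mixing labels, so both SVM and MLC can be trained on it directly.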
Ranking Support Vector Machine with Kernel Approximation
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. The ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been used with success. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) on complex nonlinear ranking problems. However, learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation that avoids computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms. PMID:28293256
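Random Fourier features replace the kernel matrix with an explicit feature map z such that z(x)·z(y) approximates the kernel. For the Gaussian kernel k(x, y) = exp(-gamma·||x - y||²), frequencies are drawn from N(0, 2·gamma·I) and phases uniformly from [0, 2π). The sketch below shows that map together with the pairwise squared hinge loss evaluated on the mapped features; the truncated Newton optimizer and the Nyström variant are not shown, and the function names are hypothetical.

```python
import math
import random


def rff_map(dim, n_features, gamma, seed=0):
    """Random Fourier feature map approximating k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # the kernel's spectral density is N(0, 2 * gamma * I)
    ws = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]
    return z


def pairwise_squared_hinge(w, feats, relevance):
    """Sum of max(0, 1 - w.(z_i - z_j))^2 over pairs where item i should rank above item j."""
    loss = 0.0
    for i, fi in enumerate(feats):
        for j, fj in enumerate(feats):
            if relevance[i] > relevance[j]:
                margin = sum(wk * (a - b) for wk, a, b in zip(w, fi, fj))
                loss += max(0.0, 1.0 - margin) ** 2
    return loss
```

A linear RankSVM on the mapped features would then minimize 0.5·||w||² + C·`pairwise_squared_hinge(w, ...)`; the cost grows with the number of random features D rather than with the square of the number of training items.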
Fuzzy support vector machine for microarray imbalanced data classification
NASA Astrophysics Data System (ADS)
Ladayya, Faroh; Purnami, Santi Wulan; Irhamah
2017-11-01
DNA microarray data contain gene expression measurements with small sample sizes and a high number of features. Furthermore, imbalanced classes are a common problem in microarray data. This occurs when a dataset is dominated by a class that has significantly more instances than the other, minority classes. Therefore, a classification method is needed that handles both high-dimensional and imbalanced data. Support Vector Machine (SVM) is a classification method capable of handling large or small samples, nonlinearity, high dimensionality, over-learning and local minimum issues. SVM has been widely applied to DNA microarray data classification, and it has been shown to provide the best performance among machine learning methods. However, imbalanced data are a problem because SVM treats all samples as equally important, so the results are biased against the minority class. To overcome the imbalance, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM so that different input points make different contributions to the classifier. The minority classes receive large fuzzy memberships, so FSVM pays more attention to the samples with larger fuzzy membership. Because DNA microarray data are high-dimensional with a very large number of features, feature selection is first performed using the Fast Correlation-Based Filter (FCBF). In this study, the data are analyzed with SVM and FSVM, each with and without FCBF feature selection, and their classification performance is compared. Based on the overall results, FSVM on the selected features has the best classification performance compared to SVM.
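The FSVM idea can be made concrete with one common membership heuristic: a point's membership shrinks with its distance from its own class centroid, and majority-class memberships are additionally scaled by the imbalance ratio. This particular function is an assumption (the abstract does not give the membership formula), and the names are hypothetical. The memberships s_i then weight each sample's hinge loss in the objective 0.5·||w||² + C·Σ s_i·max(0, 1 - y_i(w·x_i + b)).

```python
import math


def fuzzy_memberships(samples, labels, minority_label, delta=1e-6):
    """Membership = class weight * (1 - d_i / (max_d + delta)), d_i = distance to own class centroid."""
    counts = {lab: labels.count(lab) for lab in set(labels)}
    n_min = counts[minority_label]
    centroids = {}
    for lab in counts:
        pts = [x for x, y in zip(samples, labels) if y == lab]
        centroids[lab] = [sum(c) / len(pts) for c in zip(*pts)]
    dists = [math.dist(x, centroids[y]) for x, y in zip(samples, labels)]
    max_d = {lab: max(d for d, y in zip(dists, labels) if y == lab) for lab in counts}
    memberships = []
    for d, y in zip(dists, labels):
        weight = 1.0 if y == minority_label else n_min / counts[y]  # down-weight majority classes
        memberships.append(weight * (1.0 - d / (max_d[y] + delta)))
    return memberships


def fsvm_objective(w, b, samples, labels, memberships, c=1.0):
    """0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i (w.x_i + b)), with labels in {-1, +1}."""
    reg = 0.5 * sum(wk * wk for wk in w)
    hinge = sum(s * max(0.0, 1.0 - y * (sum(wk * xk for wk, xk in zip(w, x)) + b))
                for x, y, s in zip(samples, labels, memberships))
    return reg + c * hinge
```

Minority samples near their class centre get memberships close to 1, so misclassifying them costs more than misclassifying an outlying majority sample, which counteracts the bias described above.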
Noninvasive extraction of fetal electrocardiogram based on Support Vector Machine
NASA Astrophysics Data System (ADS)
Fu, Yumei; Xiang, Shihan; Chen, Tianyi; Zhou, Ping; Huang, Weiyan
2015-10-01
The fetal electrocardiogram (FECG) signal has important clinical value, helping doctors diagnose fetal heart diseases and choose suitable therapeutic schemes. The noninvasive extraction of FECG from electrocardiogram (ECG) signals has therefore become an active research topic. In this work, a Support Vector Machine (SVM) is used to extract the FECG from a limited amount of data. First, the theory of the SVM and the principle of SVM-based extraction are studied. Second, the transformation of the maternal electrocardiogram (MECG) component in the abdominal composite signal is verified to be nonlinear and is fitted with the SVM. Then, the SVM is trained, and the training results are compared with the real data to verify the training. Meanwhile, the SVM parameters are optimized for best performance so that the learning machine can fit unknown samples. Finally, the FECG is extracted by removing the optimal estimate of the MECG component from the abdominal composite signal. The Signal-to-Noise Ratio (SNR) and visual inspection are used to evaluate the performance of SVM-based FECG extraction. The experimental results show that good-quality FECG can be extracted: the SNR increases significantly, reaching as high as 9.2349 dB, and the processing time decreases significantly, to as little as 0.802 seconds. Compared with the traditional method, the noninvasive SVM-based extraction method is simpler to implement, requires less processing time and gives better extraction quality under the same conditions.
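The final extraction step and the evaluation metric are simple enough to state in code. This sketch assumes the MECG estimate has already been produced by the trained SVM, and that SNR is computed in dB against a known clean FECG reference, as is possible with simulated or annotated data; the paper's exact SNR definition is not given, and the function names are hypothetical.

```python
import math


def estimate_fecg(abdominal, mecg_estimate):
    """FECG estimate = abdominal composite signal minus the estimated maternal component."""
    return [a - m for a, m in zip(abdominal, mecg_estimate)]


def snr_db(reference, estimate):
    """10 * log10(signal energy / residual energy) between a clean reference and an estimate."""
    signal = sum(r * r for r in reference)
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / noise)
```

For instance, an estimate whose residual energy is 1% of the reference energy scores 20 dB.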
Chen, Po-Hao; Zafar, Hanna; Galperin-Aizenberg, Maya; Cook, Tessa
2018-04-01
A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to extract insights successfully from radiology reports. However, the codependent effects of NLP and ML in this context have not been well studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label from the unstructured report text and compared the performance of each combination. The three NLP algorithms were term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms were logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and a fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing the N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with an average accuracy of 90.6 and an F score of 0.813. The interplay between ML and NLP algorithms and its effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
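The two text representations that fed the classifiers can be sketched with the standard library: unigram-plus-bigram TF-IDF, and feature hashing into 2^16 buckets. The tokenizer, the plain tf = count/total and idf = ln(N/df) weighting (many libraries use smoothed variants), and the choice of CRC32 as the hash are assumptions; the study's exact settings are not described.

```python
import math
import re
import zlib
from collections import Counter


def ngrams(text, n_max=2):
    """Lower-cased word unigrams and bigrams (n_max=2)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs):
    """One {term: weight} dict per document; tf = count / total terms, idf = ln(N / df)."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()})
    return vectors


def hashed_counts(text, n_bits=16):
    """Feature hashing into 2**n_bits buckets; crc32 keeps bucket ids stable across runs."""
    vec = Counter()
    for term in ngrams(text):
        vec[zlib.crc32(term.encode("utf-8")) % (1 << n_bits)] += 1
    return vec
```

TF-IDF needs a vocabulary built over the whole corpus, whereas hashing maps each n-gram independently into a fixed-size space at the cost of occasional collisions; Python's built-in `hash` is avoided because it is randomized per process for strings.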
Machine learning search for variable stars
NASA Astrophysics Data System (ADS)
Pashchenko, Ilya N.; Sokolovsky, Kirill V.; Gavras, Panagiotis
2018-04-01
Photometric variability detection is often treated as a hypothesis testing problem: an object is variable if the null hypothesis that its brightness is constant can be ruled out given the measurements and their uncertainties. The practical applicability of this approach is limited by uncorrected systematic errors. We propose a new variability detection technique that is sensitive to a wide range of variability types while being robust to outliers and underestimated measurement uncertainties. We consider variability detection as a classification problem that can be approached with machine learning. Logistic Regression (LR), Support Vector Machines (SVM), k Nearest Neighbours (kNN), Neural Nets (NN), Random Forests (RF), and Stochastic Gradient Boosting classifier (SGB) are applied to 18 features (variability indices) quantifying scatter and/or correlation between points in a light curve. We use a subset of Optical Gravitational Lensing Experiment phase two (OGLE-II) Large Magellanic Cloud (LMC) photometry (30 265 light curves) that was searched for variability using traditional methods (168 known variable objects) as the training set, and then apply the NN to a new test set of 31 798 OGLE-II LMC light curves. Among 205 candidates selected in the test set, 178 are real variables, while 13 low-amplitude variables are new discoveries. The machine learning classifiers considered are found to be more efficient (they select more variables and fewer false candidates) than traditional techniques using individual variability indices or their linear combination. The NN, SGB, SVM, and RF classifiers show higher efficiency than LR and kNN.
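To make "variability indices" concrete, here are two classic ones: the reduced chi-square about the weighted mean (a scatter index) and the von Neumann ratio eta (a point-to-point correlation index, small for smooth variability). They are chosen here for illustration only; the abstract does not list the 18 indices, so their membership in that set is an assumption.

```python
def reduced_chi2(mags, errs):
    """Reduced chi-square of magnitudes about their inverse-variance weighted mean."""
    weights = [1.0 / e ** 2 for e in errs]
    mean_w = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    return sum(((m - mean_w) / e) ** 2 for m, e in zip(mags, errs)) / (len(mags) - 1)


def von_neumann_eta(mags):
    """Mean squared successive difference divided by the variance (time-ordered magnitudes)."""
    n = len(mags)
    mean = sum(mags) / n
    delta2 = sum((mags[i + 1] - mags[i]) ** 2 for i in range(n - 1)) / (n - 1)
    var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    return delta2 / var
```

A steadily varying light curve such as [0, 1, 2, 3] gives eta = 0.6, while uncorrelated noise gives values near 2; a classifier can combine many such indices rather than thresholding any one of them.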
Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine
NASA Astrophysics Data System (ADS)
Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung
A clinical decision support system (CDSS) provides knowledge and specific information that help clinicians enhance diagnostic efficiency and improve healthcare quality. An appropriate CDSS can greatly improve patient safety and healthcare quality and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, determining a suitable combination of SVM parameters is critical to classification performance. A genetic algorithm (GA) can find an optimal solution within an acceptable time and is faster than a greedy algorithm with an exhaustive search strategy. Taking advantage of the GA's ability to quickly select salient features and adjust SVM parameters, a method using integrated GA and SVM (IGS), which differs from the traditional approach of using GA for feature selection and SVM for classification, was used to design CDSSs for predicting successful ventilation weaning, diagnosing patients with severe obstructive sleep apnea, and discriminating different cell types from Pap smears. The results show that IGS performs better than methods using SVM alone or a linear discriminator.
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.
Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne
2018-01-01
Support vector machine (SVM) learning, a machine learning method that maximizes the separating margin determined by the support vectors, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advances in high-throughput technologies produce large amounts of genomic and epigenomic data, the use of SVM classification in cancer genomics is expanding, leading to the discovery of new biomarkers and new drug targets and to a better understanding of cancer driver genes. Herein we review the recent progress of SVMs in cancer genomic studies. We aim to assess the strengths of SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Power line identification of millimeter wave radar based on PCA-GS-SVM
NASA Astrophysics Data System (ADS)
Fang, Fang; Zhang, Guifeng; Cheng, Yansheng
2017-12-01
Existing detection methods cannot effectively address the safety risk that power lines pose to UAVs in ultra-low-altitude flight. To address this problem, a power line recognition method based on grid search (GS), principal component analysis (PCA) and a support vector machine (SVM) is proposed. First, the candidate lines obtained by the Hough transform are reduced in dimension by PCA, and their main features are extracted. Then, the SVM parameters are optimized by grid search (GS). Finally, an SVM classifier with the optimized parameters classifies the candidate lines. MATLAB simulation results show that this method can effectively distinguish power lines from noise, with high recognition accuracy and algorithm efficiency.
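Grid search over SVM hyperparameters is usually done on exponentially spaced values of C and gamma. The sketch below uses the grid recommended in the LIBSVM practical guide (C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3); the grid actually used in the paper is not stated. `score` is a hypothetical callback that would return, for example, the mean cross-validation accuracy of an RBF-SVM trained on the PCA features.

```python
import itertools


def grid_search(score, c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Return (best_score, C, gamma) over exponential grids; score(C, gamma), higher is better."""
    best = None
    for ce, ge in itertools.product(c_exps, g_exps):
        c, g = 2.0 ** ce, 2.0 ** ge
        s = score(c, g)
        if best is None or s > best[0]:  # ties keep the first combination found
            best = (s, c, g)
    return best
```

The default grid requires 11 x 10 = 110 evaluations; a common refinement is a second, finer grid around the best coarse point.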
NASA Astrophysics Data System (ADS)
Chen, Wei; Pourghasemi, Hamid Reza; Panahi, Mahdi; Kornejady, Aiding; Wang, Jiale; Xie, Xiaoshen; Cao, Shubo
2017-11-01
The spatial prediction of landslide susceptibility is an important prerequisite for the analysis of landslide hazards and risks in any area. This research uses three data mining techniques, namely an adaptive neuro-fuzzy inference system combined with frequency ratio (ANFIS-FR), a generalized additive model (GAM), and a support vector machine (SVM), for landslide susceptibility mapping in Hanyuan County, China. In the first step, based on a review of previous literature, twelve conditioning factors were selected: slope aspect, altitude, slope angle, topographic wetness index (TWI), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, land use, normalized difference vegetation index (NDVI), and lithology. In the second step, a collinearity test and a correlation analysis between the conditioning factors and landslides were applied. In the third step, the three methods, ANFIS-FR, GAM, and SVM, were used for landslide susceptibility modeling. The accuracy of the resulting models was then validated using the receiver operating characteristic (ROC) curve. The results showed that all three models have good prediction capabilities: the SVM model has the highest prediction rate (0.875), followed by the ANFIS-FR and GAM models with prediction rates of 0.851 and 0.846, respectively. Thus, the landslide susceptibility maps produced for the study area can be applied to the management of hazards and risks in landslide-prone Hanyuan County.
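The area under the ROC curve can be computed without drawing the curve: it equals the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, counting ties as one half (the Mann-Whitney statistic). A direct sketch, with hypothetical names:

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, positive scores [0.9, 0.8, 0.4] against negative scores [0.3, 0.5, 0.2] give 8/9, since only the pair (0.4, 0.5) is misordered. The pairwise loop is quadratic; for large rasters a rank-based formula is preferable.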
Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng
2013-01-01
In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed. Among them, the support vector machine (SVM) may be the best classifier. However, SVM is inherently a two-class classifier, and when it is used for the many classes in OPCCR, its computation is time-consuming. We therefore propose a neighbor classes based SVM (NC-SVM) to reduce the computational cost of SVM. Experiments with NC-SVM classification for OPCCR have been conducted, and the results show that the proposed NC-SVM can effectively reduce computation time in OPCCR. PMID:23536777
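Why restricting SVMs to neighbouring classes saves computation can be shown by counting binary classifiers. The standard one-vs-one scheme needs K(K-1)/2 of them; if each class is only paired with its m nearest classes, at most K·m pairs remain. The neighbour rule below (nearest class centroids) is a hypothetical stand-in, since the abstract does not define how NC-SVM chooses neighbour classes.

```python
import math


def one_vs_one_pairs(n_classes):
    """Number of binary SVMs needed by the standard one-vs-one scheme."""
    return n_classes * (n_classes - 1) // 2


def nearest_classes(centroids, m):
    """Hypothetical neighbour rule: the m classes whose centroids are closest to each class."""
    neighbors = {}
    for c, mu in centroids.items():
        others = sorted((math.dist(mu, nu), d) for d, nu in centroids.items() if d != c)
        neighbors[c] = [d for _, d in others[:m]]
    return neighbors


def neighbor_pairs(neighbors):
    """Unordered class pairs that would each need a binary SVM under the neighbour rule."""
    return {(min(c, d), max(c, d)) for c, ds in neighbors.items() for d in ds}
```

For an illustrative K = 3755 classes (the size of the GB2312 level-1 character set), one-vs-one needs 7,048,135 binary SVMs, whereas pairing each class with, say, its 10 nearest classes needs at most 37,550.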
Time-Frequency Learning Machines for Nonstationarity Detection Using Surrogates
NASA Astrophysics Data System (ADS)
Borgnat, Pierre; Flandrin, Patrick; Richard, Cédric; Ferrari, André; Amoud, Hassan; Honeine, Paul
2012-03-01
Time-frequency representations provide a powerful tool for nonstationary signal analysis and classification, supporting a wide range of applications [12]. As opposed to conventional Fourier analysis, these techniques reveal the evolution in time of the spectral content of signals. In Refs. [7,38], time-frequency analysis is used to test the stationarity of any signal. The proposed method consists of a comparison between global and local time-frequency features. Its originality is to use a family of stationary surrogate signals to define the null hypothesis of stationarity and, based on this information, to derive statistical tests. An open question remains, however, about how to choose relevant time-frequency features. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. These learning machines have gained popularity due to their conceptual simplicity and their outstanding performance [30]. Initiated by Vapnik’s support vector machines (SVM) [35], they now offer a wide class of supervised and unsupervised learning algorithms. In Refs. [17-19], the authors have shown how the most effective and innovative learning machines can be tuned to operate in the time-frequency domain. This chapter follows this line of research by taking advantage of learning machines to test and quantify stationarity. Based on one-class SVM, our approach uses the entire time-frequency representation and does not require arbitrary feature extraction. Applied to a set of surrogates, it provides the domain boundary that includes most of these stationarized signals. This allows us to test the stationarity of the signal under investigation. This chapter is organized as follows. In Section 22.2, we introduce the surrogate data method used to generate stationarized signals, which characterize the null hypothesis of stationarity. 
The concept of time-frequency learning machines is presented in Section 22.3, and applied to one-class SVM in order to derive a stationarity test in Section 22.4. The relevance of the latter is illustrated by simulation results in Section 22.5.
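Stationary surrogates are commonly built by Fourier phase randomization: the magnitude spectrum of the signal is kept and its phases are replaced by independent uniform random phases, which destroys time-localized structure while preserving the global spectrum. The sketch below assumes this classical construction corresponds to the surrogate method of Section 22.2; it uses a naive O(n²) DFT so that it needs only the standard library, and the function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def surrogate(x, seed=0):
    """Phase-randomized surrogate: same magnitude spectrum as x, random phases."""
    rng = random.Random(seed)
    n = len(x)
    spectrum = dft(x)
    randomized = list(spectrum)  # DC (and Nyquist bin for even n) left untouched
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        randomized[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        randomized[n - k] = randomized[k].conjugate()  # Hermitian symmetry keeps the output real
    return [v.real for v in idft(randomized)]
```

Calling `surrogate(x, seed=s)` for many seeds yields the family of stationarized signals on which a one-class SVM can be trained; the signal under test is then declared nonstationary if its time-frequency representation falls outside the learned domain.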
Using evolutionary computation to optimize an SVM used in detecting buried objects in FLIR imagery
NASA Astrophysics Data System (ADS)
Paino, Alex; Popescu, Mihail; Keller, James M.; Stone, Kevin
2013-06-01
In this paper we describe an approach for optimizing the parameters of a Support Vector Machine (SVM) as part of an algorithm used to detect buried objects in forward looking infrared (FLIR) imagery captured by a camera installed on a moving vehicle. The overall algorithm consists of a spot-finding procedure (to look for potential targets) followed by the extraction of several features from the neighborhood of each spot. The features include local binary patterns (LBP) and histograms of oriented gradients (HOG), as these are good at discriminating texture classes. Finally, we project and sum each hit into UTM space along with its confidence value (obtained from the SVM), producing a confidence map for ROC analysis. In this work, we use an Evolutionary Computation Algorithm (ECA) to optimize various parameters involved in the system, such as the combination of features used, the parameters of the Canny edge detector, the SVM kernel, and various HOG and LBP parameters. To validate our approach, we compare the results of an SVM whose parameters were obtained through our ECA technique with those of an SVM whose parameters were previously selected by hand through several iterations of "guess and check".
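The LBP texture descriptor mentioned above encodes each pixel by comparing its 8 neighbours with the centre value and reading the comparison bits as a byte; a histogram of these codes over a neighbourhood is the feature vector. This is the basic 3x3, 8-neighbour variant; the radius, neighbour count and any "uniform pattern" mapping used in the paper were among the ECA-tuned parameters and are not reproduced here. The function names are hypothetical.

```python
def lbp_code(patch):
    """Basic LBP of a 3x3 patch: neighbours read clockwise from top-left, bit k has weight 2**k."""
    center = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for k, (r, c) in enumerate(order):
        if patch[r][c] >= center:  # neighbour at least as bright as the centre sets the bit
            code |= 1 << k
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D list image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because each code depends only on the sign of local differences, the histogram is invariant to monotonic brightness changes, a useful property for FLIR imagery whose absolute intensity varies with conditions.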
Support vector machine as a binary classifier for automated object detection in remotely sensed data
NASA Astrophysics Data System (ADS)
Wardaya, P. D.
2014-02-01
In the present paper, the author proposes the application of a Support Vector Machine (SVM) to the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may produce results comparable to or even better than those of other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic form as a binary classifier that separates two classes, namely object and background. The algorithm aims to detect an object against its background effectively with minimal training data. A synthetic image containing noise is used for algorithm testing. The method is then applied to remote sensing image analysis tasks such as identifying island vegetation, water bodies, and oil spills in satellite imagery. The results indicate that SVM provides fast and accurate analysis with acceptable results.
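The object-versus-background setup can be illustrated with the simplest SVM: a linear classifier trained by the Pegasos stochastic sub-gradient method on the regularized hinge loss. This solver is swapped in for brevity and is not necessarily what the paper used (the abstract does not name a solver or kernel); the two-feature toy pixels are hypothetical.

```python
import random


def train_pegasos(samples, labels, lam=0.01, iters=2000, seed=0):
    """Linear SVM via Pegasos; labels in {-1, +1}; bias learned through an appended constant 1."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = data[rng.randrange(len(data))]
        eta = 1.0 / (lam * t)                            # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]         # shrink: gradient of the L2 regularizer
        if margin < 1.0:                                 # hinge loss active: step toward y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """+1 = object, -1 = background (toy convention)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In an image setting, each training pixel would be described by, for example, its band values, labelled object (+1) or background (-1) from a small hand-marked sample, and `predict` would then be applied to every pixel to produce the detection mask.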
A GSA-SVM Hybrid System for Classification of Binary Problems
NASA Astrophysics Data System (ADS)
Sarafrazi, Soroor; Nezamabadi-pour, Hossein; Barahman, Mojgan
2011-06-01
This paper hybridizes the gravitational search algorithm (GSA) with a support vector machine (SVM) to form a novel GSA-SVM hybrid system that improves classification accuracy in binary problems. GSA is an optimization heuristic used to optimize the value of the SVM kernel parameter (in this paper, the radial basis function (RBF) is chosen as the kernel function). The experimental results show that this new approach can achieve high classification accuracy and is comparable to or better than particle swarm optimization (PSO)-SVM and genetic algorithm (GA)-SVM, two other hybrid systems for classification.
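A minimal one-dimensional GSA, as it might be used to tune a single RBF kernel parameter, is sketched below. Agents get masses from their objective values (better agent, heavier mass), the gravitational constant decays as G0·exp(-alpha·t/T), only the k heaviest agents attract, and k shrinks linearly to 1. The parameter values (20 agents, 50 iterations, G0 = 10, alpha = 20) are assumptions, and `f` is a hypothetical objective that in practice might be 1 minus the cross-validated accuracy as a function of log2(gamma).

```python
import math
import random


def gsa_minimize(f, lo, hi, n_agents=20, iters=50, g0=10.0, alpha=20.0, seed=0):
    """Minimize a 1-D function f on [lo, hi] with a basic gravitational search algorithm."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_agents)]
    vs = [0.0] * n_agents
    best_x, best_f = xs[0], math.inf
    history = []  # best objective value seen so far, one entry per iteration
    for t in range(iters):
        fits = [f(x) for x in xs]
        for x, fx in zip(xs, fits):
            if fx < best_f:
                best_x, best_f = x, fx
        history.append(best_f)
        best, worst = min(fits), max(fits)
        # Masses: lower objective -> heavier agent; normalised to sum to 1.
        if worst == best:
            raw = [1.0] * n_agents
        else:
            raw = [(worst - fx) / (worst - best) for fx in fits]
        total = sum(raw)
        mass = [m / total for m in raw]
        g = g0 * math.exp(-alpha * t / iters)  # gravitational "constant" decays over time
        # Only the k heaviest agents attract others; k shrinks linearly from n_agents to 1.
        k = max(1, round(n_agents - (n_agents - 1) * t / max(1, iters - 1)))
        kbest = sorted(range(n_agents), key=lambda i: mass[i], reverse=True)[:k]
        for i in range(n_agents):
            acc = 0.0
            for j in kbest:
                if j != i:
                    dist = abs(xs[j] - xs[i])
                    acc += rng.random() * g * mass[j] * (xs[j] - xs[i]) / (dist + 1e-12)
            vs[i] = rng.random() * vs[i] + acc
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))  # keep agents inside the search interval
    return best_x, best_f, history
```

Searching over log2(gamma) rather than gamma itself keeps the step sizes meaningful across several orders of magnitude; the same loop generalizes to more dimensions by replacing the absolute difference with a Euclidean distance.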
Takei, Takaaki; Ikeda, Mitsuru; Imai, Kuniharu; Yamauchi-Kawaura, Chiyo; Kato, Katsuhiko; Isoda, Haruo
2013-09-01
The automated contrast-detail (C-D) analysis methods developed so far cannot be expected to work well on images processed with nonlinear methods, such as noise reduction methods. Therefore, we devised a new automated C-D analysis method by applying a support vector machine (SVM) and tested its robustness to nonlinear image processing. We acquired images of CDRAD (a commercially available C-D test object) at a tube voltage of 120 kV and a milliampere-second product (mAs) of 0.5-5.0. A partial differential equation-based technique was used as the noise reduction method. Three radiologists and three university students participated in the observer performance study. The training data for our SVM method were the classification data scored by one radiologist for the CDRAD images acquired at 1.6 and 3.2 mAs and their noise-reduced images. We also compared the performance of our SVM method with that of the CDRAD Analyser algorithm. The mean C-D diagrams (plots of the mean smallest visible hole diameter versus hole depth) obtained with our SVM method agreed well with those averaged across the six human observers for both original and noise-reduced CDRAD images, whereas the mean C-D diagrams from the CDRAD Analyser algorithm disagreed with those of the human observers for both original and noise-reduced images. In conclusion, our proposed SVM method for C-D analysis should work well for images processed with nonlinear noise reduction as well as for original radiographic images.
Palaniappan, Rajkumar; Sundaraj, Kenneth; Sundaraj, Sebastian
2014-06-27
Background: Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we evaluate and compare the performance of support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E lung sound database. Results: The pulmonary acoustic signals were manually categorised into three groups, namely normal, airway obstruction pathology, and parenchymal pathology. Mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performance of the classifiers was analysed using the confusion matrix technique. The one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were 92.19% and 98.26%, respectively. Conclusion: Although the data used to train and test the classifiers are limited, the classification accuracies obtained are satisfactory. The K-nn classifier performed better than the SVM classifier in discriminating pulmonary acoustic signals of pathological and normal subjects from the RALE database. PMID:24970564
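The K-nn classifier that performed best here is simple to state: a feature vector is assigned the majority label among its k closest training vectors. A minimal sketch with Euclidean distance and a nearest-first tie-break; the value of k, the distance and the tie rule used in the study are not given, and the 2-D points in the usage below stand in for real MFCC vectors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    nearest = [train_y[i] for i in ranked[:k]]
    votes = Counter(nearest)
    top = max(votes.values())
    # Tie-break: among labels with the top vote count, return the one met first (closest).
    for label in nearest:
        if votes[label] == top:
            return label
```

Unlike an SVM, K-nn has no training phase, but every prediction scans the whole training set, which is acceptable for a database of the size used here.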
Hayat, Maqsood; Tahir, Muhammad
2015-08-01
Membrane proteins are central components of the cell that manage intra- and extracellular processes. They execute a diversity of functions that are vital for the survival of organisms. The topology of a transmembrane protein describes the number of its transmembrane (TM) helix segments and their orientation. However, owing to the lack of recognized structures, identifying TM helices and their topology through experimental methods is laborious and low-throughput. To identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary information (a position-specific scoring matrix) and discrete information (a 6-letter exchange group) are used to represent transmembrane protein sequences. Noisy and extraneous attributes are eliminated from both feature spaces using particle swarm optimization as a feature-selection technique. Finally, the selected feature spaces are combined to form an ensemble feature space, and a fuzzy support vector machine is used as the classification algorithm. Two benchmark datasets, one of low and one of high resolution, are used. The performance of the PSOFuzzySVM-TMH model is assessed at various levels through 10-fold cross-validation. The empirical results reveal that PSOFuzzySVM-TMH achieves superior classification performance on the examined datasets. The proposed model may be a useful high-throughput tool for academia and the research community for further structural and functional studies of transmembrane proteins.
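The 10-fold cross-validation protocol mentioned above can be sketched as follows. This is a dependency-free illustration under stated assumptions: folds are formed by shuffling once and striding (not stratified), and `train_and_score` is a hypothetical callback standing in for fitting the fuzzy SVM on PSO-selected features, which is not reproduced here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Hold out each fold once, train on the remaining folds, and collect the k scores.

    train_and_score(train_idx, test_idx) is a hypothetical user-supplied callback.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return scores
```

Every sample is tested exactly once, and the reported figure is usually the mean of the k fold scores.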
Dolz, Jose; Laprie, Anne; Ken, Soléakhéna; Leroy, Henri-Arthur; Reyns, Nicolas; Massoptier, Laurent; Vermandel, Maximilien
2016-01-01
To constrain the risk of severe toxicity in radiotherapy and radiosurgery, precise volume delineation of organs at risk is required. This task is still performed manually, which is time-consuming and prone to observer variability. To address these issues, and as an alternative to atlas-based segmentation methods, machine learning techniques such as support vector machines (SVM) have recently been proposed to segment subcortical structures on magnetic resonance images (MRI). Here, SVM is proposed to segment the brainstem on MRI in a multicenter brain cancer context. A dataset of 14 adult brain MRI scans is used to evaluate its performance. In addition to spatial and probabilistic information, five different image intensity value (IIV) configurations are evaluated as features to train the SVM classifier. Segmentation accuracy is evaluated by computing the Dice similarity coefficient (DSC), the absolute volume difference (AVD) and the percentage volume difference between automatic and manual contours. Mean DSC for all proposed IIV configurations ranged from 0.89 to 0.90. Mean AVD values were below 1.5 cm³; the value for the best-performing IIV configuration was 0.85 cm³, an absolute mean difference of 3.99% with respect to the manually segmented volumes. These results suggest consistent volume estimation and high spatial similarity to expert delineations. The proposed approach outperformed previously presented methods for brainstem segmentation not only in volume similarity metrics but also in segmentation time. These preliminary results suggest that the approach may be promising for clinical use.
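The three overlap and volume metrics listed above can be sketched on sets of voxel coordinates. This is a minimal illustration with hypothetical toy masks; it assumes the percentage volume difference is taken relative to the manual volume, which the abstract implies ("with respect to the manual segmented volumes") but does not define explicitly.

```python
def dice(auto_voxels, manual_voxels):
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    a, b = set(auto_voxels), set(manual_voxels)
    return 2 * len(a & b) / (len(a) + len(b))

def volume_differences(auto_voxels, manual_voxels, voxel_volume_cm3):
    """Absolute volume difference (cm^3) and that difference as a % of the manual volume (assumed denominator)."""
    v_auto = len(set(auto_voxels)) * voxel_volume_cm3
    v_manual = len(set(manual_voxels)) * voxel_volume_cm3
    avd = abs(v_auto - v_manual)
    return avd, 100.0 * avd / v_manual

# Hypothetical toy masks given as (x, y, z) voxel indices.
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
print(dice(auto, manual), volume_differences(auto, manual, 0.5))
```

Here DSC = 6/7 and, with 0.5 cm³ voxels, AVD = 0.5 cm³, i.e. about 33% of the manual volume.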
A SVM-based method for sentiment analysis in Persian language
NASA Astrophysics Data System (ADS)
Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana
2013-03-01
Persian is the official language of Iran, Tajikistan and Afghanistan. Local online users often express their opinions and experiences on the web in written Persian. Although the information in these reviews is valuable to potential consumers and sellers, the huge number of web reviews makes it difficult to give an unbiased evaluation of a product. In this paper, the standard machine learning techniques SVM and naive Bayes are applied to online Persian movie reviews to automatically classify user reviews as positive or negative, and the performance of the two classifiers is compared for this language. The effects of feature representations on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The SVM classifier achieves accuracy as good as or better than naive Bayes on Persian movie reviews. Unigrams proved to be better features than bigrams and trigrams for capturing Persian sentiment orientation.
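The unigram, bigram and trigram feature options compared above can be illustrated with a small bag-of-n-grams extractor. This is a sketch only: whitespace tokenization is an assumption and is naive for Persian (it ignores zero-width non-joiners, for example), and the sample review is a hypothetical sentence, not from the paper's corpus.

```python
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-token sequences, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n):
    """Count vector (as a Counter) of n-grams from whitespace-tokenized text."""
    return Counter(ngrams(text.split(), n))

# Hypothetical review: "this movie was very good".
review = "این فیلم خیلی خوب بود"
for n in (1, 2, 3):
    print(n, len(bag_of_ngrams(review, n)))
```

Each review becomes a sparse count vector over its n-grams; these vectors are what an SVM or naive Bayes classifier would consume.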
An expert support system for breast cancer diagnosis using color wavelet features.
Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J
2012-10-01
Breast cancer can be diagnosed through pathological assessment of breast tissue samples, such as those obtained by core needle biopsy. The pathologist's analysis of such a sample is crucial for the breast cancer patient. In this paper, the nuclei in tissue samples are investigated after decomposition with the Log-Gabor wavelet in the HSV color domain, and an algorithm is developed to compute color wavelet features. These features are used for breast cancer diagnosis with a Support Vector Machine (SVM) classifier. The ability of a properly trained SVM to correctly classify patterns makes it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with those of other multivariate classifiers, namely the naïve Bayes classifier and an artificial neural network. The overall accuracy of the proposed SVM-based method suggests that it will be useful for further automation of cancer diagnosis.
Accuracy comparison among different machine learning techniques for detecting malicious codes
NASA Astrophysics Data System (ADS)
Narang, Komal
2016-03-01
In this paper, a machine learning based model for malware detection is proposed. It can detect newly released malware (i.e., zero-day attacks) by analyzing operation codes on the Android operating system. The accuracy of naïve Bayes, Support Vector Machine (SVM) and neural network classifiers for detecting malicious code is compared within the proposed model. In the experiment, 400 benign files, 100 system files and 500 malicious files were used to construct the model. The model yields its best accuracy, 88.9%, when a neural network is used as the classifier, with a sensitivity of 95% and a specificity of 82.8%.
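The relationship between sensitivity, specificity and accuracy can be made concrete with a 2x2 confusion matrix. The counts below are hypothetical and were chosen so that, if all 1000 files had been scored, they would reproduce the reported rates. The abstract does not state the evaluation split, so this is not the paper's actual confusion matrix.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix (malware = positive class)."""
    sensitivity = tp / (tp + fn)   # share of malicious files flagged
    specificity = tn / (tn + fp)   # share of benign/system files passed as clean
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts: 500 malicious and 500 non-malicious files.
sens, spec, acc = binary_metrics(tp=475, fn=25, tn=414, fp=86)
print(sens, spec, acc)
```

With equal class sizes, accuracy is simply the mean of sensitivity and specificity: (0.95 + 0.828) / 2 = 0.889.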
van der Ploeg, Tjeerd; Austin, Peter C; Steyerberg, Ewout W
2014-12-22
Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness"). We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (with 46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques, support vector machines (SVM), neural nets (NN) and random forests (RF), with two classical techniques, logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, in which we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly large development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of the AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC < 0.01). We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample size, with the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable. Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and a small optimism. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
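The two quantities behind the "data hungriness" criterion can be computed directly. The sketch below uses the Mann-Whitney interpretation of the AUC (the probability that a random positive outscores a random negative, with ties counting one half) and the optimism definition given above. The scores and AUC lists in the usage line are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def optimism(apparent_aucs, validated_aucs):
    """Mean apparent AUC minus mean validated AUC across repetitions."""
    return sum(apparent_aucs) / len(apparent_aucs) - sum(validated_aucs) / len(validated_aucs)

def events_per_variable(n_events, n_predictors):
    """Number of outcome events divided by the number of candidate predictors."""
    return n_events / n_predictors

# Hypothetical repetitions: optimism of 0.025 would fail the < 0.01 criterion.
print(optimism([0.80, 0.82], [0.78, 0.79]))
```

The pairwise loop is O(n_pos × n_neg); a rank-based formula gives the same value faster for large samples.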
Yilmaz, Tuba; Kılıç, Mahmut Alp; Erdoğan, Melike; Çayören, Mehmet; Tunaoğlu, Doruk; Kurtoğlu, İsmail; Yaslan, Yusuf; Çayören, Hüseyin; Arkan, Akif Enes; Teksöz, Serkan; Cancan, Gülden; Kepil, Nuray; Erdamar, Sibel; Özcan, Murat; Akduman, İbrahim; Kalkan, Tunaya
2016-06-20
In the past decade, extensive research on the dielectric properties of biological tissues has characterized the dielectric property discrepancy between malignant and healthy tissues. This discrepancy enabled the development of microwave therapeutic and diagnostic technologies. Traditionally, dielectric property measurements of biological tissues are performed with the well-known contact probe (open-ended coaxial probe) technique. However, the technique suffers from limited accuracy for permittivity measurements and low loss resolution for conductivity measurements. Therefore, despite the inherent dielectric property discrepancy, a rigorous measurement routine with open-ended coaxial probes is required to differentiate malignant and healthy tissues accurately. In this paper, we propose to eliminate the need for multiple open-ended coaxial probe measurements for malignant and healthy tissue differentiation by applying the support vector machine (SVM) classification algorithm to the dielectric measurement data. To do so, in vivo dielectric properties of malignant and healthy rat liver tissue are first measured with the open-ended coaxial probe technique between 500 MHz and 6 GHz. Cole-Cole functions are fitted to the measured dielectric properties, and the measurement data are verified against the literature. Malignant tissue classification is realized by applying SVM to the open-ended coaxial probe measurements, achieving an accuracy (F1 score) as high as 99.2%.
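For reference, a single-pole Cole-Cole model with a static-conductivity term, a common form of the "Cole-Cole functions" fitted above, can be evaluated as below. The abstract does not give the number of poles or any parameter values, so the single-pole form and the example parameters are assumptions for illustration only, not rat liver fits.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def cole_cole(freq_hz, eps_inf, delta_eps, tau_s, alpha, sigma_s):
    """Complex relative permittivity: eps_inf + d_eps / (1 + (j w tau)^(1 - alpha)) + sigma_s / (j w eps0)."""
    omega = 2 * math.pi * freq_hz
    return (eps_inf
            + delta_eps / (1 + (1j * omega * tau_s) ** (1 - alpha))
            - 1j * sigma_s / (omega * EPS0))

def permittivity_and_conductivity(freq_hz, **params):
    """Real relative permittivity and effective conductivity (S/m) at one frequency."""
    eps = cole_cole(freq_hz, **params)
    omega = 2 * math.pi * freq_hz
    return eps.real, -omega * EPS0 * eps.imag

# Hypothetical parameters swept over the 500 MHz to 6 GHz band used in the study.
params = dict(eps_inf=4.0, delta_eps=50.0, tau_s=8e-12, alpha=0.1, sigma_s=0.7)
for f in (0.5e9, 1e9, 3e9, 6e9):
    print(f, permittivity_and_conductivity(f, **params))
```

Fitting would adjust the five parameters to minimize the misfit to measured spectra; the fitted values, or the spectra themselves, can then serve as SVM features.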
Automated Tissue Classification Framework for Reproducible Chronic Wound Assessment
Mukherjee, Rashmi; Manohar, Dhiraj Dhane; Das, Dev Kumar; Achar, Arun; Mitra, Analava; Chakraborty, Chandan
2014-01-01
The aim of this paper was to develop a computer-assisted tissue classification (granulation, necrotic, and slough) scheme for chronic wound (CW) evaluation using medical image processing and statistical machine learning techniques. The red-green-blue (RGB) wound images captured with an ordinary digital camera were first transformed into the HSI (hue, saturation, and intensity) color space, and the “S” component was selected because it provided higher contrast. Wound areas from 6 different types of CW were segmented from whole images using fuzzy-divergence-based thresholding that minimizes edge ambiguity. A set of color and textural features describing granulation, necrotic, and slough tissues in the segmented wound area was extracted using various mathematical techniques. Finally, statistical learning algorithms, namely Bayesian classification and the support vector machine (SVM), were trained and tested for wound tissue classification in different CW images. The wound area segmentation protocol was further validated against ground truth images labeled by clinical experts. An SVM with a 3rd-order polynomial kernel provided the highest accuracies: 86.94%, 90.47%, and 75.53% for classifying granulation, slough, and necrotic tissues, respectively. The proposed automated tissue classification technique achieved the highest overall accuracy (87.61%) and the highest kappa statistic (0.793). PMID:25114925
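The saturation channel selected above can be computed from RGB with the standard HSI formula S = 1 - 3·min(R, G, B)/(R + G + B). The sketch below assumes 8-bit RGB input held in nested lists. It shows only the colour-space step, not the fuzzy-divergence thresholding.

```python
def hsi_saturation(r, g, b):
    """HSI saturation for RGB values in [0, 1]: 1 - 3 * min(R, G, B) / (R + G + B)."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: saturation undefined, conventionally set to 0
    return 1.0 - 3.0 * min(r, g, b) / total

def saturation_channel(image):
    """Map a 2-D list of 8-bit (R, G, B) tuples to a 2-D list of S values in [0, 1]."""
    return [[hsi_saturation(r / 255, g / 255, b / 255) for (r, g, b) in row] for row in image]

# Hypothetical 1x3 image: pure red, mid grey, black.
print(saturation_channel([[(255, 0, 0), (128, 128, 128), (0, 0, 0)]]))
```

Grey pixels get S = 0 and fully saturated colours get S = 1, which is why S separates reddish granulation tissue from pale or dark regions well.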
NASA Astrophysics Data System (ADS)
Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari
2018-03-01
The Support Vector Machine, commonly called SVM, is a method that can be used for data classification. An SVM separates data from two different classes with a hyperplane. In this study, an SVM-based Arabic speech recognition system was developed. The system was tested with two kinds of speakers: dependent speakers and independent speakers. The system achieved an accuracy of 85.32% for speaker-dependent and 61.16% for speaker-independent recognition.
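The hyperplane classification described above reduces, for a linear SVM, to the sign of w·x + b. In the sketch below, the weight vector and bias are hypothetical values, not parameters learned from speech features. Training would choose them to maximize the margin width 2/||w||.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Two-class label in {+1, -1}; points exactly on the hyperplane are assigned +1 by convention."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance 2/||w|| between the two supporting hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 = 3.
w, b = (1.0, 1.0), -3.0
print(classify(w, b, (2.0, 2.0)), classify(w, b, (1.0, 1.0)))
```

Multiclass problems such as word recognition are typically handled by combining several such binary decisions (e.g. one-vs-rest).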
Artan, Yusuf; Haider, Masoom A; Langer, Deanna L; van der Kwast, Theodorus H; Evans, Andrew J; Yang, Yongyi; Wernick, Miles N; Trachtenberg, John; Yetik, Imam Samil
2010-09-01
Prostate cancer is a leading cause of cancer death for men in the United States. Fortunately, the survival rate for patients diagnosed early is relatively high. Therefore, in vivo imaging plays an important role in the detection and treatment of the disease. Accurate prostate cancer localization with noninvasive imaging can be used to guide biopsy, radiotherapy, and surgery, as well as to monitor disease progression. Magnetic resonance imaging (MRI) performed with an endorectal coil provides higher prostate cancer localization accuracy than transrectal ultrasound (TRUS). However, in general, a single type of MRI is not sufficient for reliable tumor localization. As an alternative, multispectral MRI, i.e., the use of multiple MRI-derived datasets, has emerged as a promising noninvasive imaging technique for the localization of prostate cancer; however, almost all studies have relied on human readers. There is significant inter- and intraobserver variability among human readers, and it is very difficult for humans to analyze the large datasets produced by multispectral MRI. To solve these problems, this study presents an automated localization method using cost-sensitive support vector machines (SVMs) and shows that this method achieves better localization accuracy than the classical SVM. Additionally, we develop a new segmentation method that combines conditional random fields (CRF) with a cost-sensitive framework and show that it further improves on the cost-sensitive SVM results by incorporating spatial information. We test SVM, cost-sensitive SVM, and the proposed cost-sensitive CRF on multispectral MRI datasets acquired from 21 biopsy-confirmed cancer patients. Our results show that multispectral MRI increases the accuracy of prostate cancer localization compared with single MR images, and that advanced methods such as cost-sensitive SVM and the proposed cost-sensitive CRF boost performance significantly compared with SVM.
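One common way to make an SVM cost-sensitive is to give the slack (hinge) penalty a class-dependent weight. The sketch below evaluates that primal objective for a given hyperplane. The abstract does not specify the paper's exact formulation, so the form used here is an assumption, and no solver is included.

```python
def cost_sensitive_objective(w, b, xs, ys, c_pos, c_neg):
    """Primal soft-margin SVM objective with class-dependent penalties.

    0.5 * ||w||^2 + sum_i C_{y_i} * max(0, 1 - y_i * (w.x_i + b)), where C_{+1} = c_pos and C_{-1} = c_neg.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        cost = c_pos if y == 1 else c_neg
        loss += cost * max(0.0, 1.0 - margin)
    return reg + loss

# Hypothetical 1-D example: a cancer voxel (y=+1) inside the margin is penalized 10x more.
xs, ys = [(0.5,), (-2.0,)], [1, -1]
print(cost_sensitive_objective((1.0,), 0.0, xs, ys, c_pos=10.0, c_neg=1.0))
```

Raising c_pos relative to c_neg pushes the learned hyperplane toward fewer missed cancer voxels at the price of more false positives.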
NASA Astrophysics Data System (ADS)
Karsi, Redouane; Zaim, Mounia; El Alami, Jamila
2017-07-01
Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. It has become clear that opinions published on the web are a very valuable source for decision-making, and a rapidly growing field of research called “sentiment analysis” has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, etc.) of textual opinions. People expressing themselves in a particular domain often use domain-specific expressions, so building a classifier that performs well across different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain on sentiment classification when using machine learning techniques. In our study, three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K nearest neighbors (KNN), were applied to datasets collected from different domains. Experimental results show that Support Vector Machines outperform the other classifiers in all domains, achieving at least 74.75% accuracy with a standard deviation of 4.08.
Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S
2016-01-01
High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI, yielding irreproducible results through both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time-consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm to the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed by comparing the SVM-predicted quality labels with investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.
Intelligent Gearbox Diagnosis Methods Based on SVM, Wavelet Lifting and RBR
Gao, Lixin; Ren, Zhiqiang; Tang, Wenliang; Wang, Huaqing; Chen, Peng
2010-01-01
In intelligent gearbox diagnosis, it is difficult to obtain the desired information and a large enough sample to study; therefore, we propose combining several methods for gearbox fault diagnosis: wavelet lifting, a support vector machine (SVM) and rule-based reasoning (RBR). In a complex field environment, machines are unlikely to have the same fault, and the fault features can also vary. Therefore, an SVM can be used for the initial diagnosis. First, gearbox vibration signals were processed with wavelet packet decomposition, and the signal energy coefficients of each frequency band were extracted and used as input feature vectors to the SVM for normal and faulty pattern recognition. Second, precision analysis using wavelet lifting successfully filtered out noise while maintaining the impulse characteristics of the fault, thus effectively extracting the fault frequency of the machine. Lastly, a knowledge base was built from field rules summarized by experts to identify the detailed fault type. The results show that SVM is a powerful tool for gearbox fault pattern recognition when the sample size is small, that the wavelet lifting scheme can effectively extract fault features, and that rule-based reasoning can be used to identify the detailed fault type. Therefore, a method that combines SVM, wavelet lifting and rule-based reasoning ensures effective gearbox fault diagnosis. PMID:22399894
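The band-energy features described in the first step can be sketched with a wavelet packet decomposition. The abstract does not name the wavelet, so the orthonormal Haar filter used here is an assumption. The sketch also requires the signal length to be divisible by 2**levels (no padding), and it returns bands in natural (Paley) order, which is not sorted by frequency.

```python
import math

def haar_split(signal):
    """One level of the orthonormal Haar transform: (approximation, detail) halves."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_packet_energies(signal, levels):
    """Split every node at each level; return the energy of each of the 2**levels terminal nodes.

    Nodes come out in natural (Paley) order, not frequency order.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [part for node in nodes for part in haar_split(node)]
    return [sum(c * c for c in node) for node in nodes]

def normalized_energy_features(signal, levels=3):
    """Band energies divided by total energy, usable as an SVM input vector."""
    energies = wavelet_packet_energies(signal, levels)
    total = sum(energies)
    return [e / total for e in energies]
```

Because the Haar packet transform is orthonormal, the band energies sum to the signal energy, so the normalized features describe how vibration energy is distributed across bands.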
NASA Astrophysics Data System (ADS)
Ahmad, J. A.; Forman, B. A.
2017-12-01
High Mountain Asia (HMA) serves as a water supply source for over 1.3 billion people, primarily in south-east Asia. Most of this water originates as snow (or ice) that melts during the summer months and contributes to run-off downstream. In spite of its critical role, there is still considerable uncertainty regarding the total amount of snow in HMA and its spatial and temporal variation. In this study, the NASA Land Information System (LIS) is used to model the hydrologic cycle over the Indus basin. In addition, a sensitivity analysis explores the ability of support vector machines (SVM), a machine learning technique, to predict passive microwave brightness temperatures at a specific frequency and polarization as a function of LIS-derived land surface model output. Multi-frequency, multi-polarization passive microwave brightness temperatures measured by the Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) over the Indus basin are used as training targets during SVM training. Normalized sensitivity coefficients (NSCs) are then computed to assess the sensitivity of a well-trained SVM to each LIS-derived state variable. Preliminary results conform to the known first-order physics. For example, input states directly linked to physical temperature, such as snow temperature, air temperature, and vegetation temperature, have positive NSCs, whereas input states that increase volume scattering, such as snow water equivalent or snow density, yield negative NSCs. Air temperature exhibits the largest sensitivity coefficients due to its inherent high-frequency variability. The adherence of this machine learning algorithm to first-order physics bodes well for its potential use in LIS as the observation operator within a radiance data assimilation system aimed at improving regional- and continental-scale snow estimates.
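A normalized sensitivity coefficient is commonly defined as (∂y/∂x)·(x/y), the relative change in output per relative change in one input. The abstract does not give its exact definition, so that form is an assumption here. The sketch estimates it by central differences for any callable model. `toy_tb` is a hypothetical stand-in for the trained SVM, built only to reproduce the sign pattern described above.

```python
def normalized_sensitivity(model, inputs, name, rel_step=1e-4):
    """NSC = (dy/dx) * (x / y) for one input, estimated with a central difference.

    model: callable taking a dict of named inputs and returning a scalar output.
    """
    x0 = inputs[name]
    h = rel_step * abs(x0) if x0 != 0 else rel_step
    up = {**inputs, name: x0 + h}
    down = {**inputs, name: x0 - h}
    dy_dx = (model(up) - model(down)) / (2 * h)
    return dy_dx * x0 / model(inputs)

def toy_tb(state):
    """Hypothetical brightness-temperature proxy: rises with snow temperature, falls with SWE."""
    return state["snow_temp"] * (1 - 0.002 * state["swe"])

state = {"snow_temp": 260.0, "swe": 100.0}
print(normalized_sensitivity(toy_tb, state, "snow_temp"), normalized_sensitivity(toy_tb, state, "swe"))
```

Because NSCs are dimensionless, they can be compared across inputs with different units, such as kelvin and millimetres of SWE.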
Support vector machines for nuclear reactor state estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zavaljevski, N.; Gross, K. C.
2000-02-14
Validation of nuclear power reactor signals is often performed by comparing signal prototypes with the actual reactor signals. The signal prototypes are often computed from empirical data. The implementation of an estimation algorithm that can make predictions on limited data is an important issue. A new machine learning algorithm called support vector machines (SVMs), recently developed by Vladimir Vapnik and his coworkers, enables a high level of generalization with finite high-dimensional data. The improved generalization in comparison with standard methods like neural networks is due mainly to the following characteristics of the method: the input data space is transformed into a high-dimensional feature space using a kernel function, and the learning problem is formulated as a convex quadratic programming problem with a unique solution. In this paper the authors have applied the SVM method to data-based state estimation in nuclear power reactors. In particular, they implemented and tested kernels developed at Argonne National Laboratory for the Multivariate State Estimation Technique (MSET), a nonlinear, nonparametric estimation technique with a wide range of applications in nuclear reactors. The methodology has been applied to three data sets from experimental and commercial nuclear power reactor applications. The results are promising: the combination of MSET kernels with the SVM method has better noise reduction and generalization properties than the standard MSET algorithm.
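The kernel mechanism described above means the SVM's quadratic program only ever sees pairwise kernel values. The sketch below builds such a Gram matrix. The MSET kernels from Argonne are not reproduced; a Gaussian (RBF) kernel with a hypothetical `gamma` stands in for them.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), an implicit map to a high-dimensional space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_matrix(points, kernel=rbf_kernel):
    """Pairwise kernel values K[i][j] = k(x_i, x_j); the SVM dual QP is expressed entirely in these."""
    return [[kernel(p, q) for q in points] for p in points]

# Hypothetical sensor snapshots (two signals each).
K = gram_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)])
print(K)
```

Any kernel whose Gram matrices are positive semidefinite keeps the dual a convex QP, which is why swapping in a different kernel, such as MSET's, preserves the training formulation.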
PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.
Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G; Hardoim, Pablo R; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M T
2017-03-04
Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them is a large, particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, biological knowledge about them is still lacking, and the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques; in particular, supervised learning methods have been explored for lincRNA prediction in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants subjected to pathogenic and beneficial microorganisms.
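lincRNA workflows of this kind typically discard short or protein-coding-like transcripts before the SVM step. The abstract does not list the workflow's actual filters, so the sketch below is a hypothetical pre-filter. The 200 nt minimum is the conventional lncRNA length cutoff, and the `max_orf_codons=100` threshold is an illustrative assumption. Only the forward strand is scanned, and ORFs without an in-frame stop codon are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_codons(seq):
    """Length in codons (stop excluded) of the longest ATG-initiated, stop-terminated ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def passes_lincrna_prefilter(seq, min_length=200, max_orf_codons=100):
    """Hypothetical pre-filter: keep long transcripts that lack a long ORF."""
    return len(seq) >= min_length and longest_orf_codons(seq) < max_orf_codons
```

Transcripts passing such a filter would then be turned into feature vectors for the SVM classifier.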
PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants
Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G.; Hardoim, Pablo R.; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M. T.
2017-01-01
Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them is a large, particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, biological knowledge about them is still lacking, and the few computational methods available are so specific that they cannot be successfully applied to species other than those for which they were originally designed. Prediction of lncRNAs has been performed with machine learning techniques; in particular, supervised learning methods have been explored for lincRNA prediction in recent literature. As far as we know, there are no methods or workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs in plants that combines known bioinformatics tools with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed us to identify novel lincRNAs in sugarcane (Saccharum spp.) and in maize (Zea mays). From the results, we could also identify differentially expressed lincRNAs in sugarcane and maize plants subjected to pathogenic and beneficial microorganisms. PMID:29657283
Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO
Zhu, Zhichuan; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan
2018-01-01
Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the Multiple Kernel Learning Support Vector Machine (MKL-SVM) algorithm has achieved good results in this task. However, because it relies on grid search, the MKL-SVM algorithm needs a long time for parameter optimization, and its recognition accuracy depends on the fineness of the grid. In this paper, swarm intelligence is introduced: Particle Swarm Optimization (PSO) is combined with the MKL-SVM algorithm to form the MKL-SVM-PSO algorithm, which performs global parameter optimization rapidly. To obtain the global optimal solution, different inertia weights, namely constant, linear, and nonlinear inertia weights, are applied to pulmonary nodule recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of that of the MKL-SVM grid search algorithm, while achieving better recognition. Moreover, the Euclidean norm of the normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Statistical analysis of results averaged over 20 runs with different inertia weights shows that a dynamic inertia weight is superior to a constant inertia weight in the MKL-SVM-PSO algorithm. Among the dynamic inertia weights, the nonlinear inertia weight gives a shorter parameter optimization time, and its average fitness value after convergence is much closer to the optimal fitness value, making it better than the linear inertia weight. In addition, a better-performing nonlinear inertia weight is verified. PMID:29853983
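The PSO parameter search with constant, linear and nonlinear inertia weights can be sketched as a global-best PSO. This is a generic illustration, not the MKL-SVM-PSO implementation. The quadratic "nonlinear" schedule is one common choice (the paper's formula is not given in the abstract), the constant value 0.7 is hypothetical, and a sphere function stands in for the real fitness (e.g. cross-validated MKL-SVM error over kernel parameters). Velocity and position clamping keep particles bounded.

```python
import random

def inertia(t, t_max, schedule, w_max=0.9, w_min=0.4, w_const=0.7):
    """Inertia weight at iteration t under a constant, linear, or (assumed quadratic) nonlinear schedule."""
    if schedule == "constant":
        return w_const
    frac = t / t_max
    if schedule == "linear":
        return w_max - (w_max - w_min) * frac
    if schedule == "nonlinear":
        # quadratic decay from w_max to w_min, dropping faster early on
        return w_min + (w_max - w_min) * (1 - frac) ** 2
    raise ValueError(f"unknown schedule: {schedule}")

def pso_minimize(f, dim, bounds, schedule="linear", n_particles=20, iters=200, c1=2.0, c2=2.0, seed=0):
    """Global-best PSO with velocity and position clamping; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        w = inertia(t, iters, schedule)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in fitness: 2-D sphere function with its minimum at the origin.
best, val = pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))
```

A large early inertia favours exploration and a small late inertia favours exploitation. Decaying schedules aim for both, which is consistent with the reported advantage of dynamic weights over a constant one.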
Pulmonary Nodule Recognition Based on Multiple Kernel Learning Support Vector Machine-PSO.
Li, Yang; Zhu, Zhichuan; Hou, Alin; Zhao, Qingdong; Liu, Liwei; Zhang, Lijuan
2018-01-01
Pulmonary nodule recognition is the core module of lung CAD. The Support Vector Machine (SVM) algorithm has been widely used in pulmonary nodule recognition, and the Multiple Kernel Learning Support Vector Machine (MKL-SVM) algorithm has achieved good results in this task. However, because it relies on grid search, the MKL-SVM algorithm needs a long time for parameter optimization, and its recognition accuracy depends on the fineness of the grid. In this paper, swarm intelligence is introduced: Particle Swarm Optimization (PSO) is combined with the MKL-SVM algorithm to form the MKL-SVM-PSO algorithm, which performs global parameter optimization rapidly. To obtain the global optimal solution, different inertia weights, namely constant, linear, and nonlinear inertia weights, are applied to pulmonary nodule recognition. The experimental results show that the model training time of the proposed MKL-SVM-PSO algorithm is only 1/7 of that of the MKL-SVM grid search algorithm, while achieving better recognition. Moreover, the Euclidean norm of the normalized error vector is proposed to measure the proximity between the average fitness curve and the optimal fitness curve after convergence. Statistical analysis of results averaged over 20 runs with different inertia weights shows that a dynamic inertia weight is superior to a constant inertia weight in the MKL-SVM-PSO algorithm. Among the dynamic inertia weights, the nonlinear inertia weight gives a shorter parameter optimization time, and its average fitness value after convergence is much closer to the optimal fitness value, making it better than the linear inertia weight. In addition, a better-performing nonlinear inertia weight is verified.
Binding Affinity prediction with Property Encoded Shape Distribution signatures
Das, Sourav; Krein, Michael P.
2010-01-01
We report the use of the molecular signatures known as “Property-Encoded Shape Distributions” (PESD) together with standard Support Vector Machine (SVM) techniques to produce validated models that can predict the binding affinity of a large number of protein ligand complexes. This “PESD-SVM” method uses PESD signatures that encode molecular shapes and property distributions on protein and ligand surfaces as features to build SVM models that require no subjective feature selection. A simple protocol was employed for tuning the SVM models during their development, and the results were compared to SFCscore – a regression-based method that was previously shown to perform better than 14 other scoring functions. Although the PESD-SVM method is based on only two surface property maps, the overall results were comparable. For most complexes with a dominant enthalpic contribution to binding (ΔH/-TΔS > 3), a good correlation between true and predicted affinities was observed. Entropy and solvent were not considered in the present approach and further improvement in accuracy would require accounting for these components rigorously. PMID:20095526
A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease
NASA Astrophysics Data System (ADS)
Maryam; Setiawan, Noor Akhmad; Wahyunggoro, Oyas
2017-08-01
The diagnosis of erythemato-squamous disease is a complex problem in dermatology, and the disease is difficult to detect. Besides that, it is a major cause of skin cancer. Data mining in the medical field helps experts diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), combines the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as a filter method to remove redundant features, and GA is used as a wrapper method to select the ideal feature subset, with SVM as the classifier. Experiments were performed with 10-fold cross-validation on the erythemato-squamous diseases dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed multiclass SVM model with Chi Square and GA yields an optimal feature subset of 18 features with 99.18% accuracy.
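The chi-square filter stage of such a hybrid method can be sketched as follows, assuming discrete (categorical or ordinal) feature values and ranking features by the Pearson chi-square statistic of their contingency table with the class label. The GA wrapper and the SVM are omitted, and the feature names in the usage line are hypothetical.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value x class contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    f_tot = Counter(feature)
    c_tot = Counter(labels)
    stat = 0.0
    for f in f_tot:
        for c in c_tot:
            expected = f_tot[f] * c_tot[c] / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat

def chi_square_filter(columns, labels, k):
    """Keep the k feature names with the largest chi-square scores."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [0, 0, 1, 1]
kept = chi_square_filter({"independent": [0, 1, 0, 1], "informative": [0, 0, 1, 1]}, labels, 1)
```

The surviving features would then be handed to the wrapper stage, where a GA searches over subsets scored by SVM cross-validation accuracy.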
Multi-view L2-SVM and its multi-view core vector machine.
Huang, Chengquan; Chung, Fu-lai; Wang, Shitong
2016-03-01
In this paper, a novel L2-SVM-based classifier, Multi-view L2-SVM, is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier has no bias term in its objective function and hence offers flexibility similar to ν-SVC, in the sense that the number of support vectors can be controlled by a pre-specified parameter. It makes full use of the coherence and the differences among views by imposing consensus across multiple views, which improves the overall classification performance. Furthermore, based on the generalized core vector machine (GCVM), the Multi-view L2-SVM classifier is extended into its GCVM version, MvCVM, which enables fast training on large-scale multi-view datasets, with time complexity asymptotically linear in the sample size and space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier on small-scale multi-view datasets and of the proposed MvCVM classifier on large-scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach increasing importance to analytic capabilities. In recent years, SVM has been adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).
Structural analysis of online handwritten mathematical symbols based on support vector machines
NASA Astrophysics Data System (ADS)
Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George
2013-01-01
Mathematical expression recognition is still a very challenging task for the research community, mainly because of the two-dimensional (2D) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two online handwritten mathematical symbols of an ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs), one based on the "one-against-one" technique and one based on the "one-against-all" technique, in identifying the relation between a pair of symbols (e.g., subscript, numerator). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and to compare them with the rule-based classifier of the ILSP-1 system that participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique, and 12.31% for the ILSP-1 classifier.
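The difference between the two multi-class decompositions lies only in how binary decisions are combined: one-against-all trains K "class vs rest" models and picks the largest decision value, while one-against-one trains K(K-1)/2 pairwise models that vote. The sketch below shows that combination logic only; the trained SVMs are replaced by hand-made distance-based decision functions on a hypothetical 1-D feature, and the three relation classes are illustrative.

```python
from collections import Counter
from itertools import combinations

def one_vs_all_predict(x, scorers):
    """scorers: class -> decision function of a binary 'class vs rest' model."""
    return max(scorers, key=lambda c: scorers[c](x))

def one_vs_one_predict(x, classes, pairwise):
    """pairwise: (a, b) -> decision function of an 'a vs b' model (positive means a).
    Each pairwise model casts one vote."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if pairwise[(a, b)](x) > 0 else b] += 1
    # Ties go to the class listed first, a simple convention.
    return max(classes, key=lambda c: votes[c])

# Hypothetical feature (e.g. vertical offset of the second symbol) and
# stand-in decision functions in place of trained SVMs.
centers = {"subscript": -1.0, "horizontal": 0.0, "superscript": 1.0}
classes = list(centers)
scorers = {c: (lambda x, m=m: -(x - m) ** 2) for c, m in centers.items()}
pairwise = {(a, b): (lambda x, ma=centers[a], mb=centers[b]: abs(x - mb) - abs(x - ma))
            for a, b in combinations(classes, 2)}
```

With K = 3 both schemes happen to need three binary models, but one-against-one grows quadratically in K while each pairwise model sees only two classes' data.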
SELF-BLM: Prediction of drug-target interactions via self-training SVM.
Keum, Jongsoo; Nam, Hojung
2017-01-01
Predicting drug-target interactions is important for the development of novel drugs and for drug repositioning. To predict such interactions, a number of methods based on drug and target protein similarity have been developed. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interactions. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as the self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first divides the unknown interactions into unlabeled interactions and negative interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and the final local classification models are constructed. When applied to four classes of proteins (enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors), SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compared to other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM.
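The generic self-training loop (fit on labeled data, pseudo-label confident unlabeled samples, refit, repeat) can be sketched as below. This is not SELF-BLM itself: the SVM is swapped for a hypothetical nearest-centroid classifier on 1-D points so the loop is easy to follow, and the confidence margin threshold is an assumed parameter.

```python
def nearest_centroid_fit(points, labels):
    """Mean of the points in each class (stand-in for training an SVM)."""
    cents = {}
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        cents[c] = sum(members) / len(members)
    return cents

def predict_with_margin(cents, x):
    """Predicted class and a confidence margin (gap between the two nearest centroids)."""
    d = sorted((abs(x - m), c) for c, m in cents.items())
    return d[0][1], d[1][0] - d[0][0]

def self_train(labeled, unlabeled, threshold=2.0, max_iter=10):
    """Repeatedly add confidently pseudo-labelled samples to the training set."""
    points = [p for p, _ in labeled]
    labels = [l for _, l in labeled]
    pool = list(unlabeled)
    for _ in range(max_iter):
        cents = nearest_centroid_fit(points, labels)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (confident if margin >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing left that the model is sure about
        for x, label in confident:
            points.append(x)
            labels.append(label)
        pool = [x for x, _ in rest]
    return list(zip(points, labels)), pool

labeled, remaining = self_train([(0.0, 0), (10.0, 1)], [1.0, 9.0, 5.0])
```

Here the ambiguous point 5.0 stays unlabeled, which mirrors the motivation above: uncertain unknowns are kept out of the training set rather than forced into the negative class.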
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong
2008-02-01
Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were used to predict the pH values of cola beverages. Five varieties of cola were prepared; 225 samples (45 for each variety) were selected for the calibration set and 75 samples (15 for each variety) for the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) transformation, followed by first-derivative processing, were used as pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were selected according to their accumulative reliabilities and used as inputs to a least squares support vector machine (LS-SVM) model. LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique was then applied to build the regression model, which was compared with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697 × 10⁻³ for LS-SVM, respectively. Both methods achieved satisfactory precision. The results indicate that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way to predict the pH of cola beverages.
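The three validation statistics reported above can be computed as follows. This is a sketch assuming the usual definitions; in particular, sign conventions for bias vary, and here it is taken as the mean of predicted minus measured values. The toy values are hypothetical, not the paper's data.

```python
import math

def pearson_r(y, yhat):
    """Correlation coefficient between measured and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in yhat))
    return cov / (sy * sp)

def rmsep(y, yhat):
    """Root mean square error of prediction over the validation set."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(y, yhat)) / len(y))

def bias(y, yhat):
    """Mean prediction error (assumed convention: predicted minus measured)."""
    return sum(b - a for a, b in zip(y, yhat)) / len(y)

measured = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]   # a constant offset: perfect r, non-zero bias
```

The toy case shows why the three numbers are reported together: a constant offset leaves r at 1 while RMSEP and bias both reveal the systematic error.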
Wissel, Tobias; Pfeiffer, Tim; Frysch, Robert; Knight, Robert T.; Chang, Edward F.; Hinrichs, Hermann; Rieger, Jochem W.; Rose, Georg
2013-01-01
Objective: Support Vector Machines (SVM) have developed into a gold standard for accurate classification in Brain-Computer Interfaces (BCI). The choice of the most appropriate classifier for a particular application depends on several characteristics in addition to decoding accuracy. Here we investigate the implementation of Hidden Markov Models (HMM) for online BCIs and discuss strategies to improve their performance. Approach: We compare the SVM, serving as a reference, and HMMs for classifying discrete finger movements obtained from the electrocorticograms of four subjects performing a finger tapping experiment. The classifier decisions are based on a subset of low-frequency time domain and high gamma oscillation features. Main results: We show that decoding optimization for the two approaches depends mainly on the way features are extracted and selected, and less on the classifier. An additional gain in HMM performance of up to 6% was obtained by introducing model constraints. Comparable accuracies of up to 90% were achieved with both SVM and HMM, with the high gamma cortical response providing the most important decoding information for both techniques. Significance: We discuss technical HMM characteristics and adaptations in the context of the presented data as well as for general BCI applications. Our findings suggest that HMMs and their characteristics are promising for efficient online brain-computer interfaces. PMID:24045504
Source localization in an ocean waveguide using supervised machine learning.
Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter
2017-09-01
Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.
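The preprocessing step can be sketched as a normalized sample covariance matrix (SCM): each complex pressure snapshot across the array is scaled to unit norm, the outer products are averaged, and the real and imaginary parts of the diagonal and upper triangle form the feature vector. This construction is assumed here as a common choice, and the snapshot values are hypothetical.

```python
import math

def normalized_scm(snapshots):
    """Average of outer products of unit-norm complex pressure snapshots."""
    L = len(snapshots[0])
    C = [[0j] * L for _ in range(L)]
    for p in snapshots:
        norm = math.sqrt(sum(abs(v) ** 2 for v in p))
        q = [v / norm for v in p]  # remove dependence on source amplitude
        for i in range(L):
            for j in range(L):
                C[i][j] += q[i] * q[j].conjugate()
    ns = len(snapshots)
    return [[c / ns for c in row] for row in C]

def scm_features(C):
    """Real and imaginary parts of the diagonal and upper triangle of the Hermitian SCM."""
    L = len(C)
    feats = []
    for i in range(L):
        for j in range(i, L):
            feats.extend([C[i][j].real, C[i][j].imag])
    return feats

snaps = [[1 + 1j, 2 + 0j, -1j], [0.5 + 0j, 1j, 3 + 0j]]
C = normalized_scm(snaps)
features = scm_features(C)
```

Because the matrix is Hermitian, the upper triangle carries all its information, giving L(L+1) real inputs for an L-element array.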
Online Least Squares One-Class Support Vector Machines-Based Abnormal Visual Event Detection
Wang, Tian; Chen, Jie; Zhou, Yi; Snoussi, Hichem
2013-01-01
The abnormal event detection problem is an important subject in real-time video surveillance. In this paper, we propose a novel online one-class classification algorithm, the online least squares one-class support vector machine (online LS-OC-SVM), together with its sparsified version (sparse online LS-OC-SVM). LS-OC-SVM extracts a hyperplane as an optimal description of the training objects in a regularized least squares sense. The online LS-OC-SVM first learns from a training set with a limited number of samples to provide a basic normal model, then updates the model with the remaining data. In the sparse online scheme, the model complexity is controlled by the coherence criterion. The online LS-OC-SVM is applied to the abnormal event detection problem: each frame of the video is characterized by a covariance matrix descriptor encoding the motion information and is then classified as a normal or an abnormal frame. Experiments are conducted on a two-dimensional synthetic distribution dataset and a benchmark video surveillance dataset to demonstrate the promising results of the proposed online LS-OC-SVM method. PMID:24351629
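A coherence criterion for sparsification typically admits a new sample into the model's dictionary only if its largest kernel similarity to the samples already kept stays below a threshold μ0. The sketch below shows just that admission rule, assuming a unit-diagonal Gaussian kernel and hypothetical values for σ and μ0; the LS-OC-SVM coefficient update itself is not shown.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel with k(x, x) = 1, so kernel values act as coherence."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2 * sigma ** 2))

def build_dictionary(stream, mu0=0.5, sigma=1.0):
    """Keep a sample only if its coherence with the dictionary is <= mu0."""
    dictionary = []
    for x in stream:
        coherence = max((abs(gaussian_kernel(x, d, sigma)) for d in dictionary), default=0.0)
        if coherence <= mu0:
            dictionary.append(x)
    return dictionary

stream = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.05, 0.0), (0.0, 3.0)]
kept = build_dictionary(stream, mu0=0.5)
```

Near-duplicate samples such as (0.1, 0.0) are rejected, so model size grows with the diversity of the data rather than with the length of the video stream.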
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.
Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel
2011-05-09
Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods, with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which, compared with a fixed grid search, finds a globally optimal solution more rapidly and more precisely. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers than Elastic Net SVM in terms of the median number of features selected, and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data.
We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks for high-dimensional data such as microarray data sets. PMID:21554689
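The SCAD penalty and its elastic combination with a ridge term can be written out directly. This sketch uses the standard SCAD definition with the commonly used shape parameter a = 3.7, and assumes the Elastic SCAD penalty takes the form Σ SCAD_λ1(w_j) + λ2 Σ w_j²; the SVM loss and the interval search over (λ1, λ2) are not shown.

```python
def scad(beta, lam, a=3.7):
    """SCAD penalty for one coefficient: linear near zero, quadratic blend, then flat."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return -(b * b - 2 * a * lam * b + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2

def elastic_scad(weights, lam1, lam2, a=3.7):
    """Assumed form: SCAD on each weight plus a ridge term lam2 * sum(w^2)."""
    return sum(scad(w, lam1, a) for w in weights) + lam2 * sum(w * w for w in weights)

penalty = elastic_scad([0.5, 10.0], lam1=1.0, lam2=0.1)
```

The flat tail of SCAD avoids over-shrinking large coefficients, while the ridge term keeps correlated features together in non-sparse settings.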
Machine learning study for the prediction of transdermal peptide
NASA Astrophysics Data System (ADS)
Jung, Eunkyoung; Choi, Seung-Hoon; Lee, Nam Kyung; Kang, Sang-Kee; Choi, Yun-Jaie; Shin, Jae-Min; Choi, Kihang; Jung, Dong Hyun
2011-04-01
In order to develop a computational method to rapidly evaluate transdermal peptides, we report approaches for predicting the transdermal activity of peptides from peptide sequence information using Artificial Neural Network (ANN), Partial Least Squares (PLS) and Support Vector Machine (SVM). We identified 269 transdermal peptides by the phage display technique and used them as positive controls to develop and test machine learning models. Combinations of three descriptors with different neural network architectures, numbers of latent variables and kernel functions were tried in training to obtain appropriate predictions. The performance of the models was evaluated by statistical indicators including sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC score). In the ROC score-based comparison, all three methods proved capable of providing a reasonable prediction of transdermal peptides. The best result was obtained by the SVM model with a radial basis function kernel and VHSE descriptors. The results indicate that it is possible to discriminate between transdermal peptides and random sequences using our models. We anticipate that our models will be applicable to predicting transdermal peptides in large peptide databases, facilitating efficient transdermal drug delivery through intact skin.
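The evaluation indicators mentioned above can be computed as in this sketch. It assumes binary labels (1 = transdermal, 0 = random sequence) and estimates the ROC area by its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The scores are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise positive-vs-negative comparisons."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 decision threshold
```

Sensitivity and specificity depend on the chosen threshold, whereas the ROC score summarizes ranking quality over all thresholds, which is why it suits comparing ANN, PLS and SVM models.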
Classification of ROTSE Variable Stars using Machine Learning
NASA Astrophysics Data System (ADS)
Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration
2001-12-01
We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high (~95%) efficiency of SVM in isolating a few best-defined classes against the rest of the sample, and good accuracy (~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.
Groundwater vulnerability indices conditioned by Supervised Intelligence Committee Machine (SICM).
Nadiri, Ata Allah; Gharekhani, Maryam; Khatibi, Rahman; Sadeghfam, Sina; Moghaddam, Asghar Asghari
2017-01-01
This research presents a Supervised Intelligent Committee Machine (SICM) model to assess groundwater vulnerability indices of an aquifer. SICM uses Artificial Neural Networks (ANN) to overarch three Artificial Intelligence (AI) models: Support Vector Machine (SVM), Neuro-Fuzzy (NF) and Gene Expression Programming (GEP). Each model uses the DRASTIC index, an acronym for 7 geological, hydrological and hydrogeological parameters, which collectively represents intrinsic (or natural) vulnerability and gives a sense of how readily contaminants, such as nitrate-N, penetrate aquifers from the surface. These models are trained to modify, or condition, their DRASTIC index values using measured nitrate-N concentrations. The three AI techniques often perform similarly but also differ, so SICM exploits this situation to improve the modeled values, producing hybrid modeling results by selecting the better-performing SVM, NF and GEP components. The models of the Ardabil aquifer study area show that the vulnerability indices from the DRASTIC framework produce sharp fronts, whereas the AI models smooth the fronts and correlate better with observed nitrate values; SICM improves on the performance of the three AI models and copes well with heterogeneity and uncertain parameters. Copyright © 2016 Elsevier B.V. All rights reserved.
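The DRASTIC index that these models start from is a weighted sum of ratings for its seven parameters. The sketch below uses the generic DRASTIC weights commonly cited in the literature (studies may adjust them) and hypothetical ratings; the AI conditioning against nitrate-N data is not shown.

```python
# Generic DRASTIC weights (commonly cited values; site studies may adjust them).
DRASTIC_WEIGHTS = {
    "D": 5,  # Depth to water
    "R": 4,  # net Recharge
    "A": 3,  # Aquifer media
    "S": 2,  # Soil media
    "T": 1,  # Topography (slope)
    "I": 5,  # Impact of the vadose zone
    "C": 3,  # hydraulic Conductivity
}

def drastic_index(ratings):
    """Weighted sum of 1-10 ratings for the seven DRASTIC parameters."""
    missing = set(DRASTIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for key, r in ratings.items():
        if not 1 <= r <= 10:
            raise ValueError(f"rating for {key} out of range: {r}")
    return sum(DRASTIC_WEIGHTS[k] * ratings[k] for k in DRASTIC_WEIGHTS)

example = drastic_index({"D": 7, "R": 6, "A": 5, "S": 4, "T": 9, "I": 6, "C": 2})
```

With ratings between 1 and 10 the index ranges from 23 to 230; higher values indicate higher intrinsic vulnerability.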
Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong
2016-01-01
Knowledge of protein function is important for biological, medical and therapeutic studies, but the function of many proteins is still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins, including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, two more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
NASA Astrophysics Data System (ADS)
Ali, Salah M.; Hui, K. H.; Hee, L. M.; Salman Leong, M.; Al-Obaidi, M. A.; Ali, Y. H.; Abdelrhman, Ahmed M.
2018-03-01
Acoustic emission (AE) analysis has become a vital tool for initiating maintenance tasks in many industries. However, the analysis and interpretation process has been found to be highly dependent on experts. Therefore, an automated monitoring method is required to reduce the cost and time consumed in interpreting AE signals. This paper investigates the application of two of the most common machine learning approaches, namely artificial neural network (ANN) and support vector machine (SVM), to automate the diagnosis of valve faults in a reciprocating compressor based on AE signal parameters. Since accuracy is an essential factor in any automated diagnostic system, this paper also provides a comparative study of the predictive performance of ANN and SVM. AE parameter data were acquired from a single-stage reciprocating air compressor with different operational and valve conditions. ANN and SVM diagnosis models were subsequently devised by combining AE parameters of different conditions. Results demonstrate that the ANN and SVM models achieve the same prediction accuracy. However, the SVM model is recommended for automated diagnosis of valve condition because of its ability to handle a high number of input features with small sampling data sets.
Density-Dependent Quantized Least Squares Support Vector Machine for Large Data Sets.
Nan, Shengyu; Sun, Lei; Chen, Badong; Lin, Zhiping; Toh, Kar-Ann
2017-01-01
Based on the knowledge that input data distribution is important for learning, a data density-dependent quantization scheme (DQS) is proposed for sparse input data representation. The usefulness of the representation scheme is demonstrated by using it as a data preprocessing unit attached to the well-known least squares support vector machine (LS-SVM) for application on big data sets. Essentially, the proposed DQS adopts a single shrinkage threshold to obtain a simple quantization scheme, which adapts its outputs to input data density. With this quantization scheme, a large data set is quantized to a small subset where considerable sample size reduction is generally obtained. In particular, the sample size reduction can save significant computational cost when using the quantized subset for feature approximation via the Nyström method. Based on the quantized subset, the approximated features are incorporated into LS-SVM to develop a data density-dependent quantized LS-SVM (DQLS-SVM), where an analytic solution is obtained in the primal solution space. The developed DQLS-SVM is evaluated on synthetic and benchmark data with particular emphasis on large data sets. Extensive experimental results show that the learning machine incorporating DQS attains not only high computational efficiency but also good generalization performance.
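For reference, a plain LS-SVM regressor reduces training to one linear system, which is what makes quantizing the data set to a smaller subset attractive. The sketch below solves the standard dual LS-SVM system [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y] with a Gaussian RBF kernel. It is not the paper's DQLS-SVM: there is no quantization step and no Nyström primal solution, and γ, σ and the toy data are hypothetical.

```python
import math

def rbf(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the bordered LS-SVM system; returns bias b and dual weights alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, b, alpha, sigma=1.0):
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 4.0, 9.0]
b, alpha = lssvm_fit(X, y, gamma=1e6)
```

The system has size n + 1, so its cost grows cubically with the number of training samples; shrinking n is the lever the quantization scheme pulls.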
NASA Astrophysics Data System (ADS)
Luo, Jianjun; Wei, Caisheng; Dai, Honghua; Yuan, Jianping
2018-03-01
This paper focuses on robust adaptive control for a class of uncertain nonlinear systems subject to input saturation and external disturbance with guaranteed predefined tracking performance. To reduce the limitations of the classical predefined performance control method in the presence of unknown initial tracking errors, a novel predefined performance function with time-varying design parameters is first proposed. Then, to reduce the complexity of nonlinear approximations, only two least-squares-support-vector-machine-based (LS-SVM-based) approximators with two design parameters are required, through a norm-form transformation of the original system. Further, a novel LS-SVM-based adaptive constrained control scheme is developed under the time-varying predefined performance using the backstepping technique. To avoid the tedious analysis and repeated differentiations of virtual control laws in the backstepping technique, a simple and robust finite-time-convergent differentiator is devised that extracts only the first-order derivative at each step in the presence of external disturbance. In this sense, the inherent demerit of the backstepping technique, the "explosion of terms" brought about by the recursive virtual controller design, is overcome. Moreover, an auxiliary system is designed to compensate for the control saturation. Finally, three groups of numerical simulations are employed to validate the effectiveness of the newly developed differentiator and the proposed adaptive constrained control scheme.
Support Vector Machine algorithm for regression and classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Chenggang; Zavaljevski, Nela
2001-08-01
The software is an implementation of the Support Vector Machine (SVM) algorithm that was invented and developed by Vladimir Vapnik and his co-workers at AT&T Bell Laboratories. The specific implementation reported here is an Active Set method for solving the quadratic optimization problem that forms the major part of any SVM program. The implementation is tuned to the specific constraints generated in SVM learning and is thus more efficient than general-purpose quadratic optimization programs. A decomposition method has been implemented in the software that enables processing of large data sets. The size of the learning data is virtually unlimited by the capacity of the computer's physical memory. The software is flexible and extensible. Two upper bounds are implemented to regulate SVM learning for classification, which allow users to adjust the false positive and false negative rates. The software can be used either as a standalone, general-purpose SVM regression or classification program, or be embedded into a larger software system.
Elhenawy, Mohammed; Jahangiri, Arash; Rakha, Hesham A; El-Shawarby, Ihab
2015-10-01
The ability to model driver stop/run behavior at signalized intersections considering the roadway surface condition is critical in the design of advanced driver assistance systems. Such systems can reduce intersection crashes and fatalities by predicting driver stop/run behavior. The research presented in this paper uses data collected from two controlled field experiments on the Smart Road at the Virginia Tech Transportation Institute (VTTI) to model driver stop/run behavior at the onset of a yellow indication for different roadway surface conditions. The paper offers two contributions. First, it introduces a new predictor related to driver aggressiveness and demonstrates that this measure enhances the modeling of driver stop/run behavior. Second, it applies well-known artificial intelligence techniques, including adaptive boosting (AdaBoost), random forest, and support vector machine (SVM) algorithms, as well as traditional logistic regression techniques, to the data in order to develop a model that can be used by traffic signal controllers to predict driver stop/run decisions in a connected vehicle environment. The research demonstrates that adding the proposed driver aggressiveness predictor to the model yields a statistically significant increase in model accuracy. Moreover, the false alarm rate is reduced, but this reduction is not statistically significant. The study demonstrates that, for the subject data, the SVM machine learning algorithm performs the best in terms of optimum classification accuracy and false positive rates. However, the SVM model produces the best performance in terms of the classification accuracy only. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Qiao, T.; Ren, J.; Craigie, C.; Zabalza, J.; Maltin, Ch.; Marshall, S.
2015-03-01
It is well known that the eating quality of beef has a significant influence on the repurchase behavior of consumers. Several key factors affect the perception of quality, including color, tenderness, juiciness, and flavor. To support consumer repurchase choices, there is a need for an objective measurement of quality that could be applied to meat prior to its sale. Objective approaches such as those offered by spectral technologies may be useful, but the analytical algorithms used remain to be optimized. For visible and near infrared (VISNIR) spectroscopy, Partial Least Squares Regression (PLSR) is a widely used technique for modeling and predicting meat quality. In this paper, a Support Vector Machine (SVM) based machine learning approach is presented to predict beef eating quality traits. Although SVM has been used successfully in various disciplines, it has not been applied extensively to the analysis of meat quality parameters. To this end, the performance of PLSR and SVM as tools for the analysis of meat tenderness is evaluated using a large dataset acquired under industrial conditions. The spectral dataset was collected using VISNIR spectroscopy over the wavelength range 350 to 1800 nm on 234 beef M. longissimus thoracis steaks from heifers, steers, and young bulls. Because the dimensionality of the VISNIR data is very high (over 1600 spectral bands), Principal Component Analysis (PCA) was applied for feature extraction and data reduction. The extracted principal components (fewer than 100) were then used for data modeling and prediction. The prediction results showed that SVM has greater potential than PLSR to predict beef eating quality, especially tenderness. The influence of animal gender on beef quality prediction was also investigated, and beef quality traits were found to be predicted most accurately in beef from young bulls.
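To illustrate the PCA reduction step, the minimal sketch below extracts only the first principal component with power iteration on the sample covariance matrix. It assumes a small number of variables; with over 1600 spectral bands, a practical implementation would use an SVD of the centered data matrix and would extract further components by deflation. The function name and data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit loading vector, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); adequate for a handful of variables.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project the centered rows onto the loading vector to obtain PC scores.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
loading, scores = first_principal_component(rows)
```

The resulting scores, rather than the raw bands, would then be passed to the SVM or PLSR model.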
Epileptic seizure detection in EEG signal with GModPCA and support vector machine.
Jaiswal, Abeg Kumar; Banka, Haider
2017-01-01
Epilepsy is one of the most common neurological disorders and is characterized by recurrent seizures. Electroencephalograms (EEGs) record neural activity and can be used to detect epilepsy. Visual inspection of an EEG signal for epileptic seizure detection is a time-consuming process and is prone to human error; therefore, a number of automated seizure detection frameworks have recently been proposed to replace these traditional methods. Feature extraction and classification are two important steps in these procedures. Feature extraction focuses on finding informative features that can be used for classification and correct decision-making; therefore, proposing effective feature extraction techniques for seizure detection is of great significance. Principal Component Analysis (PCA) is a dimensionality reduction technique used in different fields of pattern recognition, including EEG signal classification. Global modular PCA (GModPCA) is a variation of PCA. In this paper, an effective framework combining GModPCA and a Support Vector Machine (SVM) is presented for epileptic seizure detection in EEG signals. Feature extraction is performed with GModPCA, whereas an SVM with a radial basis function kernel performs the classification between seizure and non-seizure EEG signals. Seven different experimental cases were conducted on the benchmark epilepsy EEG dataset, and system performance was evaluated using 10-fold cross-validation. In addition, we prove analytically that GModPCA has lower time and space complexity than PCA. The experimental results show that EEG signals have strong inter-sub-pattern correlations. GModPCA and SVM achieved 100% accuracy for the classification between normal and epileptic signals, and the classification results of the proposed approach were better than those of some existing methods proposed in the literature.
This study suggests that GModPCA and SVM could be used for automated epileptic seizure detection in EEG signals.
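The evaluation above relies on 10-fold cross-validation. A minimal sketch of how the sample indices can be split into ten disjoint test folds is shown below; the shuffling seed and the round-robin fold assignment are assumptions, and the paper may have partitioned its data differently.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


splits = list(k_fold_indices(25, k=10, seed=1))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores averaged.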
Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom
2016-01-01
The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and a basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection using standard SVM-based recursive feature elimination (RFE-SVM), which enabled us to achieve good classification accuracy (60.78%). We found that RFE-SVM feature selection combined with H-ELM effectively yields high multiclass classification accuracy for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex. PMID:27500640
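Recursive feature elimination repeatedly fits a linear model, drops the feature with the smallest absolute weight, and refits on the remaining features. The minimal sketch below shows that loop for a binary problem. To stay self-contained it uses a hypothetical stand-in weight function (difference of class means) in place of the linear SVM weights that RFE-SVM would use; the study itself addressed a three-class problem.

```python
def recursive_feature_elimination(X, y, fit_weights, n_keep):
    """Drop the feature with the smallest |weight| until n_keep features remain."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = fit_weights(X_sub, y)  # one weight per remaining feature
        worst = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


def mean_difference_weights(X, y):
    """Hypothetical stand-in for linear SVM weights: difference of class means (y in {-1, +1})."""
    pos = [row for row, label in zip(X, y) if label > 0]
    neg = [row for row, label in zip(X, y) if label < 0]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]


X = [[5.0, 0.1, 2.0], [6.0, 0.2, 2.0], [1.0, 0.1, 2.0], [0.0, 0.2, 2.0]]
y = [1, 1, -1, -1]
kept, dropped = recursive_feature_elimination(X, y, mean_difference_weights, n_keep=1)
```

Passing a real linear SVM trainer as `fit_weights` turns this loop into RFE-SVM.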
NASA Astrophysics Data System (ADS)
Rama Krishna, K.; Ramachandran, K. I.
2018-02-01
Crack propagation is a major cause of failure in rotating machines. It adversely affects productivity, safety, and machining quality. Hence, accurately detecting the severity of a crack is imperative for the predictive maintenance of such machines. Fault diagnosis is an established approach to identifying faults by observing the non-linear behaviour of vibration signals under various operating conditions. In this work, we compute the classification efficiencies for both the original and the reconstructed vibration signals. The reconstructed signals are obtained using Variational Mode Decomposition (VMD), by splitting the original signal into three intrinsic mode function components and framing them accordingly. Feature extraction, feature selection and feature classification are the three phases in obtaining the classification efficiencies. In the feature extraction phase, statistical features are computed individually from the original and the reconstructed signals. In the feature selection phase, a few statistical parameters are selected, and these are then classified using the SVM classifier. The results identify the best parameters and the most appropriate SVM kernel for detecting bearing faults. We conclude that combining VMD with SVM gives better results than applying SVM to the raw signals, owing to the denoising and filtering of the raw vibration signals.
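The feature extraction phase computes statistical descriptors of each vibration signal (or VMD component). The abstract does not list which statistics were used, so the set below (mean, standard deviation, RMS, peak, skewness, kurtosis, crest factor) is an assumption based on features commonly used in vibration-based fault diagnosis; this is a minimal sketch, not the authors' feature set.

```python
import math


def statistical_features(signal):
    """Common time-domain statistics of a 1-D vibration signal (population moments)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    skewness = sum((x - mean) ** 3 for x in signal) / (n * std ** 3) if std else 0.0
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * var ** 2) if var else 0.0
    crest_factor = peak / rms if rms else 0.0
    return {
        "mean": mean, "std": std, "rms": rms, "peak": peak,
        "skewness": skewness, "kurtosis": kurtosis, "crest_factor": crest_factor,
    }


features = statistical_features([1.0, -1.0, 1.0, -1.0])
```

The resulting feature vectors (one per signal segment) would then go through feature selection before SVM classification.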
NASA Astrophysics Data System (ADS)
Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan
2016-03-01
Interstitial lung disease (ILD) is a complex group of pulmonary disorders. High Resolution Computed Tomography (HRCT) is considered the best imaging technique for the analysis of different pulmonary disorders. HRCT findings can be categorised into several patterns, viz. Consolidation, Emphysema, Ground Glass Opacity, Nodular, Normal, etc., based on their texture-like appearance. Clinicians often find it difficult to diagnose these patterns because of their complex nature. In such a scenario, a computer-aided diagnosis system could help clinicians identify the patterns. Several approaches have been proposed for the classification of ILD patterns. These include the computation of textural features and the training/testing of classifiers such as artificial neural networks (ANN), support vector machines (SVM), etc. In this paper, wavelet features are calculated from two different ILD databases, the publicly available MedGIFT ILD database and a private ILD database, followed by a performance evaluation of ANN and SVM classifiers in terms of average accuracy. The average classification accuracy of SVM is found to be greater than that of ANN when both are trained and tested on the same database. The investigation was continued to test the variation in classifier accuracy when training and testing are performed with alternate databases, and when the classifiers are trained and tested with a database formed by merging samples of the same class from the two individual databases. The average classification accuracy drops when two independent databases are used for training and testing, respectively, and improves significantly when the classifiers are trained and tested with the merged database. This indicates that classification accuracy depends on the training data. SVM is observed to outperform ANN when the same database is used for training and testing.
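Wavelet texture features are often summarized as subband energies. The minimal sketch below computes a single-level 2-D orthonormal Haar transform over non-overlapping 2x2 blocks and returns the mean energy of each subband. The choice of the Haar wavelet, a single decomposition level and mean-energy features are assumptions, since the abstract does not specify the wavelet features used; subband naming conventions (LH vs. HL) also vary between sources.

```python
def haar2d_subband_energies(image):
    """Mean energy of the LL, LH, HL, HH subbands of a one-level orthonormal 2-D Haar transform."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    sums = {"LL": 0.0, "LH": 0.0, "HL": 0.0, "HH": 0.0}
    blocks = 0
    for i in range(0, rows, 2):
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            coeffs = {
                "LL": (a + b + c + d) / 2,  # local average (approximation)
                "LH": (a + b - c - d) / 2,  # difference between rows
                "HL": (a - b + c - d) / 2,  # difference between columns
                "HH": (a - b - c + d) / 2,  # diagonal difference
            }
            for name, value in coeffs.items():
                sums[name] += value * value
            blocks += 1
    return {name: total / blocks for name, total in sums.items()}


flat = haar2d_subband_energies([[1.0, 1.0], [1.0, 1.0]])
checker = haar2d_subband_energies([[1.0, -1.0], [-1.0, 1.0]])
```

A smooth patch concentrates its energy in LL, while a fine checkerboard texture moves it into HH, which is what makes these energies useful as texture descriptors for an ANN or SVM.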
NASA Astrophysics Data System (ADS)
Du, Kongchang; Zhao, Ying; Lei, Jiaqiang
2017-09-01
In hydrological time series prediction, singular spectrum analysis (SSA) and discrete wavelet transform (DWT) are widely used as preprocessing techniques for artificial neural network (ANN) and support vector machine (SVM) predictors. These hybrid or ensemble models appear to reduce prediction error substantially. In the current literature, researchers apply these techniques to the whole observed time series and then use the resulting set of reconstructed or decomposed time series as inputs to the ANN or SVM. However, through two comparative experiments and mathematical deduction, we found that this use of SSA and DWT in building hybrid models is incorrect. Because SSA and DWT use 'future' values in their calculations, the series generated by SSA reconstruction or DWT decomposition contain information about 'future' values. These hybrid models therefore show spuriously 'high' prediction performance and may cause large errors in practice.
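The look-ahead problem can be demonstrated with any non-causal transform. The sketch below uses a centered moving average as a simple stand-in for SSA reconstruction or DWT decomposition (it is neither of those methods): perturbing only values after time t changes the centered output at t, whereas a trailing (causal) average is unaffected. The function names are hypothetical.

```python
def centered_moving_average(series, half_width=1):
    """Non-causal smoother: the value at t uses series[t-h .. t+h], i.e. future values."""
    n = len(series)
    out = []
    for t in range(n):
        window = series[max(0, t - half_width):min(n, t + half_width + 1)]
        out.append(sum(window) / len(window))
    return out


def trailing_moving_average(series, width=3):
    """Causal smoother: the value at t uses only series[t-width+1 .. t]."""
    out = []
    for t in range(len(series)):
        window = series[max(0, t - width + 1):t + 1]
        out.append(sum(window) / len(window))
    return out


def uses_future(smoother, series, t):
    """Perturb every value after t and report whether the output at t changes."""
    perturbed = series[:t + 1] + [x + 100.0 for x in series[t + 1:]]
    return smoother(series)[t] != smoother(perturbed)[t]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
```

In a forecasting setup, any decomposition used to build inputs at time t should therefore be computed only from observations available up to t.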
Discrimination of transgenic soybean seeds by terahertz spectroscopy
NASA Astrophysics Data System (ADS)
Liu, Wei; Liu, Changhong; Chen, Feng; Yang, Jianbo; Zheng, Lei
2016-10-01
Discrimination of genetically modified organisms is increasingly demanded by legislation and consumers worldwide. The feasibility of non-destructive discrimination of glyphosate-resistant and conventional soybean seeds and their hybrid descendants was examined using a terahertz time-domain spectroscopy system combined with chemometrics. Principal component analysis (PCA), least squares support vector machine (LS-SVM) and PCA-back propagation neural network (PCA-BPNN) models, with first-derivative, second-derivative and standard normal variate (SNV) pre-treatments, were applied to classify soybean seeds by genotype. The results demonstrated that clear differences among glyphosate-resistant, hybrid-descendant and conventional non-transformed soybean seeds could easily be visualized, with excellent classification (88.33% accuracy in the validation set) using LS-SVM and SNV-pretreated spectra. The results indicate that THz spectroscopy combined with chemometrics is a promising technique for distinguishing transgenic soybean seeds from non-transformed seeds with high efficiency and without major sample preparation.
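Two of the spectral pre-treatments mentioned above are easy to state precisely. In the minimal sketch below, SNV centers each spectrum on its own mean and scales it by its own standard deviation, and the first derivative is approximated by a simple finite difference. Using the sample standard deviation (n - 1) and an unsmoothed difference are assumptions; many chemometrics tools use a Savitzky-Golay derivative instead.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale a single spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def first_derivative(spectrum):
    """Finite-difference approximation of the first derivative (one point shorter)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


normalized = snv([1.0, 2.0, 3.0])
slope = first_derivative([1.0, 4.0, 9.0])
```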
Guinness, Robert E
2015-04-28
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real time or near real time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, performance improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of the central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times the CPU time of the fastest classifier (SVM). 
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.
Guinness, Robert E.
2015-01-01
This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real time or near real time (<5 s). We investigated a wide range of supervised learning techniques for classification, including decision trees (DT), support vector machines (SVM), naive Bayes classifiers (NB), Bayesian networks (BN), logistic regression (LR), artificial neural networks (ANN) and several instance-based classifiers (KStar, LWL and IBk). Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall) were DT (96.5%), BN (90.9%), LWL (95.5%) and KStar (95.6%). In particular, the DT algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, performance improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of the central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five times the CPU time of the fastest classifier (SVM). 
The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity. PMID:25928060
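The complexity ranking above is based on CPU time spent in classification. A minimal way to obtain such a ranking is to time each classifier's prediction function with `time.process_time()`, which counts CPU time rather than wall-clock time. The classifiers below are hypothetical toy functions, and averaging over a few repeats is an assumption to reduce timer noise.

```python
import time


def cpu_seconds_per_prediction(predict, samples, repeats=5):
    """Average CPU seconds spent per call of `predict` over all samples."""
    start = time.process_time()
    for _ in range(repeats):
        for x in samples:
            predict(x)
    return (time.process_time() - start) / (repeats * len(samples))


def rank_by_cpu_time(classifiers, samples):
    """classifiers: dict name -> predict callable. Returns names, cheapest first."""
    costs = {name: cpu_seconds_per_prediction(fn, samples) for name, fn in classifiers.items()}
    return sorted(costs, key=costs.get)


# Hypothetical classifiers: a constant-time threshold vs. a brute-force nearest prototype.
classifiers = {
    "threshold": lambda x: x > 0.5,
    "nearest": lambda x: min(range(1000), key=lambda c: abs(c / 1000 - x)),
}
ranking = rank_by_cpu_time(classifiers, [i / 100 for i in range(100)])
```

On a phone, CPU time is only a rough proxy for battery usage, which matches how the study frames its comparison.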
Oh, Jooyoung; Cho, Dongrae; Park, Jaesub; Na, Se Hee; Kim, Jongin; Heo, Jaeseok; Shin, Cheung Soo; Kim, Jae-Jin; Park, Jin Young; Lee, Boreom
2018-03-27
Delirium is an important syndrome found in patients in the intensive care unit (ICU); however, it is usually under-recognized during treatment. This study was performed to investigate whether delirious patients can be successfully distinguished from non-delirious patients by using heart rate variability (HRV) and machine learning. Electrocardiography data of 140 patients were acquired during daily ICU care, and HRV data were analyzed. Delirium, including its type, severity, and etiologies, was evaluated daily by trained psychiatrists. HRV data and various machine learning algorithms, including linear support vector machine (SVM), SVM with radial basis function (RBF) kernels, linear extreme learning machine (ELM), ELM with RBF kernels, linear discriminant analysis, and quadratic discriminant analysis, were used to distinguish delirium patients from non-delirium patients. HRV data from 4797 ECGs were included, and 39 patients had delirium at least once during their ICU stay. The maximum classification accuracy was achieved using SVM with RBF kernels. Our prediction method based on HRV with machine learning was comparable to previous delirium prediction models that use massive amounts of clinical information. Our results show that autonomic alterations could be a significant feature of patients with delirium in the ICU, suggesting the potential for automatic prediction and early detection of delirium based on HRV with machine learning.
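HRV analysis starts from the series of RR intervals between successive heartbeats. The minimal sketch below computes a few standard time-domain HRV measures (mean RR, SDNN, RMSSD, mean heart rate). The abstract does not say which HRV features were fed to the classifiers, so this feature set is an assumption for illustration.

```python
import math


def hrv_time_domain(rr_ms):
    """Time-domain HRV measures from RR intervals given in milliseconds."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of all RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_hr = 60000.0 / mean_rr  # beats per minute
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd, "mean_hr": mean_hr}


features = hrv_time_domain([800.0, 810.0, 790.0, 800.0])
```

One such feature vector per ECG recording would form an input row for the SVM or ELM classifiers.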
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing
2012-12-01
FLD: Fisher’s Linear Discriminant; kNN: K-Nearest Neighbor ... Neighbor (kNN), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in ... Expansion and the Discrete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech, and Signal Processing (ICASSP00
NASA Astrophysics Data System (ADS)
Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.
2017-10-01
Crop maps are essential inputs for the agricultural planning done at various governmental and agribusiness agencies. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergistic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground truth data covering five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83%, 82%, and 83%, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82%, 81%, and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, the newly tested RFK can compete with the SVM-RBF and RF classifiers in terms of classification accuracy.
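A random forest kernel is commonly defined as the fraction of trees in which two samples fall into the same terminal node; the resulting Gram matrix can be given to an SVM as a precomputed kernel. The minimal sketch below assumes that definition, and the "trees" are hypothetical hand-written functions mapping a sample to a leaf identifier, standing in for the trees of a trained random forest.

```python
def random_forest_kernel(x, z, trees):
    """Fraction of trees in which x and z end up in the same leaf."""
    same_leaf = sum(1 for leaf_of in trees if leaf_of(x) == leaf_of(z))
    return same_leaf / len(trees)


def rf_gram_matrix(samples, trees):
    """Kernel (Gram) matrix that could be passed to an SVM as a precomputed kernel."""
    return [[random_forest_kernel(a, b, trees) for b in samples] for a in samples]


# Hypothetical "trees": each maps a sample to a leaf identifier.
trees = [
    lambda s: s[0] > 0.5,
    lambda s: s[1] > 0.2,
    lambda s: (s[0] > 0.1, s[1] > 0.9),
]
samples = [[0.6, 0.3], [0.7, 0.1], [0.05, 0.95]]
K = rf_gram_matrix(samples, trees)
```

Because the similarity is learned from the forest's partitions, the kernel adapts to the data, whereas the RBF kernel relies on a fixed Euclidean distance.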
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.
Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra
2011-01-01
Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity, owing to the lack of a gold standard of negative examples, of relevant miRNA-targeting site context specific features, and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to place the true positive predictions preferentially at the top of the ranked list; hence, the algorithms become unreliable for biologists. In addition, these algorithms fail to achieve a reasonable combination of precision and recall for target transcripts that are translationally repressed at the protein level. In this article, we introduce MultiMiTar, an efficient miRNA-target prediction system: a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high-quality negative examples and selecting biologically relevant miRNA-targeting site context specific features. The features are selected using a novel feature selection technique, AMOSA-SVM, which integrates the multiobjective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM. On a completely independent test data set, MultiMiTar achieves a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 than other target prediction methods, whose MCC and ACA values range from -0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) on the translationally repressed data set than all the other existing methods. 
An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
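MCC and ACA summarize a binary confusion matrix in ways that stay informative when the classes are imbalanced. The minimal sketch below uses the standard MCC formula and takes ACA as the mean of sensitivity and specificity, which is the usual reading of "average class-wise accuracy" for two classes (an assumption, since the abstract does not define it). The counts are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_class_wise_accuracy(tp, tn, fp, fn):
    """Mean of per-class accuracies: (sensitivity + specificity) / 2 for two classes."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why negative values appear among the competing methods above.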
Yoo, Tae Keun; Kim, Sung Kean; Kim, Deok Won; Choi, Joon Yul; Lee, Wan Hyung; Oh, Ein; Park, Eun-Cheol
2013-11-01
A number of clinical decision tools for osteoporosis risk assessment have been developed to select postmenopausal women for bone mineral density measurement. We developed and validated machine learning models with the aim of identifying the risk of osteoporosis in postmenopausal women more accurately than conventional clinical decision tools. We collected medical records of Korean postmenopausal women from the Korea National Health and Nutrition Examination Surveys. The training data set was used to construct models based on popular machine learning algorithms, such as support vector machines (SVM), random forests, artificial neural networks (ANN), and logistic regression (LR), using simple survey variables. The machine learning models were compared with four conventional clinical decision tools: the osteoporosis self-assessment tool (OST), osteoporosis risk assessment instrument (ORAI), simple calculated osteoporosis risk estimation (SCORE), and osteoporosis index of risk (OSIRIS). SVM had a significantly larger area under the receiver operating characteristic curve (AUC) than ANN, LR, OST, ORAI, SCORE, and OSIRIS for the training set. For the testing set, SVM predicted osteoporosis risk at the total hip, femoral neck, or lumbar spine with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0%. The significant factors selected by SVM were age, height, weight, body mass index, duration of menopause, duration of breast feeding, estrogen therapy, hyperlipidemia, hypertension, osteoarthritis, and diabetes mellitus. Considering the various predictors associated with low bone density, machine learning methods may be effective tools for identifying postmenopausal women at high risk for osteoporosis.
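The AUC used to compare the models equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The minimal sketch below computes it by brute-force pairwise comparison, which is fine for small samples; the scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking, so the reported 0.827 means the SVM ranks a random osteoporotic case above a random non-osteoporotic case about 83% of the time.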
Arbitrary norm support vector machines.
Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R
2009-02-01
Support vector machines (SVMs) are state-of-the-art classifiers. Typically, the L2-norm or L1-norm is adopted as the regularization term in SVMs, while other norm-based SVMs, for example the L0-norm SVM or even the L∞-norm SVM, are rarely seen in the literature. The major reason is that the L0-norm is a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, which makes it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing their explicit form. Hence, it builds a connection between Bayesian learning and kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm SVM is competitive with or even better than the standard L2-norm SVM in terms of accuracy, but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties, with training over seven times faster.
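To make the family of "norm-based SVMs" concrete, the minimal sketch below evaluates the primal objective ||w||_p + C * sum of hinge losses for p = 0, 1, 2 or infinity. It only evaluates the objective; it does not implement the paper's Bayesian polynomial-time framework. Using the plain norm (rather than, for example, half the squared L2-norm of the standard SVM) is a simplifying assumption, and the names are hypothetical.

```python
def lp_norm(w, p):
    """L0 (count of non-zeros), L-infinity (max magnitude), or ordinary Lp norm."""
    if p == 0:
        return float(sum(1 for v in w if v != 0))
    if p == float("inf"):
        return max(abs(v) for v in w)
    return sum(abs(v) ** p for v in w) ** (1.0 / p)


def regularized_hinge_objective(w, b, X, y, p=2, C=1.0):
    """||w||_p + C * sum_i max(0, 1 - y_i * (w . x_i + b)), with y_i in {-1, +1}."""
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return lp_norm(w, p) + C * hinge
```

The L0 regularizer counts non-zero weights, so it is piecewise constant in w; this is why minimizing it directly is combinatorial, as the abstract notes.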
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom-written Java code leveraging the Sequential Minimal ... Society Encog is a machine learning framework for Java, C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13 ... SVM classifiers are trained using Weka libraries and custom-written Java code. The data set is created as an Attribute Relationship File
Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong
2014-08-01
Drug-induced ototoxicity, a toxic side effect, is an important issue that needs to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, which makes them unsuitable for large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. In this investigation, we therefore established an effective computational model for predicting drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also built on the same training sets. Among all the prediction models, GA-CG-SVM model II showed the best performance, with prediction accuracies of 85.33% and 83.05% on two independent test sets, respectively. Overall, the good performance of GA-CG-SVM model II indicates that it could be used to predict drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.
Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan
2015-12-01
A novel method was developed for the rapid determination of multiple indicators in Corni Fructus by means of near infrared (NIR) spectroscopy. A least squares support vector machine tuned by particle swarm optimization (PSO-LS-SVM) was investigated to improve quality control. Calibration models of moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of the PSO-LS-SVM models was compared with partial least squares regression (PLSR) and a back propagation artificial neural network (BP-ANN). The calibration and validation results of PSO-LS-SVM were superior to those of both PLSR and BP-ANN. For the PSO-LS-SVM models, the correlation coefficients (r) of the calibrations were all above 0.942. The optimal prediction results were also achieved by the PSO-LS-SVM models, with RMSEP (root mean square error of prediction) and RSEP (relative standard error of prediction) below 1.176 and 15.5%, respectively. The results suggest that the PSO-LS-SVM algorithm has good model performance and high prediction accuracy, and that NIR spectroscopy has potential value for the rapid determination of multiple indicators in Corni Fructus.
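In PSO-LS-SVM, particle swarm optimization searches the LS-SVM hyperparameter space (typically the regularization parameter and the kernel width) for the setting with the lowest validation error. The minimal global-best PSO sketch below minimizes a simple quadratic that stands in for that validation-error surface. The inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), swarm size and iteration count are common defaults assumed here, not values taken from the paper.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iterations=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in objective with its minimum at (1, -2).
best, best_val = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                              bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In a real calibration, the objective would train an LS-SVM with the candidate hyperparameters and return its cross-validated RMSE.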
Hybrid wavelet-support vector machine approach for modelling rainfall-runoff process.
Komasi, Mehdi; Sharghi, Soroush
2016-01-01
Because of the importance of water resources management, the need for accurate modeling of the rainfall-runoff process has grown rapidly in past decades. Recently, the support vector machine (SVM) approach has been used by hydrologists for rainfall-runoff modeling and in other fields of hydrology. Like other artificial intelligence models, such as artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems, the SVM model relies on autoregressive properties. In this paper, wavelet analysis was linked to the SVM model concept for modeling the rainfall-runoff process of the Aghchai and Eel River watersheds. To do so, the main time series of the two variables, rainfall and runoff, were decomposed into multiple frequency-band time series using wavelet theory; these time series were then used as input data to the SVM model to predict runoff discharge one day ahead. The results show that the wavelet-SVM model can predict both short- and long-term runoff discharges by taking seasonality effects into account. The proposed hybrid model is also more appropriate than classical autoregressive models such as ANN and SVM because it uses multi-scale time series of rainfall and runoff data in the modeling process.
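One-day-ahead prediction with an autoregressive model amounts to building input vectors from lagged values of the predictors. The minimal sketch below builds such (features, target) pairs from raw rainfall and runoff series. In the wavelet-SVM approach, the lagged wavelet subseries would replace or augment these raw lags; that decomposition is not shown. The lag count and data are hypothetical.

```python
def make_lagged_dataset(rainfall, runoff, n_lags=2):
    """Pairs (features, target) that predict runoff[t+1] from the last n_lags values of each series."""
    X, y = [], []
    for t in range(n_lags - 1, len(runoff) - 1):
        features = rainfall[t - n_lags + 1:t + 1] + runoff[t - n_lags + 1:t + 1]
        X.append(features)   # oldest-to-newest rainfall lags, then runoff lags
        y.append(runoff[t + 1])
    return X, y


rain = [0.0, 5.0, 10.0, 0.0, 2.0]
flow = [1.0, 2.0, 4.0, 3.0, 2.0]
X, y = make_lagged_dataset(rain, flow, n_lags=2)
```

If decomposed subseries are used as inputs, they should be computed only from data available up to time t, so that no future information enters the features.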
Classification of Stellar Spectra with Fuzzy Minimum Within-Class Support Vector Machine
NASA Astrophysics Data System (ADS)
Zhong-bao, Liu; Wen-ai, Song; Jing, Zhang; Wen-juan, Zhao
2017-06-01
Classification is one of the important tasks in astronomy, especially in spectral analysis. Support Vector Machine (SVM) is a typical classification method that is widely used in spectral classification. Although it performs well in practice, its classification accuracy cannot be greatly improved because of two limitations. First, it does not take the distribution of the classes into consideration. Second, it is sensitive to noise. To solve these problems, inspired by the maximization criterion of Fisher's Discriminant Analysis (FDA) and by the SVM separability constraints, this paper proposes the fuzzy minimum within-class support vector machine (FMWSVM). In FMWSVM, the distribution of the classes is reflected by the within-class scatter in FDA, and a fuzzy membership function is introduced to decrease the influence of noise. Comparative experiments against SVM on SDSS datasets verify the effectiveness of the proposed FMWSVM classifier.
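Fuzzy SVMs reduce the influence of noisy samples by weighting each training sample's slack penalty with a membership value. A common choice, assumed here because the abstract does not give FMWSVM's exact membership function, decreases linearly with the distance to the class center: s_i = 1 - d_i / (r + delta), where r is the largest distance in the class and delta is a small positive constant. A minimal sketch for a single class follows (`math.dist` requires Python 3.8 or later).

```python
import math


def fuzzy_memberships(samples, delta=1e-6):
    """Membership of each sample in its class: 1 at the class center, near 0 at the farthest point."""
    dim = len(samples[0])
    center = [sum(s[j] for s in samples) / len(samples) for j in range(dim)]
    dists = [math.dist(s, center) for s in samples]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]


members = fuzzy_memberships([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
```

Samples far from their class center, which are more likely to be noise, receive small memberships and therefore contribute less to the SVM loss.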
A hybrid SVM-FFA method for prediction of monthly mean global solar radiation
NASA Astrophysics Data System (ADS)
Shamshirband, Shahaboddin; Mohammadi, Kasra; Tong, Chong Wen; Zamani, Mazdak; Motamedi, Shervin; Ch, Sudheer
2016-07-01
In this study, a hybrid support vector machine-firefly optimization algorithm (SVM-FFA) model is proposed to estimate monthly mean horizontal global solar radiation (HGSR). The merit of SVM-FFA is assessed statistically by comparing its performance with three previously used approaches. Using each approach and long-term measured HGSR, three models are calibrated with different sets of meteorological parameters measured for Bandar Abbass, Iran. Model (3), which uses the combination of relative sunshine duration, difference between maximum and minimum temperatures, relative humidity, water vapor pressure, average temperature, and extraterrestrial solar radiation, shows superior performance under all approaches. Moreover, extraterrestrial radiation is identified as a significant parameter for accurately estimating global solar radiation. The results reveal that the developed SVM-FFA approach provides favorable predictions with significantly higher precision than the other techniques examined. For SVM-FFA (3), the mean absolute percentage error (MAPE), root mean square error (RMSE), relative root mean square error (RRMSE), and coefficient of determination (R²) are 3.3252%, 0.1859 kWh/m², 3.7350%, and 0.9737, respectively, which indicates excellent performance according to the RRMSE. As a further evaluation of SVM-FFA (3), the ratio of estimated to measured values was computed: 47 of the 48 months used as testing data fall between 0.90 and 1.10. A further verification also shows that SVM-FFA (3) clearly outperforms empirical models using relatively similar input parameters. In summary, the hybrid SVM-FFA approach can be considered highly efficient for estimating HGSR.
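The statistical indicators quoted above can be computed as follows. This is a minimal sketch assuming the usual definitions: RRMSE is RMSE divided by the mean measured value, and R² is taken as 1 − SS_res/SS_tot (some solar radiation studies instead report the squared Pearson correlation, so the paper's convention may differ). The last helper counts how many months fall inside the 0.90 to 1.10 ratio band.

```python
import math

def mape(measured, estimated):
    """Mean absolute percentage error, in %."""
    return 100 * sum(abs((m - e) / m) for m, e in zip(measured, estimated)) / len(measured)

def rmse(measured, estimated):
    """Root mean square error, in the units of the data (e.g. kWh/m2)."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured))

def rrmse(measured, estimated):
    """RMSE relative to the mean measured value, in %."""
    return 100 * rmse(measured, estimated) / (sum(measured) / len(measured))

def r_squared(measured, estimated):
    """Coefficient of determination, taken here as 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot

def count_within(measured, estimated, lo=0.90, hi=1.10):
    """Number of cases whose estimated/measured ratio lies in [lo, hi]."""
    return sum(lo <= e / m <= hi for m, e in zip(measured, estimated))
```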
Liao, Quan; Yao, Jianhua; Yuan, Shengang
2007-05-01
Predicting toxicity is important and necessary because measuring toxicity is typically time-consuming and expensive. In this paper, the recursive partitioning (RP) method was used to select descriptors, and RP and support vector machines (SVM) were used to construct two structure-toxicity relationship models, the RP model and the SVM model. The two models perform differently. The prediction accuracies of the RP model are 80.2% for mutagenic compounds in MDL's toxicity database, 83.4% for compounds in CMC, and 84.9% for agrochemicals in an in-house database; those of the SVM model are 81.4%, 87.0%, and 87.3%, respectively.
Prediction on sunspot activity based on fuzzy information granulation and support vector machine
NASA Astrophysics Data System (ADS)
Peng, Lingling; Yan, Haisheng; Yang, Zhigang
2018-04-01
To analyze the fluctuation range of sunspots, a combined method for forecasting this range based on fuzzy information granulation (FIG) and support vector machine (SVM) was proposed. First, FIG is employed to granulate the sample data and extract the valid information of each window, namely its minimum value, general average value, and maximum value. Second, a forecasting model is built with SVM for each of these components, and cross-validation is used to optimize the model parameters. Finally, the fluctuation range of sunspots is forecast with the optimized SVM model. A case study demonstrates that the model has high accuracy and can effectively predict the fluctuation of sunspots.
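The granulation step can be sketched as follows: the series is cut into non-overlapping windows and each window is summarized by a (low, mid, up) triple. Following the abstract's wording, the mid value is the window mean; some FIG formulations use the median as the core instead. The window length is a hypothetical choice, and each of the three component series would then be forecast by its own SVM regressor.

```python
def granulate(series, window):
    """Non-overlapping windows -> (low, mid, up) granules.
    low/up are the window min/max; mid is the window mean in this sketch."""
    granules = []
    for start in range(0, len(series) - window + 1, window):
        w = series[start:start + window]
        granules.append((min(w), sum(w) / len(w), max(w)))
    return granules

counts = [1, 5, 3, 2, 8, 4, 7]           # toy sunspot numbers
granules = granulate(counts, window=3)   # the trailing partial window is dropped
lows, mids, ups = (list(c) for c in zip(*granules))  # one series per SVM model
```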
A support vector machine based control application to the experimental three-tank system.
Iplikci, Serdar
2010-07-01
This paper presents a support vector machine (SVM) approach to generalized predictive control (GPC) of multiple-input multiple-output (MIMO) nonlinear systems. The higher generalization potential of SVM algorithms, together with their avoidance of getting stuck in local minima, motivated us to employ them for modeling MIMO systems. Based on the SVM model, detailed and compact formulations are given for calculating the predictions and the gradient information used in computing the optimal control action. The proposed MIMO SVM-based GPC method has been verified on an experimental three-tank liquid level control system. Experimental results show that the proposed method can handle the control task successfully for different reference trajectories. Moreover, the paper gives a detailed discussion of data gathering, model selection, and the effects of the control parameters. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.
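For reference, the finite-horizon cost that GPC schemes typically minimize can be written as below for n_y outputs and n_u inputs. This is the textbook GPC form, given as an assumption: the abstract does not state the paper's exact weighting or horizons. Here r_i is the reference trajectory, N_1 to N_2 the prediction horizon, N_u the control horizon, and λ_l ≥ 0 the move-suppression weights; the SVM model would supply the predictions ŷ_i and, through its gradients, the derivatives of J with respect to the future control moves.

```latex
J(k) = \sum_{i=1}^{n_y} \sum_{j=N_1}^{N_2} \bigl( r_i(k+j) - \hat{y}_i(k+j) \bigr)^2
     + \sum_{l=1}^{n_u} \lambda_l \sum_{j=1}^{N_u} \bigl( \Delta u_l(k+j-1) \bigr)^2,
\qquad \Delta u_l(k) = u_l(k) - u_l(k-1)
```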
NASA Astrophysics Data System (ADS)
Yuksel, Kivanc; Chang, Xin; Skarbek, Władysław
2017-08-01
A novel smile recognition algorithm is presented, based on the extraction of 68 facial salient points (fp68) using an ensemble of regression trees. The smile detector uses a linear Support Vector Machine model, trained by the SVM algorithm on a few hundred exemplar images in a 136-dimensional space. Rigorous statistical data analysis shows that such a geometric detector depends strongly on the geometry of the mouth opening area, measured by triangulation of the outer lip contour. To this end, two Bayesian detectors were developed and compared with the SVM detector: the first uses the mouth area in the 2D image, while the second uses the mouth area in a 3D animated face model. The 3D modeling is based on the Candide-3 model and is performed in real time along with the three smile detectors and statistics estimators. The mouth area/Bayesian detectors exhibit high correlation with the fp68/SVM detector, in the range [0.8, 1.0], depending mainly on light conditions and individual features, with an advantage for the 3D technique, especially in hard light conditions.
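Both detector inputs can be sketched from a 68-point landmark set: the mouth area via the shoelace formula (a signed triangulation of the outer lip polygon) and the 136-dimensional vector fed to a linear SVM. The outer-lip indices assume the common 0-based iBUG 68-point convention, and the sign convention of the SVM score is hypothetical; neither is stated in the abstract.

```python
OUTER_LIP = range(48, 60)  # assumption: 0-based iBUG 68-point indices of the outer lip

def polygon_area(points):
    """Shoelace formula: sum of signed triangle areas fanned from the origin (any simple polygon)."""
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2

def mouth_area(landmarks):
    """Area enclosed by the outer lip contour of a 68-point landmark set."""
    return polygon_area([landmarks[i] for i in OUTER_LIP])

def to_feature_vector(landmarks):
    """Flatten 68 (x, y) points into the 136-dimensional SVM input."""
    return [coord for point in landmarks for coord in point]

def linear_svm_score(w, b, x):
    """Linear SVM decision value w.x + b (hypothetically, positive = smile)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```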
NASA Astrophysics Data System (ADS)
Imani, Moslem; You, Rey-Jer; Kuo, Chung-Yen
2014-10-01
Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as appropriate tools for modeling complex nonlinear phenomena in water bodies. In this study, we investigated the ability of two AI techniques to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013: the support vector machine (SVM), which is mathematically well founded and provides new insights into function approximation, and gene expression programming (GEP). SVM demonstrates the best performance in predicting Caspian Sea level anomalies, with the minimum root mean square error (RMSE = 0.035) and maximum coefficient of determination (R² = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over CCNN.
Acuña, Gonzalo; Ramirez, Cristian; Curilem, Millaray
2014-01-01
The lack of sensors for some relevant state variables in fermentation processes can be addressed by developing appropriate software sensors. In this work, NARX-ANN, NARMAX-ANN, NARX-SVM, and NARMAX-SVM models are compared as software sensors of biomass concentration for a solid substrate cultivation (SSC) process. Results show that NARMAX-SVM outperforms the other models, with an SMAPE index under 9 for noise of 20% amplitude. In addition, NARMAX models perform better than NARX models under the same noise conditions because they include prediction errors as inputs, which gives them better predictive capabilities. When the initial conditions of the autoregressive variable are perturbed, however, NARX models exhibit better convergence. This work also confirms that a difficult-to-measure variable, such as biomass concentration, can be estimated online from easy-to-measure variables such as CO₂ and O₂ using an adequate software sensor based on computational intelligence techniques.
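The difference between the NARX and NARMAX model inputs, and the SMAPE index used to compare them, can be sketched as follows. The lag orders are hypothetical, a single scalar exogenous input stands in for the CO₂ and O₂ signals (each would be lagged the same way), and the SMAPE variant with (|A| + |F|)/2 in the denominator is an assumption since several definitions are in use.

```python
def narx_regressor(y, u, t, ny, nu):
    """NARX input at time t: past outputs y(t-1..t-ny) and past inputs u(t-1..t-nu)."""
    return [y[t - k] for k in range(1, ny + 1)] + [u[t - k] for k in range(1, nu + 1)]

def narmax_regressor(y, u, e, t, ny, nu, ne):
    """NARMAX adds past prediction errors e(t-1..t-ne), where e = measured - predicted."""
    return narx_regressor(y, u, t, ny, nu) + [e[t - k] for k in range(1, ne + 1)]

def smape(actual, forecast):
    """Symmetric MAPE in %, with (|A| + |F|) / 2 in the denominator."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100 * total / len(actual)
```

Either regressor vector would be the input to the ANN or SVM that predicts biomass at time t.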
NASA Astrophysics Data System (ADS)
Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Li, Yuanyao; Catani, Filippo; Pourghasemi, Hamid Reza
2018-03-01
Landslides are a common natural hazard responsible for extensive damage and losses in mountainous areas. In this study, Longju in the Three Gorges Reservoir area in China was taken as a case study for landslide susceptibility assessment in order to develop effective risk prevention and mitigation strategies. To begin, 202 landslides were identified, including 95 colluvial landslides and 107 rockfalls. Twelve landslide causal factor maps were initially prepared, and the relationship between these factors and each landslide type was analyzed using the information value model. The unimportant factors were then identified and eliminated using the information gain ratio technique. The landslide locations were randomly divided into two groups: 70% for training and 30% for verification. Two machine learning models, the support vector machine (SVM) and the artificial neural network (ANN), and a multivariate statistical model, logistic regression (LR), were applied for landslide susceptibility modeling (LSM) of each type. The LSM index maps, obtained by combining the assessment results of the two landslide types, were classified into five levels. The performance of the LSMs was evaluated using the receiver operating characteristic curve and the Friedman test. Results show that eliminating noise-generating factors and modeling each landslide type separately significantly increased the prediction accuracy. The machine learning models outperformed the multivariate statistical model, and the SVM model was found to be the best suited to the case study area.
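The factor-screening step can be illustrated with the standard (C4.5-style) information gain ratio: the information gain of a factor with respect to the landslide label, divided by the factor's own split information. This is a minimal sketch assuming continuous factors have already been classified into discrete classes; the toy cells and factor names are hypothetical, and factors with a gain ratio near zero would be candidates for elimination.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(factor, labels):
    """Information gain of `factor` about `labels`, divided by the factor's split information."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(factor).items():
        subset = [lab for f, lab in zip(factor, labels) if f == value]
        conditional += count / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor)
    return 0.0 if split_info == 0 else info_gain / split_info

labels = [1, 1, 0, 0]                                  # 1 = landslide cell, 0 = stable cell (toy)
slope_class = ["steep", "steep", "gentle", "gentle"]   # perfectly informative toy factor
aspect_class = ["N", "S", "N", "S"]                    # uninformative toy factor
```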
Torija, Antonio J; Ruiz, Diego P; Ramos-Ridao, Angel F
2014-06-01
To ensure appropriate soundscape management in urban environments, urban-planning authorities need a range of tools that enable this task to be performed. An essential step in managing urban areas from a sound standpoint should be the evaluation of the soundscape of the area. It is widely acknowledged that a subjective and acoustical categorization of a soundscape is the first step in evaluating it, and that it also provides a basis for designing or adapting the soundscape to match people's expectations. Accordingly, this work proposes a model for the automatic classification of urban soundscapes based on underlying acoustical and perceptual criteria, intended as a tool for comprehensive urban soundscape evaluation. Because of the great complexity of the problem, two machine learning techniques, Support Vector Machines (SVM) and Support Vector Machines trained with Sequential Minimal Optimization (SMO), are implemented to develop the classification model. The results indicate that the SMO model outperforms the SVM model in the specific task of soundscape classification. With the SMO algorithm, the classification model achieves outstanding performance (91.3% of instances correctly classified). © 2013 Elsevier B.V. All rights reserved.
Nonlinear Demodulation and Channel Coding in EBPSK Scheme
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve optimum performance, and the SIF makes it more difficult to obtain the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation but also on providing accurate posterior probability estimates (PPEs). A new nonlinear demodulation approach based on the support vector machine (SVM) classifier is introduced. The SVM method, which selects only a few sampling points from the filter output, was used to obtain PPEs. Simulation results show that accurate posterior probabilities can be obtained with this method and that the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of obtaining the posterior probability with different methods and different sampling rates. We show that the SVM method has greater advantages under poor conditions and is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and for obtaining posterior probabilities for LDPC decoding. PMID:23213281
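The abstract does not say how SVM outputs are turned into PPEs. One standard option, assumed here, is Platt scaling, which maps an SVM decision value f to a probability through a fitted sigmoid; that probability can then be converted into the log-likelihood ratio (LLR) an LDPC decoder consumes. The sigmoid parameters A and B would be fitted on held-out decision values, and the sign convention LLR = log(P(0)/P(1)) is also an assumption.

```python
import math

def platt_probability(f, A, B):
    """P(bit = 1 | SVM decision value f) via Platt's sigmoid 1 / (1 + exp(A*f + B))."""
    return 1.0 / (1.0 + math.exp(A * f + B))

def llr(p1, eps=1e-12):
    """Log-likelihood ratio log(P(bit=0) / P(bit=1)) used as the LDPC decoder input."""
    p1 = min(max(p1, eps), 1 - eps)  # clip to avoid log(0) for saturated probabilities
    return math.log((1 - p1) / p1)
```

With this parameterization the LLR reduces to A·f + B, so the sigmoid mainly serves to calibrate the scale of the raw SVM output.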
Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil
2014-09-07
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology-based methods. Therefore, there is a need to develop alternative functional prediction methods that do not depend on sequence similarity. To distinguish LBPs from non-LBPs, the performances of a support vector machine (SVM) and a neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms the neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs, and 92.06% (CV) and 92.90% (IE) on average for classifying different LBP classes. Increasing the number and range of extracted protein features, as well as optimizing the SVM parameters, significantly increased the efficiency of LBP class prediction compared with the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool for detecting LBP classes. The proposed approach has the potential to integrate with and improve common sequence alignment-based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
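The five-fold cross-validation protocol can be sketched as an index generator: shuffle once, deal the indices into k folds, and let each fold serve once as the test set. Shuffled, unstratified folds and the fixed seed are assumptions; with unequal class sizes stratified folds are usually preferable, and the independent evaluation set would be held out before any folding.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, non-overlapping folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

Averaging a metric such as accuracy over the five yielded splits gives the CV figure; the IE figure comes from a single model evaluated once on the held-out set.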
NASA Astrophysics Data System (ADS)
Barman, Ranjan Kumar; Mukhopadhyay, Anirban; Das, Santasabuj
2017-04-01
Bacterial small non-coding RNAs (sRNAs) are not translated into proteins but act as functional RNAs. They are involved in diverse biological processes such as virulence, stress response, and quorum sensing. Several high-throughput techniques have enabled the identification of sRNAs in bacteria, but experimental detection remains challenging and is grossly incomplete for most species. Thus, there is a need to develop computational tools to predict bacterial sRNAs. Here, we propose a computational method to identify sRNAs in bacteria using a support vector machine (SVM) classifier. The primary sequence and secondary structure features of experimentally validated sRNAs of Salmonella Typhimurium LT2 (SLT2) were used to build the optimal SVM model. We found that a tri-nucleotide composition feature of sRNAs achieved an accuracy of 88.35% for SLT2. We also validated the SVM model on the experimentally detected sRNAs of E. coli and Salmonella Typhi, where it robustly attained accuracies of 81.25% and 88.82% for E. coli K-12 and S. Typhi Ty2, respectively. We confirmed that this method significantly improved the identification of sRNAs in bacteria. Furthermore, we used a sliding window-based method and identified sRNAs from the complete genomes of SLT2, S. Typhi Ty2, and E. coli K-12 with sensitivities of 89.09%, 83.33%, and 67.39%, respectively.
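The tri-nucleotide composition feature and the sliding-window genome scan can be sketched as follows. Assumptions: overlapping 3-mers over the alphabet ACGU (DNA input has T mapped to U), frequencies normalized by the number of valid 3-mers, and hypothetical window size and step. Each window's 64-dimensional vector would be scored by the trained SVM.

```python
from itertools import product

TRINUCS = ["".join(p) for p in product("ACGU", repeat=3)]  # the 64 possible 3-mers

def trinucleotide_composition(seq):
    """64-dimensional vector of overlapping 3-mer frequencies."""
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRINUCS, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip 3-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRINUCS]

def sliding_windows(genome, size, step):
    """Yield (start, subsequence) windows for scanning a complete genome."""
    for start in range(0, len(genome) - size + 1, step):
        yield start, genome[start:start + size]
```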
Wang, Xibin; Luo, Fengji; Qian, Ying; Ranzi, Gianluca
2016-01-01
With the rapid development of ICT and Web technologies, a large amount of information is becoming available, and this is producing, in some instances, a condition of information overload. Under these conditions, it is difficult for a person to locate and access useful information for making decisions. To address this problem, there are information filtering systems, such as the personalized recommendation system (PRS) considered in this paper, that assist a person in identifying possible products or services of interest based on his/her preferences. Among available approaches, collaborative filtering (CF) is one of the most widely used recommendation techniques. However, CF has limitations, such as its relatively simple similarity calculation and the cold-start problem. In this context, this paper presents a new regression model based on support vector machine (SVM) classification and an improved PSO (IPSO) for the development of an electronic movie PRS. In its implementation, an SVM classification model is first established to obtain a preliminary movie recommendation list, on the basis of which an SVM regression model is applied to predict movie ratings. The proposed PRS not only considers the movies' content information but also integrates the users' demographic and behavioral information to better capture their interests and preferences. The efficiency of the proposed method is verified by a series of experiments on the MovieLens benchmark data set. PMID:27898691
Mapping membrane activity in undiscovered peptide sequence space using machine learning
Fulan, Benjamin M.; Wong, Gerard C. L.
2016-01-01
There are approximately 1,100 known antimicrobial peptides (AMPs), which permeabilize microbial membranes but have diverse sequences. Here, we develop a support vector machine (SVM)-based classifier to investigate α-helical AMPs and the interrelated nature of their functional commonality and sequence homology. SVM is used to search the undiscovered peptide sequence space and identify Pareto-optimal candidates that simultaneously maximize the distance σ from the SVM hyperplane (thus maximizing their “antimicrobialness”) and their α-helicity, but minimize mutational distance to known AMPs. By calibrating SVM machine learning results with killing assays and small-angle X-ray scattering (SAXS), we find that the SVM metric σ correlates not with a peptide’s minimum inhibitory concentration (MIC), but rather with its ability to generate negative Gaussian membrane curvature. This surprising result provides a topological basis for membrane activity common to AMPs. Moreover, we highlight an important distinction between the maximal recognizability of a sequence to a trained AMP classifier (its ability to generate membrane curvature) and its maximal antimicrobial efficacy. As mutational distances from known AMPs are increased, we find AMP-like sequences that are increasingly difficult for nature to discover via simple mutation. Using the sequence map as a discovery tool, we find an unexpectedly diverse taxonomy of sequences that are just as membrane-active as known AMPs, but with a broad range of primary functions distinct from AMP functions, including endogenous neuropeptides, viral fusion proteins, topogenic peptides, and amyloids. The SVM classifier is useful as a general detector of membrane activity in peptide sequences. PMID:27849600
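The candidate-selection step combines two pieces: the signed distance of a feature vector from a linear SVM hyperplane, σ = (w·x + b)/‖w‖, and a Pareto (non-dominated) filter over the three objectives. A minimal sketch assuming a linear decision function and toy objective values; the paper's peptide feature encoding, helicity measure, and mutational distance are not reproduced, and for a kernel SVM the raw decision value would play the role of σ.

```python
import math

def hyperplane_distance(w, b, x):
    """Signed distance sigma = (w.x + b) / ||w|| from a linear SVM hyperplane."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))

def pareto_front(candidates):
    """candidates: (name, sigma, helicity, mutational_distance) tuples.
    Keep the non-dominated ones: maximize sigma and helicity, minimize distance."""
    def dominates(a, c):
        no_worse = a[1] >= c[1] and a[2] >= c[2] and a[3] <= c[3]
        better = a[1] > c[1] or a[2] > c[2] or a[3] < c[3]
        return no_worse and better
    return [c for c in candidates if not any(dominates(o, c) for o in candidates if o is not c)]

# Toy candidates: p3 is dominated by p1 (lower sigma, lower helicity, larger distance).
cands = [("p1", 2.0, 0.8, 3), ("p2", 1.0, 0.9, 1), ("p3", 1.0, 0.7, 4), ("p4", 2.5, 0.6, 5)]
front = [c[0] for c in pareto_front(cands)]
```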
Using machine learning algorithms to guide rehabilitation planning for home care clients.
Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul
2007-12-20
Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar, substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms made better predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide the development of improved clinical protocols.
Osteoporosis risk prediction using machine learning and conventional methods.
Kim, Sung Kean; Yoo, Tae Keun; Oh, Ein; Kim, Deok Won
2013-01-01
A number of clinical decision tools for osteoporosis risk assessment have been developed to select postmenopausal women for the measurement of bone mineral density. We developed and validated machine learning models with the aim of more accurately identifying the risk of osteoporosis in postmenopausal women, and compared them with a conventional clinical decision tool, the osteoporosis self-assessment tool (OST). We collected medical records of Korean postmenopausal women from the Korea National Health and Nutrition Surveys (KNHANES V-1). The training data set was used to construct models based on popular machine learning algorithms, such as support vector machines (SVM), random forests (RF), artificial neural networks (ANN), and logistic regression (LR), using various predictors associated with low bone density. The learning models were compared with OST. SVM had a significantly better area under the curve (AUC) of the receiver operating characteristic (ROC) than ANN, LR, and OST. Validation on the test set showed that SVM predicted osteoporosis risk with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0%. We were the first to compare the osteoporosis prediction performance of machine learning and conventional methods using population-based epidemiological data. Machine learning methods may be effective tools for identifying postmenopausal women at high risk of osteoporosis.
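The validation metrics reported above can be computed as follows. This is a minimal sketch assuming labels in {0, 1} with 1 = osteoporosis; the AUC is computed in its rank (Mann-Whitney) form, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The quadratic loop is fine for illustration but slow for large test sets.

```python
def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)

def auc_rank(y_true, scores):
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```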
A Fault Alarm and Diagnosis Method Based on Sensitive Parameters and Support Vector Machine
NASA Astrophysics Data System (ADS)
Zhang, Jinjie; Yao, Ziyun; Lv, Zhiquan; Zhu, Qunxiong; Xu, Fengtian; Jiang, Zhinong
2015-08-01
Fault feature extraction and diagnostic techniques for reciprocating compressors are currently among the hot research topics in reciprocating machinery fault diagnosis. A large number of feature extraction and classification methods have been widely applied in related research, but practical fault alarm and diagnostic accuracy have not been effectively improved. Developing feature extraction and classification methods that meet the requirements of typical fault alarm and automatic diagnosis in practical engineering is an urgent task. This paper presents the typical mechanical faults of reciprocating compressors and uses existing data from an online monitoring system to extract 15 types of fault feature parameters in total. The distance evaluation technique is used to clarify the inner sensitive connection between faults and feature parameters and to obtain the sensitive characteristic parameters of different faults. On this basis, a method based on fault feature parameters and a support vector machine (SVM) is developed for practical fault diagnosis. Experiments and practical fault cases demonstrate a better early fault warning ability, and automatic SVM classification of the fault alarm data achieves better diagnostic accuracy.
New KF-PP-SVM classification method for EEG in brain-computer interfaces.
Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian
2014-01-01
Classification methods are a crucial direction in current brain-computer interface (BCI) research. To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel Fisher, posterior probability, and support vector machine) classification method is developed. First, common spatial patterns are used to obtain features, from which the within-class scatter is calculated. The scatter is then added into a radial basis function kernel to construct a new kernel function, which is integrated into the SVM to obtain a new classification model. Finally, the SVM output is calculated based on posterior probability, and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected in the laboratory were processed with four classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements of the KF-PP-SVM scheme over the KF-SVM, PP-SVM, and SVM schemes are 2.49%, 5.83%, and 6.49%, respectively.
Deep neural mapping support vector machines.
Li, Yujian; Zhang, Ting
2017-09-01
The choice of kernel has an important effect on the performance of a support vector machine (SVM). This effect can be reduced by NEUROSVM, an architecture that uses a multilayer perceptron for feature extraction and an SVM for classification. In binary classification, a general linear-kernel NEUROSVM can theoretically be simplified to an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained by backpropagation together with a virtual ordinary output layer; the output of its last hidden layer is then taken as the input of the SVM classifier, which is trained separately. By treating the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called the deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, in which the kernel mapping is an explicit function expressed as a sub-network, unlike the implicit function traditionally induced by a kernel function. Moreover, we use a two-stage procedure of contrastive divergence learning and gradient descent to jointly train an adaptive kernel mapping, instead of a kernel function, for DNMSVM, without requiring the kernel trick. Treating the sub-network and the SVM classifier as a whole, DNMSVM is jointly trained by using gradient descent to optimize the objective function, with the sub-network layer-wise pre-trained via contrastive divergence learning of restricted Boltzmann machines. Compared with the separate training of NEUROSVM, this joint training algorithm gives DNMSVM advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the radial basis function kernel), demonstrating its effectiveness. Copyright © 2017 Elsevier Ltd. All rights reserved.
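The idea of an explicit, network-defined kernel mapping can be sketched as follows: a sub-network φ maps each input to hidden-layer features, and the SVM then uses the linear kernel K(x, z) = ⟨φ(x), φ(z)⟩. The tanh activation and the toy weights are illustrative assumptions; the RBM pre-training and joint gradient-descent training described above are not shown.

```python
import math

def hidden_features(x, layers):
    """Explicit feature map phi(x): pass x through tanh layers given as (W, b) pairs."""
    h = list(x)
    for W, b in layers:
        h = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi) for row, bi in zip(W, b)]
    return h

def network_kernel(x, z, layers):
    """Linear kernel on the learned features: K(x, z) = <phi(x), phi(z)>."""
    return sum(a * c for a, c in zip(hidden_features(x, layers), hidden_features(z, layers)))

# One toy hidden layer (identity weights, zero bias); real weights would be learned.
layers = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
```

Because φ is explicit, the SVM could equally be trained in the primal on φ(x), which is what makes joint gradient-based training of the network and the classifier possible.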
Wire connector classification with machine vision and a novel hybrid SVM
NASA Astrophysics Data System (ADS)
Chauhan, Vedang; Joshi, Keyur D.; Surgenor, Brian W.
2018-04-01
A machine vision-based system has been developed and tested that uses a novel hybrid Support Vector Machine (SVM) in a part inspection application with clear plastic wire connectors. The application required the system to differentiate between 4 known connector styles plus one unknown style, for a total of 5 classes. The requirement to handle an unknown class is what necessitated the hybrid approach. The system was trained with the 4 known classes and tested with 5 classes (the 4 known plus the 1 unknown). The hybrid classification approach used two layers of SVMs: one semi-supervised and one supervised. The semi-supervised SVM was a special case of unsupervised machine learning that classified test images as one of the 4 known classes (to accept) or as the unknown class (to reject). The supervised SVM classified test images as one of the 4 known classes and consequently would give false positives (FPs) on unknown parts. Two methods were tested, differing only in the order of the layers. The method with the semi-supervised layer first gave an accuracy of 80% with 20% FPs, while the method with the supervised layer first gave an accuracy of 98% with 0% FPs. Further work is being conducted to see whether the hybrid approach works with other applications that have an unknown-class requirement.
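The supervised-first ordering that performed best can be sketched as a two-layer accept/reject cascade. In this minimal sketch both SVM layers are replaced by plainly named nearest-centroid stand-ins (an assumption made to keep the example self-contained), and the rejection `radius` is a hypothetical tuning parameter. Layer 1 assigns one of the known classes; layer 2 rejects the sample as unknown if it lies too far from that class.

```python
import math

def centroid(points):
    """Mean feature vector of one known class."""
    return [sum(c) / len(points) for c in zip(*points)]

def fit_centroids(train):
    """train: {label: [feature vectors]} for the known classes only."""
    return {label: centroid(pts) for label, pts in train.items()}

def cascade_predict(x, centroids, radius):
    """Layer 1 (stand-in for the supervised SVM): nearest known class.
    Layer 2 (stand-in for the accept/reject SVM): reject as 'unknown' if too far from it."""
    label = min(centroids, key=lambda k: math.dist(x, centroids[k]))
    return label if math.dist(x, centroids[label]) <= radius else "unknown"
```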
Kim, Jongin; Lee, Boreom
2018-05-07
Different modalities, such as structural MRI, FDG-PET, and CSF, provide complementary information that is likely to be very useful for the diagnosis of Alzheimer's disease (AD) and mild cognitive impairment (MCI). Therefore, a more effective and accurate automatic AD/MCI diagnosis method can be developed by integrating the complementary information of different modalities. In this paper, we propose a multi-modal sparse hierarchical extreme learning machine (MSH-ELM). We used the volume and the mean intensity extracted from 93 regions of interest (ROIs) as features of MRI and FDG-PET, respectively, and used p-tau, t-tau, and Aβ42 as CSF features. In detail, a high-level representation was extracted individually from each of MRI, FDG-PET, and CSF using a stacked sparse extreme learning machine auto-encoder (sELM-AE). Then, another stacked sELM-AE was devised to acquire a joint hierarchical feature representation by fusing the high-level representations obtained from each modality. Finally, we classified the joint hierarchical feature representation using a kernel-based extreme learning machine (KELM). The results of MSH-ELM were compared with those of a conventional ELM, a single kernel support vector machine (SK-SVM), a multiple kernel support vector machine (MK-SVM), and a stacked auto-encoder (SAE). Performance was evaluated through 10-fold cross-validation. For the AD vs. healthy control (HC) and MCI vs. HC classification problems, the proposed MSH-ELM method showed mean balanced accuracies of 96.10% and 86.46%, respectively, which are much better than those of the competing methods. In summary, the proposed algorithm exhibits consistently better performance than SK-SVM, ELM, MK-SVM, and SAE in the two binary classification problems (AD vs. HC and MCI vs. HC). © 2018 Wiley Periodicals, Inc.
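Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, so a classifier cannot score well simply by favoring the larger class. A minimal sketch, assuming binary labels with the patient group (for example AD) coded as the positive class; under 10-fold cross-validation it would typically be computed per fold and averaged.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity; robust to unequal class sizes."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(p == positive for p in pos) / len(pos)
    specificity = sum(p != positive for p in neg) / len(neg)
    return (sensitivity + specificity) / 2
```

For example, predicting "patient" for every subject in a set of four patients and one control yields a plain accuracy of 0.8 but a balanced accuracy of only 0.5.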
NASA Astrophysics Data System (ADS)
Farda, N. M.
2017-12-01
Coastal wetlands provide ecosystem services essential to people and the environment. Changes in coastal wetlands, especially in land use, are important to monitor using multi-temporal imagery. Google Earth Engine (GEE) provides 10 machine learning algorithms that are very useful for extracting land use from imagery. The research objective is to explore the machine learning algorithms in GEE and their accuracy for multi-temporal land use mapping of a coastal wetland area. Landsat 3 MSS (1978), Landsat 5 TM (1991), Landsat 7 ETM+ (2001), and Landsat 8 OLI (2014) images of the Segara Anakan lagoon were selected as the multi-temporal images. The inputs to the machine learning algorithms are the visible and near-infrared bands, PCA bands, inverse PCA bands, a bare soil index, a vegetation index, a wetness index, elevation from ASTER GDEM, GLCM (Haralick) texture, and polygon samples at 140 locations. The 10 algorithms applied to extract coastal wetland land use from the Landsat imagery are Fast Naive Bayes, CART (Classification and Regression Tree), Random Forests, GMO Max Entropy, Perceptron (Multi Class Perceptron), Winnow, Voting SVM, Margin SVM, Pegasos (Primal Estimated sub-GrAdient SOlver for SVM), and IKPamir (Intersection Kernel Passive Aggressive Method for Information Retrieval, SVM). The machine learning algorithms in GEE proved very helpful for multi-temporal land use mapping; the highest accuracy for coastal wetland land use mapping was achieved by CART, with 96.98% overall accuracy under K-fold cross-validation (K = 10). GEE is particularly useful for multi-temporal land use mapping because it offers ready-to-use imagery and classification algorithms, and it also presents interesting challenges for other applications.
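GEE runs its classifiers server-side, so as a local, hedged analogue this sketch shows how an overall accuracy under K-fold cross-validation (K = 10) can be estimated for a CART-style decision tree with scikit-learn. The per-sample features are synthetic placeholders for the band, index, elevation, and texture inputs, and the tree settings are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
# Synthetic placeholders for per-sample features (bands, indices, elevation, texture).
n_per_class, n_features, n_classes = 60, 6, 4
X = np.vstack([rng.normal(loc=3 * k, scale=1.0, size=(n_per_class, n_features))
               for k in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)   # land-use class labels

cart = DecisionTreeClassifier(random_state=0)       # CART-style binary splits (Gini impurity)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(cart, X, y, cv=cv, scoring="accuracy")
print(f"Overall accuracy (10-fold): {scores.mean():.4f} +/- {scores.std():.4f}")
```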
The generalization ability of online SVM classification based on Markov sampling.
Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang
2015-03-01
In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish a bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling and present numerical studies of its learning ability on benchmark repository datasets. The numerical studies show that, for larger training sample sizes, the online SVM classification algorithm based on Markov sampling performs better than the classical online SVM classification algorithm based on random sampling.
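Online SVM learning updates the classifier one sample at a time. This sketch uses a Pegasos-style stochastic subgradient step on the regularised hinge loss of a linear SVM and draws the sample sequence from a simple Markov chain over training indices: with probability 1/2 step to the next index, otherwise jump uniformly. The uniform jump gives a Doeblin minorization, so the chain is uniformly ergodic. The linear kernel, this particular chain, and the value of `lam` are illustrative assumptions; the paper works in a reproducing kernel Hilbert space and uses its own Markov sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = np.vstack([rng.normal(-1.5, 1, (n // 2, 2)), rng.normal(1.5, 1, (n // 2, 2))])
y = np.array([-1] * (n // 2) + [1] * (n // 2))
X = np.hstack([X, np.ones((n, 1))])   # constant feature acts as a (regularised) bias

def markov_indices(n, steps, rng):
    """Index chain: move to i+1 with prob 1/2, else jump uniformly (uniformly ergodic)."""
    i = rng.integers(n)
    for _ in range(steps):
        yield i
        i = (i + 1) % n if rng.random() < 0.5 else rng.integers(n)

def online_svm(X, y, lam=0.01, steps=5000, rng=rng):
    w = np.zeros(X.shape[1])
    for t, i in enumerate(markov_indices(len(X), steps, rng), start=1):
        eta = 1.0 / (lam * t)          # Pegasos step size
        margin = y[i] * (w @ X[i])
        w *= (1 - eta * lam)           # shrinkage from the L2 regulariser
        if margin < 1:                 # hinge-loss subgradient step
            w += eta * y[i] * X[i]
    return w

w = online_svm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {accuracy:.3f}")
```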
NASA Astrophysics Data System (ADS)
Keshavamurthy, Krishna N.; Leary, Owen P.; Merck, Lisa H.; Kimia, Benjamin; Collins, Scott; Wright, David W.; Allen, Jason W.; Brock, Jeffrey F.; Merck, Derek
2017-03-01
Traumatic brain injury (TBI) is a major cause of death and disability in the United States. Time to treatment is often related to patient outcome, so timely access to cerebral imaging data is a vital component of patient care. Current methods of detecting and quantifying intracranial pathology can be time-consuming and require careful review of 2D/3D patient images by a radiologist. Additional time is needed for image protocoling, acquisition, and processing. These steps often occur in series, adding more time to the process and potentially delaying time-dependent management decisions for patients with TBI. Our team adapted machine learning and computer vision methods to develop a technique that rapidly and automatically detects CT-identifiable lesions. Specifically, we use the scale-invariant feature transform (SIFT) [1] and deep convolutional neural networks (CNN) [2] to identify important image features that can distinguish TBI lesions from background data. Our learning algorithm is a linear support vector machine (SVM) [3]. We also employ tools from topological data analysis (TDA) to gain insight into the correlation patterns between healthy and pathological data. The technique was validated using 409 CT scans of the brain acquired in the Progesterone for the Treatment of Traumatic Brain Injury phase III clinical trial (ProTECT III), which studied patients with moderate to severe TBI [4]. CT data were annotated by a central radiologist and included patients with positive and negative scans. Additionally, the largest lesion on each positive scan was manually segmented. We reserved 80% of the data for training the SVM and used the remaining 20% for testing. Preliminary results are promising, with 92.55% prediction accuracy (sensitivity = 91.15%, specificity = 93.45%), indicating the potential usefulness of this technique in clinical scenarios.
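Sensitivity and specificity follow from the test-set confusion matrix. A minimal sketch, assuming fixed-length feature vectors (for example pooled SIFT or CNN descriptors) have already been extracted; the synthetic features, the stratified 80/20 split, and the `LinearSVC` settings are placeholders, not the study's pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(4)
# Synthetic placeholder descriptors: label 1 = lesion-positive scan, 0 = negative scan.
X = np.vstack([rng.normal(0.0, 1, (200, 16)), rng.normal(0.8, 1, (200, 16))])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
svm = LinearSVC(C=1.0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, svm.predict(X_te), labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true-positive rate on positive scans
specificity = tn / (tn + fp)   # true-negative rate on negative scans
print(f"accuracy {accuracy:.3f}, sensitivity {sensitivity:.3f}, specificity {specificity:.3f}")
```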
Fan, X-J; Wan, X-B; Huang, Y; Cai, H-M; Fu, X-H; Yang, Z-L; Chen, D-K; Song, S-X; Wu, P-H; Liu, Q; Wang, L; Wang, J-P
2012-01-01
Background: Current imaging modalities are inadequate for preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal transition (EMT)-related biomarkers with clinicopathological variables. Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients. Of these, 74 patients were assigned to the training set to select robust variables for designing the SVM model. The predictive value of the SVM model was validated in the testing set (119 patients). Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to build the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity, and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1%, and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001). Conclusion: Our SVM-based model displayed moderately strong predictive power in defining RLNM status in RC patients, providing an important approach to selecting the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy. PMID:22538975
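For intuition about the reported odds ratio, this sketch computes an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval from a 2×2 table of SVM risk group versus observed lymph node status. The counts are hypothetical placeholders, not the study's data, and the study's estimate came from a multivariate logistic regression that adjusts for other covariates, which this unadjusted calculation does not.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with Woolf 95% CI.

    a: high-risk & node-positive   b: high-risk & node-negative
    c: low-risk  & node-positive   d: low-risk  & node-negative
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts for illustration only.
print(odds_ratio_ci(40, 20, 10, 50))
```

With these placeholder counts, OR = (40 × 50)/(20 × 10) = 10.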
[Identification of varieties of textile fibers by using Vis/NIR infrared spectroscopy technique].
Wu, Gui-Fang; He, Yong
2010-02-01
The aim of the present paper was to provide new insight into Vis/NIR spectroscopic analysis of textile fibers. To achieve rapid identification of fiber varieties, the authors studied 5 kinds of fibers (cotton, flax, wool, silk, and tencel) with Vis/NIR spectroscopy. First, the spectra of each kind of fiber were scanned with a spectrometer, and principal component analysis (PCA) was used to analyze the characteristics of the Vis/NIR spectral patterns. The principal component score scatter plot (PC1 × PC2 × PC3) showed how well the five fiber varieties separated. The first 6 principal components (PCs) were selected according to the number and magnitude of the PCs. The PCA classification model was then optimized using the least-squares support vector machine (LS-SVM) method: the 6 PCs extracted by PCA were used as the inputs of LS-SVM, and a PCA-LS-SVM model was built for variety identification, mathematical model building, and optimization analysis. Two hundred samples (40 per variety) were used to calibrate the PCA-LS-SVM model, and the other 50 samples (10 per variety) were used for validation. The validation results showed that Vis/NIR spectroscopy combined with PCA-LS-SVM had a powerful classification capability. It provides a new method for identifying fiber varieties rapidly and in real time, which is significant for protecting consumer rights, ensuring textile quality, and rationalizing the production and trade of textile materials and products.
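A PCA-LS-SVM pipeline can be sketched in NumPy. An LS-SVM replaces the SVM quadratic program with a single linear system, [[0, y^T], [y, Omega + I/reg]] [b; alpha] = [0; 1], where Omega_kl = y_k y_l K(x_k, x_l), and classifies by the sign of sum_k alpha_k y_k K(x, x_k) + b. This minimal sketch assumes an RBF kernel with a hand-picked `GAMMA` and regularisation `reg`, uses a one-vs-rest scheme for the five varieties, and runs on synthetic placeholder spectra split 40/10 per class like the calibration and validation sets described above.

```python
import numpy as np
from sklearn.decomposition import PCA

GAMMA = 0.1  # RBF width (assumed, not tuned)

def rbf(A, B, gamma=GAMMA):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def lssvm_fit(X, y, reg=10.0):
    """Binary LS-SVM with labels +/-1: one linear KKT system instead of a QP."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf(X, X) + np.eye(n) / reg
    sol = np.linalg.solve(A, np.concatenate([[0.0], np.ones(n)]))
    return sol[0], sol[1:]                       # bias b, multipliers alpha

def lssvm_decision(X_train, y, b, alpha, X_new):
    return rbf(X_new, X_train) @ (alpha * y) + b

rng = np.random.default_rng(5)
n_var, n_cal, n_val, n_wl = 5, 40, 10, 100
# Synthetic placeholder spectra: each variety shifts the whole spectrum by 0.5.
spectra = [rng.normal(0.5 * k, 0.3, (n_cal + n_val, n_wl)) for k in range(n_var)]
X_cal = np.vstack([s[:n_cal] for s in spectra])
X_val = np.vstack([s[n_cal:] for s in spectra])
y_cal = np.repeat(np.arange(n_var), n_cal)
y_val = np.repeat(np.arange(n_var), n_val)

pca = PCA(n_components=6).fit(X_cal)             # first 6 PCs become the LS-SVM inputs
Z_cal, Z_val = pca.transform(X_cal), pca.transform(X_val)

# One-vs-rest: one binary LS-SVM per variety; predict the largest decision value.
scores = []
for k in range(n_var):
    yk = np.where(y_cal == k, 1.0, -1.0)
    b, alpha = lssvm_fit(Z_cal, yk)
    scores.append(lssvm_decision(Z_cal, yk, b, alpha, Z_val))
pred = np.argmax(np.column_stack(scores), axis=1)
print("validation accuracy:", np.mean(pred == y_val))
```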
Predicting breast cancer using an expression values weighted clinical classifier.
Thomas, Minta; De Brabanter, Kris; Suykens, Johan A K; De Moor, Bart
2014-12-31
Clinical data, such as patient history, laboratory analyses, and ultrasound parameters, which are the basis of day-to-day clinical decision support, are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data, and these studies often remain inconclusive about whether prediction performance improves. To improve clinical management, these data should be fully exploited, which requires efficient algorithms to integrate the data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions have been used successfully in many bioinformatics prediction tasks. Building on the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier, to integrate two data sources: microarray data and clinical parameters. We compared and evaluated the proposed methods on five breast cancer case studies. Compared with an LS-SVM classifier on the individual data sets, generalized eigenvalue decomposition (GEVD), and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under the ROC curve (AUC), on all breast cancer case studies. Thus, a clinical classifier weighted with a microarray data set yields significantly improved diagnosis, prognosis, and prediction of response to therapy. The proposed model has been shown to be a promising mathematical framework for both data fusion and non-linear classification problems.
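One simple way to let two data sources share one kernel classifier is a weighted sum of per-source kernels, K = w·K_expr + (1 - w)·K_clin, fed to an SVM with a precomputed kernel. This is a hedged sketch, not the authors' weighted LS-SVM: it uses scikit-learn's standard `SVC` instead of an LS-SVM, synthetic placeholder data, and a hand-picked weight `w` (in practice it would be tuned, for example by cross-validation).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n = 120
y = rng.integers(0, 2, n)
expr = rng.normal(size=(n, 50)) + 0.6 * y[:, None]   # synthetic stand-in for microarray data
clin = rng.normal(size=(n, 5)) + 0.8 * y[:, None]    # synthetic stand-in for clinical data
train, test = np.arange(80), np.arange(80, n)

def combined_kernel(rows, cols, w=0.5):
    """w * K_expr + (1 - w) * K_clin, each an RBF kernel on its own data source."""
    k_expr = rbf_kernel(expr[rows], expr[cols], gamma=1.0 / expr.shape[1])
    k_clin = rbf_kernel(clin[rows], clin[cols], gamma=1.0 / clin.shape[1])
    return w * k_expr + (1 - w) * k_clin

svm = SVC(kernel="precomputed").fit(combined_kernel(train, train), y[train])
# At test time the kernel is evaluated between test rows and training columns.
auc = roc_auc_score(y[test], svm.decision_function(combined_kernel(test, train)))
print(f"test AUC: {auc:.3f}")
```

A convex combination of valid kernels is itself a valid kernel, which is why the combined matrix can be passed straight to the SVM.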
A support vector machine for predicting defibrillation outcomes from waveform metrics.
Howe, Andrew; Escalona, Omar J; Di Maio, Rebecca; Massot, Bertrand; Cromie, Nick A; Darragh, Karen M; Adgey, Jennifer; McEneaney, David J
2014-03-01
Algorithms that predict shock success from ventricular fibrillation (VF) waveform metrics could significantly enhance resuscitation by optimising the timing of defibrillation. This study aimed to investigate robust methods of predicting defibrillation success in VF cardiac arrest patients using a support vector machine (SVM) optimisation approach. Frequency-domain (AMSA, dominant frequency, and median frequency) and time-domain (slope and RMS amplitude) VF waveform metrics were calculated in a 4.1 s window prior to defibrillation. The conventional prediction test validity of each waveform parameter was assessed, and AUC>0.6 was used as the criterion for inclusion as a corroborative attribute processed by the SVM classification model. The SVM used a Gaussian radial-basis-function (RBF) kernel, with the error penalty factor C fixed to 1. A two-fold cross-validation resampling technique was employed. A total of 41 patients had 115 defibrillation instances. AMSA, slope, and RMS passed test validation with AUC>0.6 for predicting termination of VF and return to organised rhythm. The predictive accuracy of the optimised SVM design for termination of VF was 81.9% (± 1.24 SD); positive and negative predictivity were 84.3% (± 1.98 SD) and 77.4% (± 1.24 SD), respectively; sensitivity and specificity were 87.6% (± 2.69 SD) and 71.6% (± 9.38 SD), respectively. According to the test validity assessment, AMSA, slope, and RMS were the best frequency- and time-domain VF waveform predictors of termination of VF. This a priori knowledge can be used for a simplified SVM optimised design that combines the predictive attributes of these VF waveform metrics for improved prediction accuracy and generalisation performance, without requiring the definition of any threshold value on the waveform metrics. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
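The screening-then-classification procedure can be sketched with scikit-learn: keep waveform metrics whose univariate AUC exceeds 0.6, then evaluate an RBF-kernel SVM with C fixed to 1 under two-fold cross-validation and derive accuracy, sensitivity, specificity, PPV, and NPV from the pooled confusion matrix. The data are synthetic placeholders, the metric names are labels only, and `gamma="scale"` is an assumption. Screening on the full data set before cross-validation, as done here for brevity, can make the cross-validated estimates optimistic.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(7)
names = ["AMSA", "dominant_freq", "median_freq", "slope", "rms"]   # labels only
n = 115                                          # one row per defibrillation instance
y = rng.integers(0, 2, n)                        # 1 = VF terminated (synthetic labels)
X = rng.normal(size=(n, len(names)))
X[:, [0, 3, 4]] += 1.0 * y[:, None]              # make three metrics informative (synthetic)

# Screening: keep metrics whose univariate AUC exceeds 0.6.
keep = [j for j in range(X.shape[1]) if roc_auc_score(y, X[:, j]) > 0.6]
print("selected:", [names[j] for j in keep])

# Gaussian RBF SVM with C fixed to 1, two-fold cross-validation, pooled confusion matrix.
tn = fp = fn = tp = 0
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
for tr, te in cv.split(X, y):
    svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[tr][:, keep], y[tr])
    c = confusion_matrix(y[te], svm.predict(X[te][:, keep]), labels=[0, 1]).ravel()
    tn, fp, fn, tp = tn + c[0], fp + c[1], fn + c[2], tp + c[3]

print(f"accuracy {(tp + tn) / n:.3f}  sensitivity {tp / (tp + fn):.3f}  "
      f"specificity {tn / (tn + fp):.3f}  PPV {tp / (tp + fp):.3f}  NPV {tn / (tn + fn):.3f}")
```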
Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella
2017-04-01
Measuring toxicity is an important step in drug development. However, the current experimental methods used to estimate drug toxicity are expensive and time-consuming, which makes them unsuitable for large-scale evaluation of drug toxicity in the early stages of drug development. Hence, there is high demand for computational models that can predict drug toxicity risks. In this study, we used a dataset of 553 drugs that are biotransformed in the liver. Four toxic effects were calculated for these data: mutagenic, tumorigenic, irritant, and reproductive effects. Each drug is represented by 31 chemical descriptors (features). The proposed model consists of three phases. In the first phase, the most discriminative subset of features is selected using rough set-based methods to reduce classification time while improving classification performance. In the second phase, sampling methods such as Random Under-Sampling, Random Over-Sampling, the Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE, and Safe-Level SMOTE are used to address the imbalanced dataset. In the third phase, a Support Vector Machine (SVM) classifier is used to classify an unknown drug as toxic or non-toxic. SVM parameters such as the penalty parameter and the kernel parameter have a great impact on the classification accuracy of the model, so the Whale Optimization Algorithm (WOA) is proposed to optimize them and thereby reduce the classification error. The experimental results showed that the proposed model achieved high sensitivity for all toxic effects. Overall, the high sensitivity of the WOA+SVM model indicates that it could be used to predict drug toxicity in the early stages of drug development. Copyright © 2017 Elsevier Inc. All rights reserved.
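Of the resampling methods listed, SMOTE is the most widely used: it creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The minimal NumPy version below is an illustration on placeholder data with 31 descriptors; k = 5 is an assumption, and the Borderline and Safe-Level variants, which restrict which minority samples are used as seeds, are not shown. For production use, the imbalanced-learn package provides `SMOTE` and `BorderlineSMOTE` in `imblearn.over_sampling`.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(len(X_min), size=n_new)   # seed samples
    nb = neighbours[base, rng.integers(k, size=n_new)]
    gap = rng.random((n_new, 1))                  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

rng = np.random.default_rng(8)
X_toxic = rng.normal(size=(20, 31))             # synthetic minority class: 31 descriptors
X_new = smote(X_toxic, n_new=80, rng=8)
print(X_new.shape)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies.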
Estimation of hydraulic jump characteristics of channels with sudden diverging side walls via SVM.
Roushangar, Kiyoumars; Valizadeh, Reyhaneh; Ghasempour, Roghayeh
2017-10-01
Sudden diverging channels are a type of energy dissipator that can dissipate most of the kinetic energy of the flow through a hydraulic jump. Accurate prediction of hydraulic jump characteristics is an important step in designing hydraulic structures. This paper focuses on the capability of the support vector machine (SVM) as a meta-model approach for predicting hydraulic jump characteristics in different sudden diverging stilling basins (i.e., basins with and without appurtenances). Different models were developed and tested using 1,018 experimental data points. The results demonstrated the capability of the SVM technique for predicting hydraulic jump characteristics, and the models developed for a channel with a central block performed better than the models for channels without appurtenances or with a negative step. The best performance for hydraulic jump length was obtained by the model with the parameters F1 (the Froude number) and (h2 - h1)/h1, where h1 and h2 are the upstream and downstream sequent depths, respectively. For the relative energy dissipation and the sequent depth ratio, the model with the parameters F1 and h1/B (B is the expansion ratio) gave the best results. According to the sensitivity analysis, the Froude number had the most significant effect on the modeling. A comparison between the SVM and empirical equations also indicated the superior performance of the SVM.
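To make the model inputs concrete, this sketch computes the upstream Froude number F1 = V1/sqrt(g h1) and the ratio (h2 - h1)/h1, then fits an RBF support vector regression (SVR) for the relative jump length. Everything here is an assumption for illustration: h2 is generated from the classical Bélanger equation for a rectangular channel (which does not describe the diverging basins studied), the target Lj/h1 is a synthetic placeholder rather than experimental data, and the SVR hyperparameters are hand-picked.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

G = 9.81  # gravitational acceleration, m/s^2

def froude(v1, h1):
    """Upstream Froude number F1 = V1 / sqrt(g * h1) for a rectangular channel."""
    return v1 / np.sqrt(G * h1)

rng = np.random.default_rng(9)
h1 = rng.uniform(0.02, 0.06, 200)                 # upstream depth, m (placeholder)
v1 = rng.uniform(1.0, 3.0, 200)                   # upstream velocity, m/s (placeholder)
F1 = froude(v1, h1)
h2 = h1 * 0.5 * (np.sqrt(1 + 8 * F1**2) - 1)      # Belanger equation (classical jump only)
rel_depth = (h2 - h1) / h1
# Synthetic placeholder target: relative jump length Lj/h1 (not experimental data).
rel_length = 6.0 * rel_depth + rng.normal(0, 0.1, 200)

X = np.column_stack([F1, rel_depth])              # model inputs: F1 and (h2 - h1)/h1
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.05))
model.fit(X[:150], rel_length[:150])
rmse = np.sqrt(np.mean((model.predict(X[150:]) - rel_length[150:]) ** 2))
print(f"test RMSE of Lj/h1: {rmse:.3f}")
```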
Classification of hadith into positive suggestion, negative suggestion, and information
NASA Astrophysics Data System (ADS)
Faraby, Said Al; Riviera Rachmawati Jasin, Eliza; Kusumaningrum, Andina; Adiwijaya
2018-03-01
As one of the guidelines of Muslim life, a hadith can be viewed, based on the meaning of its sentence(s), as a suggestion to do something, a suggestion not to do something, or plain information without any suggestion. In this paper, we classified the Bahasa translation of hadith into these three categories using a machine learning approach. We tried stemming and stopword removal in preprocessing, and TF-IDF of unigrams, bigrams, and trigrams as the extracted features. As classifiers, we compared SVM and a Neural Network. Because the categories are new, we created a baseline classifier using a simple rule-based string matching technique to compare against the machine learning pipelines. The rule-based algorithm conditions on the occurrence of words such as “janganlah” and “sholatlah” to determine the category. The baseline method achieved an F1-score of 0.69, while the best F1-score from the machine learning approach was 0.88, produced by the SVM model with a linear kernel.
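A TF-IDF plus linear-kernel SVM pipeline over unigrams to trigrams can be sketched with scikit-learn. The six training sentences and their labels are hypothetical placeholders, not hadith from the study's corpus, and the stemming and stopword-removal steps are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Hypothetical placeholder sentences; a real run would use the translated hadith corpus.
texts = ["janganlah kamu marah", "janganlah berdusta", "sholatlah tepat waktu",
         "bersedekahlah setiap hari", "nabi pergi ke madinah", "hari itu turun hujan"]
labels = ["negative", "negative", "positive", "positive", "information", "information"]

# TF-IDF over word unigrams, bigrams and trigrams, then a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
model.fit(texts, labels)
pred = model.predict(texts)
print("macro F1 on the training sentences:", f1_score(labels, pred, average="macro"))
```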
Kim, Il-Hwan; Bong, Jae-Hwan; Park, Jooyoung; Park, Shinsuk
2017-01-01
Driver assistance systems have become a major safety feature of modern passenger vehicles. The advanced driver assistance system (ADAS) is an active safety system that improves vehicle control performance and, thus, the safety of the driver and passengers. To use ADAS for lane change control, rapid and correct detection of the driver’s intention is essential. This study proposes a novel preprocessing algorithm for ADAS that improves the accuracy of classifying the driver’s intention to change lanes by augmenting basic measurements from conventional on-board sensors. Information on the vehicle states and the road surface condition is augmented using artificial neural network (ANN) models, and the augmented information is fed to a support vector machine (SVM) to detect the driver’s intention with high accuracy. The feasibility of the developed algorithm was tested through driving simulator experiments. The results show that the classification accuracy for the driver’s intention can be improved by providing an SVM model with sufficient driving information augmented using ANN models of vehicle dynamics. PMID:28604582
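The two-stage idea, where an ANN estimates vehicle information that the sensors do not measure directly and the SVM then classifies on the augmented feature vector, can be sketched as follows. Everything is a labelled assumption: the sensor signals, the hidden state, and the intention labels are synthetic, and the hidden state is assumed to be available as a training target (for example from a simulator) but not at prediction time.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
n = 600
sensors = rng.normal(size=(n, 4))                          # synthetic on-board measurements
hidden = np.tanh(sensors[:, 0] * sensors[:, 1])            # synthetic unmeasured vehicle state
intent = (hidden + 0.3 * sensors[:, 2] > 0).astype(int)    # synthetic lane-change label

S_tr, S_te, h_tr, h_te, y_tr, y_te = train_test_split(
    sensors, hidden, intent, test_size=0.25, random_state=0)

# Stage 1: an ANN learns to estimate the hidden state from the basic measurements.
ann = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(S_tr, h_tr)

def augment(S):
    """Append the ANN's estimate of the hidden state to the raw measurements."""
    return np.column_stack([S, ann.predict(S)])

# Stage 2: SVMs on raw versus augmented inputs.
acc_raw = SVC().fit(S_tr, y_tr).score(S_te, y_te)
acc_aug = SVC().fit(augment(S_tr), y_tr).score(augment(S_te), y_te)
print(f"raw features: {acc_raw:.3f}  ANN-augmented features: {acc_aug:.3f}")
```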
Steganalysis using logistic regression
NASA Astrophysics Data System (ADS)
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods (it estimates class probabilities as well as providing a simple classification) and can be adapted more easily and efficiently to multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows classification accuracy comparable to SVM methods. This work is a case study comparing the accuracy and speed of SVM and LR classifiers in detecting LSB Matching and related spatial-domain image steganography, using the state-of-the-art 686-dimensional SPAM feature set, on three image sets.
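The practical difference described above is visible in scikit-learn's APIs: `LogisticRegression.predict_proba` returns class-probability estimates, while `LinearSVC.decision_function` returns signed scores that are not probabilities. This is a minimal sketch on synthetic 20-dimensional placeholder features rather than the 686-dimensional SPAM set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(11)
# Synthetic placeholder features: 0 = cover image, 1 = stego image.
X = np.vstack([rng.normal(0.0, 1, (200, 20)), rng.normal(0.5, 1, (200, 20))])
y = np.array([0] * 200 + [1] * 200)

lr = LogisticRegression(max_iter=1000).fit(X, y)
svm = LinearSVC().fit(X, y)

probs = lr.predict_proba(X[:3])          # estimated P(class | x); each row sums to 1
margins = svm.decision_function(X[:3])   # signed scores relative to the hyperplane
print(probs)
print(margins)
```

`LogisticRegression` also handles more than two classes directly, which is the multiclass convenience the abstract refers to.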
Das, Dev Kumar; Ghosh, Madhumala; Pal, Mallika; Maiti, Asok K; Chakraborty, Chandan
2013-02-01
The aim of this paper is to develop computer-assisted malaria parasite characterization and classification using a machine learning approach based on light microscopic images of peripheral blood smears. To this end, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection, and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) were investigated. The erythrocytes are segmented using marker-controlled watershed transformation, and a total of 96 features describing the shape, size, and texture of erythrocytes are then extracted for parasite-infected versus non-infected cells. Ninety-four features are found to be statistically significant in discriminating six classes. A feature selection-cum-classification scheme was devised by combining the F-statistic with statistical learning techniques, i.e., Bayesian learning and the support vector machine (SVM), to achieve higher classification accuracy using the best set of discriminating features. Results show that the Bayesian approach provides the highest accuracy, 84%, for malaria classification by selecting the 19 most significant features, while the SVM provides its highest accuracy, 83.5%, with the 9 most significant features. Finally, the performance of these two classifiers under the feature selection framework is compared for malaria parasite classification. Copyright © 2012 Elsevier Ltd. All rights reserved.
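The F-statistic selection step combined with an SVM can be sketched with scikit-learn's `SelectKBest(f_classif, k)`, with the selector placed inside the pipeline so that features are chosen only from each training fold. The data are synthetic placeholders (96 features, six classes, only the first 12 informative), and the values of k merely echo the feature counts mentioned above. Swapping `SVC()` for `GaussianNB()` from `sklearn.naive_bayes` would give a simple Bayesian counterpart.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(12)
n_classes, n_per, n_feat = 6, 40, 96
y = np.repeat(np.arange(n_classes), n_per)
X = rng.normal(size=(n_classes * n_per, n_feat))
X[:, :12] += 0.8 * y[:, None]            # only the first 12 synthetic features are informative

for k in (9, 19):
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), SVC())
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d}: mean 5-fold CV accuracy {acc:.3f}")

sel = SelectKBest(f_classif, k=9).fit(X, y)
print("features kept:", np.flatnonzero(sel.get_support()))
```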
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach combining Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. The model combines the RSS method, an efficient ensemble technique, with CART, a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for model development. The performance of the RSSCART model was evaluated using the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi-square test. The results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology, and geo-environment were considered, namely slope angle, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model (AUC = 0.841) performed best compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
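In scikit-learn, a random-subspace ensemble of CART-style trees can be expressed with `BaggingClassifier` by turning off sample bootstrapping and drawing a random subset of features for each tree. A hedged sketch on synthetic placeholder factors; the number of trees, the subspace fraction (0.5), and the tree depth are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(13)
n, n_factors = 600, 10
X = rng.normal(size=(n, n_factors))          # synthetic landslide conditioning factors
y = (X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 4] + rng.normal(0, 0.7, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Random subspace: every tree sees all training samples (no bootstrap)
# but only a random half of the factors.
rsscart = BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=50,
                            max_samples=1.0, bootstrap=False,
                            max_features=0.5, bootstrap_features=False, random_state=0)
rsscart.fit(X_tr, y_tr)
cart = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

auc_rss = roc_auc_score(y_te, rsscart.predict_proba(X_te)[:, 1])
auc_cart = roc_auc_score(y_te, cart.predict_proba(X_te)[:, 1])
print(f"RSS-CART AUC {auc_rss:.3f}  single CART AUC {auc_cart:.3f}")
```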
Predicting human liver microsomal stability with machine learning techniques.
Sakiyama, Yojiro; Yuki, Hitomi; Moriya, Takashi; Hattori, Kazunari; Suzuki, Misaki; Shimada, Kaoru; Honma, Teruki
2008-02-01
To ensure a continuing pipeline in pharmaceutical research, lead candidates must possess appropriate metabolic stability in the drug discovery process. In vitro ADMET (absorption, distribution, metabolism, elimination, and toxicity) screening provides useful information on the metabolic stability of compounds. However, before the synthesis stage, an efficient process is required to handle the vast quantity of data from large compound libraries and high-throughput screening. Here we have derived a relationship between chemical structure and metabolic stability for a data set of in-house compounds by means of various in silico machine learning methods, such as random forest, support vector machine (SVM), logistic regression, and recursive partitioning. For model building, 1952 proprietary compounds comprising two classes (stable/unstable) were used, with 193 descriptors calculated by Molecular Operating Environment. The results on test compounds demonstrated that all classifiers yielded satisfactory results (accuracy > 0.8, sensitivity > 0.9, specificity > 0.6, and precision > 0.8). In particular, random forest and SVM yielded kappa values of approximately 0.7 on an independent validation set, slightly higher than the other classification tools. These results suggest that nonlinear/ensemble-based classification methods might prove useful in in silico ADME modeling.
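Kappa is worth reporting alongside accuracy because accuracy alone can look high when one class dominates. Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e)/(1 - p_e). This sketch computes it by hand on hypothetical labels and checks it against scikit-learn's `cohen_kappa_score`.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.union1d(y_true, y_pred)
    p_o = np.mean(y_true == y_pred)                       # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical stable (1) / unstable (0) labels versus classifier predictions.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(kappa(y_true, y_pred), cohen_kappa_score(y_true, y_pred))
```

For these labels, p_o = 0.8 and p_e = 0.6 × 0.6 + 0.4 × 0.4 = 0.52, so kappa = 0.28/0.48 ≈ 0.583, even though accuracy is 0.8.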
Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology.
Heinson, Ashley I; Gunawardana, Yawwani; Moesker, Bastiaan; Hume, Carmen C Denman; Vataga, Elena; Hall, Yper; Stylianou, Elena; McShane, Helen; Williams, Ann; Niranjan, Mahesan; Woelk, Christopher H
2017-02-01
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.
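Nested cross-validation, one of the enhancements assessed above, keeps hyperparameter tuning out of the performance estimate: an inner `GridSearchCV` tunes the SVM while an outer loop scores it on folds never used for tuning. A minimal scikit-learn sketch; the features are synthetic placeholders for protein-annotation outputs, and the parameter grid and fold counts are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(14)
# Synthetic placeholder features: 200 BPAs (label 1) and 200 non-BPAs (label 0).
X = np.vstack([rng.normal(0.3, 1, (200, 30)), rng.normal(0.0, 1, (200, 30))])
y = np.array([1] * 200 + [0] * 200)

# Inner loop: tune C and gamma by cross-validated AUC on the training folds only.
inner = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                     param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
                     cv=StratifiedKFold(5, shuffle=True, random_state=0), scoring="roc_auc")
# Outer loop: score the whole tuning procedure on held-out folds.
outer_auc = cross_val_score(inner, X, y, scoring="roc_auc",
                            cv=StratifiedKFold(5, shuffle=True, random_state=1))
print(f"nested-CV AUC: {outer_auc.mean():.3f} +/- {outer_auc.std():.3f}")
```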
Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim
2015-07-30
Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is a need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique, and a wide range of time- and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection, and classification was evaluated using k-fold cross-validation. Compared with expert visual scoring, the dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity, and overall accuracy of 0.92, 0.74, and 0.88, respectively. Restricting DSVM classification to data on which the two experts' scorings agreed (76.73% of the data) led to mean specificity, sensitivity, and overall accuracy of 0.94, 0.82, and 0.92, respectively. The DSVM framework outperforms classification with the more standard multi-class "one-against-all" SVM and linear discriminant analysis. The promising results suggest that the proposed methodology may be a valuable alternative to existing automatic methods and could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
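A dendrogram-based multi-class SVM can be sketched by clustering the class centroids hierarchically and training one binary SVM at each internal node to separate the classes under its left branch from those under its right branch; prediction walks from the root to a leaf. This is a hedged sketch, assuming average linkage on class-mean feature vectors and RBF SVMs, on synthetic placeholder data for five stages; the paper's clustering criterion, features, and feature selection are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from sklearn.svm import SVC

class DendrogramSVM:
    """Multi-class SVM arranged as a binary tree from hierarchical clustering of class centroids."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        centroids = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        root = to_tree(linkage(centroids, method="average"))
        self.tree_ = self._build(root, X, y)
        return self

    def _labels(self, node):
        # Leaf ids of the cluster tree are row indices into self.classes_.
        return list(self.classes_[node.pre_order()])

    def _build(self, node, X, y):
        if node.is_leaf():
            return self.classes_[node.id]
        left, right = self._labels(node.left), self._labels(node.right)
        mask = np.isin(y, left + right)
        # Binary SVM at this node: True means "belongs to the right branch".
        svm = SVC(kernel="rbf", gamma="scale").fit(X[mask], np.isin(y[mask], right))
        return (svm, self._build(node.left, X, y), self._build(node.right, X, y))

    def _predict_one(self, x):
        node = self.tree_
        while isinstance(node, tuple):               # internal node: (svm, left, right)
            svm, left, right = node
            node = right if svm.predict(x.reshape(1, -1))[0] else left
        return node

    def predict(self, X):
        return np.array([self._predict_one(x) for x in X])

rng = np.random.default_rng(15)
means = rng.normal(0, 3, size=(5, 8))                # 5 synthetic "stages", 8 features
y = np.repeat(np.arange(5), 60)
X = means[y] + rng.normal(size=(300, 8))
model = DendrogramSVM().fit(X[::2], y[::2])
acc = np.mean(model.predict(X[1::2]) == y[1::2])
print(f"held-out accuracy: {acc:.3f}")
```

A five-class tree needs only four binary SVMs, and each prediction evaluates at most four of them along one root-to-leaf path.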
Efficient HIK SVM learning for image classification.
Wu, Jianxin
2012-10-01
Histograms are used in almost every aspect of image processing and computer vision, from visual descriptors to image representations. Histogram intersection kernel (HIK) and support vector machine (SVM) classifiers have been shown to be very effective for dealing with histograms. This paper presents contributions concerning HIK SVM for image classification. First, we propose intersection coordinate descent (ICD), a deterministic and scalable HIK SVM solver. ICD is much faster than general-purpose SVM solvers and other fast HIK SVM training methods, with similar accuracy. We also extend ICD to the efficient training of a broader family of kernels. Second, we report the important empirical observation that ICD is not sensitive to the SVM C parameter, and we provide theoretical analyses to explain it. ICD achieves high accuracy on many problems using its default parameters. This is an attractive property for practitioners, because many image processing tasks are too large for SVM parameters to be chosen by cross-validation.
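The histogram intersection kernel is K(x, y) = sum_i min(x_i, y_i). The sketch below plugs it into scikit-learn's `SVC` as a callable kernel, which uses a general-purpose solver rather than the ICD solver proposed here, on synthetic placeholder histograms; it is meant only to show the kernel itself.

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection(A, B):
    """Gram matrix with K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(16)
# Synthetic L1-normalised 16-bin histograms: class 0 favours low bins, class 1 high bins.
H0 = rng.dirichlet(np.linspace(5, 1, 16), size=60)
H1 = rng.dirichlet(np.linspace(1, 5, 16), size=60)
X = np.vstack([H0, H1])
y = np.array([0] * 60 + [1] * 60)

svm = SVC(kernel=histogram_intersection, C=1.0).fit(X[::2], y[::2])
acc = svm.score(X[1::2], y[1::2])
print(f"held-out accuracy: {acc:.3f}")
```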
Density-based penalty parameter optimization on C-SVM.
Liu, Yun; Lian, Jie; Bartolacci, Michael R; Zeng, Qing-An
2014-01-01
The support vector machine (SVM) is one of the most widely used approaches for data classification and regression. An SVM maximizes the margin between the positive and negative support vectors, which neglects remote instances far from the SVM decision boundary. To prevent outliers from shifting the position of the SVM decision boundary, C-SVM is used to reduce the influence of outliers. Traditional C-SVM uses a single penalty parameter C for both positive and negative instances; however, depending on the class proportions and the data distribution, positive and negative instances should be given different weights in the penalty parameter of the error terms. Therefore, in this paper, we propose a density-based penalty parameter optimization for C-SVM. The experimental results indicated that the proposed algorithm performs outstandingly in terms of both precision and recall.
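Class-dependent penalty parameters can be expressed in scikit-learn through `class_weight`, which multiplies C separately for each class. In this hedged sketch the weights (1 and 10) are hand-picked for synthetic imbalanced data; the paper derives its weights from data density, which is not reproduced here. `class_weight="balanced"` is a built-in alternative that sets weights inversely proportional to class frequencies.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(17)
# Synthetic imbalanced data: 300 negatives, 30 positives.
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(1.5, 1, (30, 2))])
y = np.array([0] * 300 + [1] * 30)

uniform = SVC(kernel="rbf", C=1.0).fit(X, y)
# class_weight scales C per class: C_neg = 1.0 * 1, C_pos = 1.0 * 10 (illustrative weights).
weighted = SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 10.0}).fit(X, y)

for name, model in [("uniform C", uniform), ("class-dependent C", weighted)]:
    pred = model.predict(X)
    print(f"{name}: precision {precision_score(y, pred, zero_division=0):.3f}, "
          f"recall {recall_score(y, pred):.3f}")
```

Raising the penalty on the minority class typically trades some precision for recall, which is why both metrics are reported.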
Gao, Xiang-Ming; Yang, Shi-Feng; Pan, San-Bo
2017-01-01
Predicting the output power of photovoltaic system with nonstationarity and randomness, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data sets on the prediction date, the time series data of output power on a similar day with 15-minute intervals are built. Second, the time series data of the output power are decomposed into a series of components, including some intrinsic mode components IMFn and a trend component Res, at different scales using EMD. The corresponding SVM prediction model is established for each IMF component and trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of each model are reconstructed, and the predicted values of the output power of the grid-connected PV system can be obtained. The prediction model is tested with actual data, and the results show that the power prediction model based on the EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than do the single SVM prediction model and the EMD-SVM prediction model without optimization.
Shahlaei, M.; Saghaie, L.
2014-01-01
A quantitative structure–activity relationship (QSAR) study is suggested for the prediction of biological activity (pIC50) of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors. Modeling of the biological activities of compounds of interest as a function of molecular structures was established by means of principal component analysis (PCA) and least square support vector machine (LS-SVM) methods. The results showed that the pIC50 values calculated by LS-SVM are in good agreement with the experimental data, and the performance of the LS-SVM regression model is superior to the PCA-based model. The developed LS-SVM model was applied for the prediction of the biological activities of pyrimidone derivatives, which were not in the modeling procedure. The resulted model showed high prediction ability with root mean square error of prediction of 0.460 for LS-SVM. The study provided a novel and effective approach for predicting biological activities of 3, 4-dihydropyrido [3,2-d] pyrimidone derivatives as p38 inhibitors and disclosed that LS-SVM can be used as a powerful chemometrics tool for QSAR studies. PMID:26339262
2017-01-01
To predict the output power of photovoltaic (PV) systems, which is nonstationary and random, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and a support vector machine (SVM) optimized with the artificial bee colony (ABC) algorithm. First, according to the weather forecast data for the prediction date, a time series of output power at 15-minute intervals is built from a similar day. Second, this output power time series is decomposed by EMD into components at different scales: several intrinsic mode functions (IMFs) and a trend (residue) component, Res. An SVM prediction model is then established for each IMF and for the trend component, with its parameters optimized by the ABC algorithm. Finally, the predictions of the individual models are reconstructed (summed) to obtain the predicted output power of the grid-connected PV system. Tests on actual data show that the EMD and ABC-SVM model computes faster and predicts more accurately than both a single SVM model and an EMD-SVM model without optimization. PMID:28912803
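Because EMD is additive (the original series equals the sum of its IMFs and the residue), the final forecast can be formed by summing the per-component forecasts. The following is a minimal sketch of that decompose-predict-reconstruct step. It assumes the components have already been produced by an EMD routine (not shown), and it uses a naive last-value forecaster purely as a placeholder for the ABC-optimized SVMs; `naive_last_value` and the other names are hypothetical.

```python
from typing import Callable, List, Sequence

# A forecaster maps the history of one component to its next value.
Forecaster = Callable[[Sequence[float]], float]


def naive_last_value(history: Sequence[float]) -> float:
    """Placeholder for a per-component (ABC-tuned) SVM: repeat the last value."""
    return history[-1]


def reconstruct_signal(components: Sequence[Sequence[float]]) -> List[float]:
    """EMD is additive: the signal equals the sum of all IMFs plus the residue."""
    return [sum(values) for values in zip(*components)]


def decompose_predict_reconstruct(
    components: Sequence[Sequence[float]], forecasters: Sequence[Forecaster]
) -> float:
    """Forecast each component separately, then sum the component forecasts."""
    if len(components) != len(forecasters):
        raise ValueError("need exactly one forecaster per component")
    return sum(f(c) for c, f in zip(components, forecasters))


# Illustrative components: two IMFs and a slowly rising residue.
imf1 = [0.5, -0.5, 0.5]
imf2 = [0.2, 0.1, 0.0]
residue = [10.0, 10.5, 11.0]
components = [imf1, imf2, residue]
forecast = decompose_predict_reconstruct(components, [naive_last_value] * 3)
```

Keeping one forecaster per component is what lets each model be tuned to the time scale of its own IMF, which is the motivation for decomposing before predicting.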
Liu, Xue-Mei; Liu, Jian-She
2012-11-01
Visible and short-wave near-infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for measuring soil properties, namely available nitrogen (N) and available potassium (K). Three pretreatments were applied to remove system noise and external disturbances: standard normal variate (SNV), multiplicative scatter correction (MSC), and Savitzky–Golay smoothing plus first derivative. Partial least squares (PLS) and least squares support vector machine (LS-SVM) calibration models were then built. In addition, the performance of LS-SVM models was compared for three kinds of input: principal components (PCs) from PCA, latent variables (LVs), and effective wavelengths (EWs). All LS-SVM models outperformed the PLS models. Model performance was evaluated by r² and the root mean square error of prediction (RMSEP). The best models used EWs as LS-SVM inputs, with r² and RMSEP of 0.82 and 17.2 for N, and 0.72 and 15.0 for K, respectively. The results indicate that Vis/SW-NIRS (325–1075 nm) combined with LS-SVM could be used as a precise method for determining soil properties.
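SNV and MSC both remove scatter-related offset and scaling differences between spectra, but in different ways: SNV normalizes each spectrum by its own mean and standard deviation, while MSC regresses each spectrum on a reference and undoes the fitted offset and slope. A minimal standard-library sketch follows; the use of the sample standard deviation in SNV and of the mean spectrum as the MSC reference are common conventions assumed here, and the Savitzky–Golay step is not shown.

```python
import statistics
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: centre a spectrum on its own mean, scale by its own SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in spectrum]


def msc(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is regressed on the reference r (s ~ a + b * r) by least
    squares and corrected as (s - a) / b.
    """
    reference = [statistics.fmean(col) for col in zip(*spectra)]
    ref_mean = statistics.fmean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# The second toy spectrum is 2 * first + 1, i.e. it differs only by scale and offset.
corrected = msc([[1.0, 2.0, 3.0], [3.0, 5.0, 7.0]])
# both rows become approximately [2.0, 3.5, 5.0]
```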
STAR-GALAXY CLASSIFICATION IN MULTI-BAND OPTICAL IMAGING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fadely, Ross; Willman, Beth; Hogg, David W.
2012-11-20
Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r ≳ 24. Star-galaxy separation poses a major challenge to such surveys because galaxies, even very compact galaxies, outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM < 0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training data to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVM_best) in which the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics, and (2) a more realistic scenario (SVM_real) in which training is done on higher signal-to-noise data at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering ~80% completeness with a purity of ~60%–90% for both stars and galaxies. No algorithm delivers perfect performance, and studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the receiver operating characteristic curve, we find a best-to-worst ranking of SVM_best, HB, ML, and SVM_real. We therefore conclude that a well-trained SVM will outperform template-fitting methods, whereas an SVM trained under realistic conditions performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.
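Completeness and purity, as used in this star-galaxy study, correspond to what the machine learning literature calls recall and precision for a single class. A small sketch of the computation follows; the labels are made up for illustration and the function name is hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def completeness_purity(
    true_labels: Sequence[Hashable],
    predicted_labels: Sequence[Hashable],
    target: Hashable,
) -> Tuple[float, float]:
    """Completeness = TP / (TP + FN); purity = TP / (TP + FP) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    completeness = tp / (tp + fn) if tp + fn else 0.0
    purity = tp / (tp + fp) if tp + fp else 0.0
    return completeness, purity


truth = ["star", "star", "galaxy", "galaxy", "galaxy"]
pred = ["star", "galaxy", "star", "galaxy", "galaxy"]
star_scores = completeness_purity(truth, pred, "star")  # (0.5, 0.5)
```

Because galaxies vastly outnumber stars at faint magnitudes, a handful of misclassified galaxies can sharply lower stellar purity even when stellar completeness stays high, which is why both numbers are reported.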
NASA Technical Reports Server (NTRS)
Kocurek, Michael J.
2005-01-01
The HARVIST project seeks to provide an automatic, accurate, interactive interface for predicting crop yield over the entire United States. To accomplish this goal, large images must be classified by crop type quickly and automatically. Current trained and untrained classification algorithms, while accurate, are highly inefficient on large datasets. This project sought to develop new variants of two standard algorithms, one trained and one untrained, that are optimized to exploit the spatial nature of image data. The first algorithm, harvist-cluster, uses divide-and-conquer techniques to precluster an image in the hope of increasing overall clustering speed. The second algorithm, harvistSVM, uses support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM in order to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms with known baselines showed a slight speedup for harvist-cluster at large values of k (the number of clusters) and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning required for harvistSVM and to further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data, such as global maps of Martian weather and high-resolution global images of Mars. The ability to store these new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
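The harvistSVM idea, reclassifying only the pixels on which the quick SVM is likely to be wrong, can be sketched as a two-stage cascade. This is a simplification under stated assumptions: a fixed threshold on the quick model's decision value stands in for the learned meta-SVM, pixels are reduced to scalar features for brevity, and `fast` and `slow` are hypothetical toy models rather than trained SVMs.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_classify(
    pixels: Sequence[float],
    fast_decision: Callable[[float], float],
    slow_classify: Callable[[float], int],
    margin: float,
) -> Tuple[List[int], int]:
    """Label pixels with a fast classifier, deferring low-confidence ones.

    fast_decision returns a signed score (like an SVM decision value); pixels
    whose |score| is below `margin` are re-classified by the slow, accurate model.
    Returns the labels and the number of deferred pixels.
    """
    labels: List[int] = []
    deferred = 0
    for x in pixels:
        score = fast_decision(x)
        if abs(score) >= margin:
            labels.append(1 if score > 0 else -1)
        else:
            labels.append(slow_classify(x))
            deferred += 1
    return labels, deferred


def fast(x: float) -> float:
    # Hypothetical quick model: decision boundary at 0.5.
    return x - 0.5


def slow(x: float) -> int:
    # Hypothetical accurate model: decision boundary at 0.4.
    return 1 if x > 0.4 else -1


labels, deferred = cascade_classify([0.0, 0.45, 0.6, 1.0], fast, slow, margin=0.2)
```

Raising `margin` sends more pixels to the slow model, trading speed for accuracy; that is the same trade-off the abstract's parameter tuning addresses.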
Kernel machines for epilepsy diagnosis via EEG signal classification: a comparative study.
Lima, Clodoaldo A M; Coelho, André L V
2011-10-01
We carry out a systematic assessment of a suite of kernel-based learning machines applied to epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance of the different kernel machines is compared in terms of predictive accuracy, sensitivity to the kernel function and parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values of the kernel parameter (radius) of two well-known kernel functions (the Gaussian and exponential radial basis functions) were considered, together with 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. We first quantitatively assess the impact of the choice of wavelet basis on the quality of the extracted features; four wavelet basis functions were considered. We then report the average accuracy values (estimated by cross-validation) delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated standard and least squares SVM models reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, from which one can visually inspect their sensitivity to the type of feature and to the kernel function and parameter value.
Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing most consistently. Moreover, the choice of kernel function and parameter value, as well as the choice of feature extractor, are critical decisions, although the choice of wavelet family seems less relevant. Statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged for all types of machines, with regions of stability separated by zones of sharp variation, and with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). Copyright © 2011 Elsevier B.V. All rights reserved.
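The two kernel functions compared in the study differ only in whether the distance in the exponent is squared. The sketch below uses one common parameterization in which the "radius" is the width `sigma`; conventions differ between toolboxes (some use `sigma**2` or `gamma = 1/(2*sigma**2)`), so the exact scaling, and treating these as the study's definitions, are assumptions.

```python
import math
from typing import Sequence


def _euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def gaussian_rbf(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian RBF: exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-_euclidean(x, y) ** 2 / (2.0 * sigma**2))


def exponential_rbf(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exponential RBF: exp(-||x - y|| / (2 sigma^2)); note the unsquared distance."""
    return math.exp(-_euclidean(x, y) / (2.0 * sigma**2))


# Two points at Euclidean distance 5 with radius 1.
g = gaussian_rbf([0.0, 0.0], [3.0, 4.0], 1.0)     # exp(-12.5)
e = exponential_rbf([0.0, 0.0], [3.0, 4.0], 1.0)  # exp(-2.5)
```

Because the exponential RBF uses the unsquared distance, it decays much more slowly for distant points, which is one reason the same radius can behave quite differently under the two kernels.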
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization.
Nishio, Mizuho; Nishizawa, Mitsuo; Sugiyama, Osamu; Kojima, Ryosuke; Yakami, Masahiro; Kuroda, Tomohiro; Togashi, Kaori
2018-01-01
We aimed to evaluate a computer-aided diagnosis (CADx) system for lung nodule classification, focusing on (i) the usefulness of a conventional CADx system (hand-crafted imaging features plus a machine learning algorithm), (ii) a comparison between support vector machine (SVM) and gradient tree boosting (XGBoost) as machine learning algorithms, and (iii) the effectiveness of parameter optimization using Bayesian optimization and random search. Data on 99 lung nodules (62 lung cancers and 37 benign lung nodules) were taken from public databases of CT images. A variant of the local binary pattern was used to calculate a feature vector. SVM or XGBoost was trained using the feature vector and its corresponding label. The Tree-structured Parzen Estimator (TPE) was used for Bayesian optimization of the SVM and XGBoost parameters, and random search was performed for comparison with TPE. Leave-one-out cross-validation was used to optimize and evaluate the performance of the CADx system. Performance was evaluated by the area under the receiver operating characteristic curve (AUC). AUC was calculated 10 times and averaged. The best averaged AUCs of SVM and XGBoost were 0.850 and 0.896, respectively; both were obtained using TPE. XGBoost was generally superior to SVM. TPE found parameters achieving high AUC in fewer trials than random search, so Bayesian optimization of the SVM and XGBoost parameters was more efficient than random search. In an observer study, the AUCs of two board-certified radiologists were 0.898 and 0.822. These results show that the diagnostic accuracy of the CADx system was comparable to that of radiologists in classifying lung nodules.
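The AUC reported here equals the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one. A minimal, quadratic-time sketch of that pairwise (Mann-Whitney) computation follows; it assumes labels coded 1 for cancer and 0 for benign, and it is not the authors' evaluation code.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: one positive (0.2) is outranked by one negative (0.6).
example_auc = auc_pairwise([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0])  # 0.75
```

With leave-one-out cross-validation, one natural approach is to pool the held-out scores of all nodules and compute a single AUC over them; the quadratic cost is negligible for 99 nodules.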
Anam, Khairul; Al-Jumaily, Adel
2017-01-01
The success of myoelectric pattern recognition (M-PR) relies mostly on the features extracted and the classifier employed. This paper proposes and evaluates a fast classifier, the extreme learning machine (ELM), for classifying individual and combined finger movements in amputees and non-amputees. ELM is a single-hidden-layer feed-forward network (SLFN) that avoids iterative learning by choosing the input weights randomly and computing the output weights analytically, which shortens the training time of SLFNs. In addition to evaluating the classifier, this paper evaluates various feature combinations to improve M-PR performance and investigates several feature projections to improve the class separability of the features. Unlike other studies on the use of ELM in myoelectric control, this paper presents a thorough investigation of various types of ELM, including node-based and kernel-based ELM. Furthermore, it compares ELMs with other well-known classifiers such as linear discriminant analysis (LDA), k-nearest neighbours (kNN), support vector machine (SVM), and least squares SVM (LS-SVM). The experimental results show that the most accurate ELM classifier is the radial basis function ELM (RBF-ELM). RBF-ELM is as accurate as SVM and LS-SVM but faster than the SVM family, and it is superior to LDA and kNN. The results also indicate that the accuracy gap between amputees and non-amputees is small, with accuracies of 98.55% on amputees and 99.5% on non-amputees using six electromyography (EMG) channels. Copyright © 2016 Elsevier Ltd. All rights reserved.
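The ELM training rule described above, random input weights plus analytically computed output weights, fits in a few lines. The sketch below is a node-based ELM with a tanh hidden layer, assuming NumPy; it is not the RBF-ELM that performed best in the study, and the XOR data are a toy example, not EMG features.

```python
import numpy as np


def elm_fit(X: np.ndarray, T: np.ndarray, n_hidden: int, rng: np.random.Generator):
    """Node-based ELM: random hidden layer, least-squares output layer.

    Input weights W and biases b are drawn at random and never trained;
    output weights beta are the minimum-norm least-squares solution H^+ T.
    """
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def elm_predict(X: np.ndarray, W: np.ndarray, b: np.ndarray, beta: np.ndarray) -> np.ndarray:
    return np.tanh(X @ W + b) @ beta


# XOR with one-hot targets (toy data, not EMG features).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
W, b, beta = elm_fit(X, T, n_hidden=20, rng=np.random.default_rng(0))
outputs = elm_predict(X, W, b, beta)
predicted_class = outputs.argmax(axis=1)
```

With at least as many hidden nodes as training samples, the hidden-layer matrix is typically full row rank and the ELM interpolates the training targets, so on real data the number of nodes (or a regularization term) must be chosen to avoid over-fitting.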
2012-01-01
Background Members of the phylum Proteobacteria are the most prominent among bacteria causing plant diseases that reduce the quantity and quality of food produced by agriculture. To ameliorate these losses, infections need to be identified at early stages. Recent developments in next-generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides by using a machine learning algorithm, a Support Vector Machine (SVM). Result The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish between proteobacterial and plant proteins. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen (untrained) datasets to assess their validity. Conclusion The results indicate that an SVM based on the hybrid AAC and DC approach can be used to distinguish proteobacterial from plant protein sequences. PMID:23046503
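The composition features are simple to compute: AAC gives 20 residue frequencies and DC gives 400 frequencies of adjacent residue pairs (one per ordered pair of the 20 standard amino acids), so the hybrid vector has 420 entries. The sketch below assumes frequencies expressed as fractions (some studies use percentages) and skips non-standard residue letters; the function names are hypothetical.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of each standard residue (20 features); other letters are skipped."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair (400 features)."""
    seq = seq.upper()
    pairs = [
        seq[i : i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence has no standard dipeptides")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def hybrid_features(seq: str) -> List[float]:
    """AAC followed by DC: a 420-dimensional hybrid feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = hybrid_features("MKVLAAGIV")  # toy peptide
```

The 400-dimensional DC block is what makes dipeptide-based SVMs comparatively expensive, the same cost noted for GPCR classification later in this list.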
Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate
Padmanaban, Subash; Baker, Justin; Greger, Bradley
2018-01-01
Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded by problems that arise with high-dimensional data. Feature selection algorithms aim to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, namely the Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization (MIM), on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements, similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (APs) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based on AP firing rates. We used the SVM classification to examine two functional parameters: (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron versus multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy using MIM across 47 sessions was 96.74 ± 3.5% for multi-unit features and 97.65 ± 3.36% for single-unit features. The reduction in decoding accuracy between using 100% of the features and 10% of the features selected by MIM was 45.56% (from 93.7% to 51.09%) for multi-unit features and 4.75% (from 95.32% to 90.79%) for single-unit features. MIM performed best among the feature selection methods. Significance: These results suggest that decoding performance can be improved by using optimally selected features.
Results based on clinically relevant performance metrics also suggest that the decoding algorithm can be made robust by using optimal features and feature selection algorithms. We believe that even a few percent increase in performance is important, as it improves the decoding accuracy of the machine learning algorithm and potentially makes a brain machine interface easier to use. PMID:29467602
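Mutual Information Maximization, in its simplest form, ranks each feature by its mutual information with the class label and keeps the top k. The sketch below assumes that form, discretizes firing rates with equal-width bins, and uses a plug-in MI estimate; the study's exact estimator and binning are not given in the abstract, so these are illustrative choices and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning so that continuous firing rates can be counted."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to count(x,y) * n / (count(x) * count(y)).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mim_select(
    features: Sequence[Sequence[Hashable]], labels: Sequence[Hashable], k: int
) -> List[int]:
    """Rank features by their individual MI with the class label; keep the top k."""
    scores = [mutual_information(f, labels) for f in features]
    return sorted(range(len(features)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: three finger classes and three candidate (already discretized) features.
labels = [0, 0, 1, 1, 2, 2]
informative = [0, 0, 1, 1, 2, 2]
constant = [5] * 6
alternating = [0, 1, 0, 1, 0, 1]
selected = mim_select([constant, alternating, informative], labels, k=1)  # [2]
```

Ranking features independently is cheap but ignores redundancy between features; variants that penalize redundancy exist, but nothing in the abstract indicates that they were used.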
NASA Astrophysics Data System (ADS)
Yeganeh, B.; Motlagh, M. Shafie Pour; Rashidi, Y.; Kamalan, H.
2012-08-01
Because of the health impacts of exposure to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become an important topic in atmospheric and environmental research. The dynamics and complexity of air pollutant behavior have made artificial intelligence models useful tools for more accurate prediction of pollutant concentrations. This paper focuses on an innovative method of daily air pollution prediction that combines a Support Vector Machine (SVM) as the predictor with Partial Least Squares (PLS) as a data selection tool, based on measured CO concentrations. CO concentrations from the Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, were used to test the effectiveness of this method. Hourly CO concentrations were predicted using the SVM and hybrid PLS-SVM models; daily CO concentrations were likewise predicted from the same four years of measured data. Both models demonstrated good prediction ability, but the hybrid PLS-SVM model was more accurate. Statistical estimators, including relative mean errors, root mean squared errors, and the mean absolute relative error, were used to compare the performance of the models. The errors decreased after the size reduction, and the coefficients of determination increased from 56–81% for the SVM model to 65–85% for the hybrid PLS-SVM model. As expected, the hybrid PLS-SVM model also required less computational time than the SVM model, supporting its more accurate and faster prediction ability.
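The error measures named here can be computed as follows. This is a sketch using standard definitions; note that R² is computed as 1 - SS_res/SS_tot, whereas some studies report the squared Pearson correlation instead, and the paper's "relative mean error" is not defined in the abstract, so it is omitted.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mean_absolute_relative_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean of |obs - pred| / |obs|; undefined if any observation is zero."""
    return sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Illustrative values (not CO data): one prediction is off by 1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
```

For these values the RMSE is 0.5, the mean absolute relative error is 0.0625, and R² is 0.8.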
Lu, Wei-Zhen; Wang, Wen-Jian
2005-04-01
Monitoring and forecasting of air quality parameters are popular and important topics in atmospheric and environmental research today because of the health impact of exposure to air pollutants in urban air. Accurate models for air pollutant prediction are needed because they allow potential compliance or non-compliance to be forecast and diagnosed over both the short and the long term. Artificial neural networks (ANNs) are regarded as a reliable and cost-effective method for such tasks and have produced some promising results to date. Although ANNs have attracted increasing attention from environmental researchers, their inherent drawbacks, e.g., local minima, over-fitting during training, poor generalization performance, and the difficulty of determining an appropriate network architecture, impede their practical application. The support vector machine (SVM), a novel type of learning machine based on statistical learning theory, can be used for regression and time series prediction and has been reported to perform well in several studies. The work presented in this paper examines the feasibility of applying SVM to predict air pollutant levels in advancing time series based on the monitored air pollutant database in the Hong Kong downtown area. The functional characteristics of SVM are also investigated. Experimental comparisons between the SVM model and the classical radial basis function (RBF) network demonstrate that SVM is superior to the conventional RBF network in predicting air quality parameters with different time series and generalizes better than the RBF model.
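In SVM regression, the usual training loss is Vapnik's ε-insensitive loss, which ignores errors smaller than ε, combined with a norm penalty on the weights. The abstract does not specify the formulation used, so the following is a generic sketch of the loss and the linear SVR primal objective with hypothetical names; it illustrates the ingredients and does not reproduce the paper's model.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Zero inside the epsilon-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(
    w: Sequence[float],
    b: float,
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    C: float,
    epsilon: float,
) -> float:
    """Primal SVR objective 0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = sum(
        epsilon_insensitive_loss(yi, sum(wj * xj for wj, xj in zip(w, xi)) + b, epsilon)
        for xi, yi in zip(X, y)
    )
    return reg + C * loss


# One point inside the tube (no loss) and one outside (loss 0.5 - 0.1 = 0.4).
objective = linear_svr_objective([1.0], 0.0, [[1.0], [2.0]], [1.0, 2.5], C=1.0, epsilon=0.1)
```

Points inside the ε-tube contribute nothing to the loss, so only points on or outside the tube become support vectors; `C` sets how strongly those errors are weighed against the flatness term.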
A novel and efficient technique for identification and classification of GPCRs.
Gupta, Ravi; Mittal, Ankush; Singh, Kuldip
2008-07-01
G-protein coupled receptors (GPCRs) play a vital role in various biological processes, such as the regulation of cell growth, death, and metabolism. GPCRs are the focus of a significant amount of current pharmaceutical research because they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate technique to identify and classify GPCRs. However, this approach has two major disadvantages. First, the dipeptide-based feature vector has 400 dimensions (one per ordered pair of the 20 standard amino acids), which makes the classification task inefficient in computation and memory. Second, it does not consider the biological properties of the protein sequence when identifying and classifying GPCRs. In this paper, we present an SVM classification technique based on novel features derived by applying a wavelet-based time series analysis approach to protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of the amino acids in a protein sequence, and its dimension is only 35. Experiments were performed on GPCR protein sequences available in the GPCR Database. Our approach achieves accuracies of 99.9%, 98.06%, 97.78%, and 94.08% for the GPCR superfamily, families, subfamilies, and sub-subfamilies (amine group), respectively, under fivefold cross-validation. Further, accuracies of 99.8%, 97.26%, and 97.84% were obtained on unseen (recall) datasets of the GPCR superfamily, families, and subfamilies, respectively. Comparison with the dipeptide-based SVM technique shows the effectiveness of our approach.
Ahmadi, Hamed; Rodehutscord, Markus
2017-01-01
In the nutrition literature, there are several reports on the use of artificial neural network (ANN) and multiple linear regression (MLR) approaches for predicting feed composition and nutritive value, whereas the support vector machine (SVM) method, a newer alternative to MLR and ANN models, has not yet been fully investigated. MLR, ANN, and SVM models were developed to predict the metabolizable energy (ME) content of compound feeds for pigs, based on the German energy evaluation system, from analyzed contents of crude protein (CP), ether extract (EE), crude fiber (CF), and starch. A total of 290 datasets from standardized digestibility studies with compound feeds were provided by several institutions and published papers, and ME was calculated from them. The accuracy and precision of the developed models were evaluated from their prediction values. The developed ANN [R² = 0.95; root mean square error (RMSE) = 0.19 MJ/kg of dry matter] and SVM (R² = 0.95; RMSE = 0.21 MJ/kg of dry matter) models estimated ME in compound feed better than conventional MLR (R² = 0.89; RMSE = 0.27 MJ/kg of dry matter); however, there were no obvious differences between the performance of the ANN and SVM models. Thus, the SVM model may also be considered a promising tool for modeling the relationship between chemical composition and the ME of compound feeds for pigs. To provide readers and nutritionists with an easy and rapid tool, an Excel® calculator, SVM_ME_pig, was created to predict the ME values of compound feeds for pigs using the developed SVM model.
NASA Astrophysics Data System (ADS)
Salehi, Hassan S.; Li, Hai; Merkulov, Alex; Kumavor, Patrick D.; Vavadi, Hamed; Sanders, Melinda; Kueck, Angela; Brewer, Molly A.; Zhu, Quing
2016-04-01
Most ovarian cancers are diagnosed at advanced stages because of the lack of efficacious screening techniques. Photoacoustic tomography (PAT) has the potential to image tumor angiogenesis and detect early neovascular changes in the ovary. We have developed a coregistered PAT and ultrasound (US) prototype system for real-time assessment of ovarian masses. Features extracted from PAT and US angular beams, envelopes, and images were input to a logistic classifier and a support vector machine (SVM) classifier to diagnose ovaries as benign or malignant. A total of 25 excised ovaries from 15 patients were studied, and the logistic and SVM classifiers achieved sensitivities of 70.4% and 87.7% and specificities of 95.6% and 97.9%, respectively. Furthermore, the ovaries of two patients were imaged noninvasively with the PAT/US system before surgical excision. Using five significant features, the logistic classifier correctly classified 12 of 14 images (86% sensitivity) from a malignant ovarian mass and all 17 images (100% specificity) from a benign mass, while the SVM correctly classified 10 of 14 malignant images (71% sensitivity) and all 17 benign images (100% specificity). These initial results demonstrate the clinical potential of the PAT/US technique for ovarian cancer diagnosis.
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.
Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru
2014-01-01
Support vector machines (SVMs) perform well in classification and prediction and are widely used in disease diagnosis and medical assistance. However, SVMs work well mainly on two-group (binary) classification problems. This study combines feature selection by SVM recursive feature elimination (SVM-RFE) with SVM classification to investigate classification accuracy on multiclass problems, using the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order of explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and thereby increase multiclass classification accuracy. The experimental results show that classification accuracy can exceed 95% for both the Dermatology and Zoo databases after SVM-RFE feature selection and Taguchi parameter optimization. PMID:25295306
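SVM-RFE, in the form popularized by Guyon and colleagues, repeatedly trains a linear SVM and discards the feature with the smallest squared weight. The sketch below shows that loop for a binary problem only. It assumes a simple Pegasos-style subgradient trainer without a bias term as a stand-in for a full SVM solver, and it does not cover the multiclass extension or the Taguchi tuning of C and γ; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.1, epochs: int = 1000
) -> List[float]:
    """Pegasos-style subgradient descent on the regularized hinge loss.

    No bias term, so the data are assumed to be roughly centred. Samples are
    visited in a fixed order, which keeps the result deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer (shrinkage) step
            if margin < 1.0:  # hinge loss is active: move toward y * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(
    X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int
) -> Tuple[List[int], List[int]]:
    """Recursive feature elimination: retrain, then drop the feature with the smallest w_j^2."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Toy data: feature 0 decides the label, feature 1 is always zero,
# feature 2 is balanced within each class.
X = [[2.0, 0.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [-2.0, 0.0, -1.0]]
y = [1, 1, -1, -1]
kept, dropped = svm_rfe(X, y, n_keep=1)
```

In the toy data the all-zero feature always receives zero weight and is removed first, and feature 2 ends up with a much smaller weight than feature 0, so it is removed second; the order of elimination doubles as a feature ranking.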
Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.
Caixinha, Miguel; Santos, Mário; Santos, Jaime
2016-04-01
To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to principal component analysis. Bayes, k-nearest-neighbours, Fisher linear discriminant, and support vector machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-scan images, and mean Nakagami m parameter (p < 0.01). All four classifiers performed well in distinguishing healthy from cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the highest performance (90.62%). The results show that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
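The Nakagami m parameter can be estimated from the backscattered envelope with the moment-based (inverse normalized variance) estimator. The sketch below assumes that estimator; parametric Nakagami images are commonly built by applying such an estimate in a sliding window over the envelope data, but the study's window size and estimator are not stated in the abstract.

```python
import statistics
from typing import Sequence


def nakagami_m(envelope: Sequence[float]) -> float:
    """Moment-based Nakagami m estimate: (E[R^2])^2 / Var(R^2).

    Values below 1 suggest pre-Rayleigh, 1 Rayleigh, and above 1
    post-Rayleigh backscatter statistics.
    """
    intensity = [r * r for r in envelope]
    var_i = statistics.pvariance(intensity)
    if var_i == 0:
        raise ValueError("envelope has zero intensity variance; m is undefined")
    return statistics.fmean(intensity) ** 2 / var_i


# Two-sample toy envelope: intensities 1 and 4, mean 2.5, variance 2.25.
m_example = nakagami_m([1.0, 2.0])
```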
NASA Astrophysics Data System (ADS)
Mahvash Mohammadi, Neda; Hezarkhani, Ardeshir
2018-07-01
Classification of mineralised zones is an important factor in the analysis of economic deposits. In this paper, the support vector machine (SVM), a supervised learning algorithm, is applied to subsurface data to classify mineralised zones in the Takht-e-Gonbad porphyry Cu deposit (SE Iran). The effect of the input features on SVM performance is evaluated by calculating accuracy rates. The final SVM model uses lithology, alteration, mineralisation, and level as input features, with a radial basis function (RBF) kernel. The optimal values of the parameters λ and C, determined by n-fold cross-validation, are 0.001 and 0.01, respectively. The accuracy of this model is 0.931 for classifying mineralised zones in the Takht-e-Gonbad porphyry deposit. The results confirm the efficiency of the SVM method for classifying mineralised zones.
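The n-fold cross-validation used to choose λ and C can be sketched as a grid search over candidate parameter pairs. The scoring function below is a hypothetical stand-in; in practice it would train an RBF-kernel SVM on the training indices and return its accuracy on the held-out fold. The grid values are illustrative, chosen only so that they include the optimum reported in the abstract.

```python
from typing import Callable, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def k_fold_indices(n: int, k: int) -> List[Fold]:
    """Split indices 0..n-1 into k contiguous folds (shuffle the data beforehand)."""
    base, extra = divmod(n, k)
    folds: List[Fold] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def grid_search(
    n: int,
    k: int,
    param_grid: Sequence[Tuple[float, float]],
    score_fn: Callable[[Tuple[float, float], List[int], List[int]], float],
):
    """Return the parameter pair with the highest mean cross-validated score."""
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        scores = [score_fn(params, train, test) for train, test in k_fold_indices(n, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train, test):
    # Hypothetical stand-in for "fit an RBF SVM on `train`, return accuracy on `test`".
    lam, c = params
    return 1.0 - abs(lam - 0.001) - abs(c - 0.01)


grid = [(lam, c) for lam in (0.001, 0.1) for c in (0.01, 1.0)]
best_params, best_score = grid_search(n=10, k=3, param_grid=grid, score_fn=toy_score)
```

Searching parameters on a logarithmic grid, as here, is the usual choice because sensible values of C and the kernel width span several orders of magnitude.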
Prediction of Backbreak in Open-Pit Blasting Operations Using the Machine Learning Method
NASA Astrophysics Data System (ADS)
Khandelwal, Manoj; Monjezi, M.
2013-03-01
Backbreak is an undesirable phenomenon in blasting operations. It can cause instability of mine walls, falling of machinery, improper fragmentation, reduced drilling efficiency, etc. The many influential parameters and their unknown relationships are the main reasons for the inaccuracy of empirical models. Presently, new approaches such as artificial intelligence are highly recommended. In this paper, an attempt has been made to predict backbreak in blasting operations at the Soungun iron mine, Iran, from rock properties and blast design parameters using the support vector machine (SVM) method. To investigate the suitability of this approach, the SVM predictions were compared with those of multivariate regression analysis (MVRA). The coefficient of determination (CoD) and the mean absolute error (MAE) were used as performance measures. The CoD between measured and predicted backbreak was 0.987 for SVM and 0.89 for MVRA, whereas the MAE was 0.29 for SVM and 1.07 for MVRA.
Zhan, Xiaobin; Jiang, Shulan; Yang, Yili; Liang, Jian; Shi, Tielin; Li, Xiwen
2015-09-18
This paper proposes an ultrasonic measurement system based on least squares support vector machines (LS-SVM) for inline measurement of particle concentrations in multicomponent suspensions. First, the ultrasonic signals are analyzed and processed, and the feature subset that yields the best model performance is selected according to feature importance. Second, the LS-SVM model is tuned, trained and tested with different feature subsets to obtain the optimal model. In addition, a partial least squares (PLS) model is compared with the LS-SVM model. Finally, the optimal LS-SVM model with the optimal feature subset is applied to inline measurement of particle concentrations during the mixing process. The results show that the proposed method measures particle concentrations in multicomponent suspensions inline reliably and accurately, with accuracy sufficiently high for industrial application. Furthermore, the method can dynamically model nonlinear systems and provides a feasible way to monitor industrial processes.
Multiclass Reduced-Set Support Vector Machines
NASA Technical Reports Server (NTRS)
Tang, Benyang; Mazzoni, Dominic
2006-01-01
There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable for improving the quality of health care. Multi-parameter patient monitors (MPMs), which use heart rate, arterial blood pressure, respiration rate and oxygen saturation (SpO2) as input parameters, were developed to monitor the condition of patients with minimal use of human resources. The support vector machine (SVM), an advanced machine learning approach widely used for classification and regression, is used to realize the MPMs. To make MPMs cost-effective, we experiment with a hardware implementation of the MPM using an SVM classifier. The system is trained in the MATLAB environment, and detection of the alarm/no-alarm condition is implemented in hardware. We used different kernels for SVM classification and found that the best performance was obtained with the intersection kernel SVM (IKSVM). The IKSVM-based MPM outperformed the best known MPM, which uses a radial basis function kernel, by absolute improvements of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model of this best-performing system was developed in the Verilog Hardware Description Language and implemented on an Altera Cyclone II development board.
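The intersection kernel mentioned above is simple to state: for non-negative feature vectors x and z, k(x, z) = sum_i min(x_i, z_i). The sketch below computes the kernel matrix and an SVM-style decision function f(x) = sum_j a_j k(x, s_j) + b. The support vectors, the coefficients a_j (standing for alpha_j * y_j) and the scaling of the four vital signs to [0, 1] are all hypothetical. One plausible reason this kernel suits hardware, not stated in the abstract, is that evaluating it needs only comparisons and additions, with no exponentials.

```python
import numpy as np


def intersection_kernel(A, B):
    """Histogram intersection kernel: K[i, j] = sum_k min(A[i, k], B[j, k])."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Broadcast to (n_a, n_b, n_features), then sum the element-wise minima
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


def decision_function(x, support_vectors, dual_coefs, bias):
    """f(x) = sum_j dual_coefs[j] * k(x, s_j) + bias, with dual_coefs[j] = alpha_j * y_j."""
    k = intersection_kernel(np.atleast_2d(x), support_vectors)[0]
    return float(k @ dual_coefs + bias)


# Hypothetical scaled features (HR, ABP, RR, SpO2) for two support vectors
sv = np.array([[0.7, 0.5, 0.4, 0.9],
               [0.2, 0.9, 0.8, 0.3]])
coefs = np.array([1.0, -1.0])  # hypothetical alpha_j * y_j values
print(intersection_kernel(sv, sv))
print(decision_function([0.6, 0.5, 0.5, 0.8], sv, coefs, bias=0.0))
```

The kernel is only guaranteed positive semidefinite for non-negative features, which is why the inputs are assumed to be scaled into [0, 1] first.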
Streamlining machine learning in mobile devices for remote sensing
NASA Astrophysics Data System (ADS)
Coronel, Andrei D.; Estuar, Ma. Regina E.; Garcia, Kyle Kristopher P.; Dela Cruz, Bon Lemuel T.; Torrijos, Jose Emmanuel; Lim, Hadrian Paulo M.; Abu, Patricia Angela R.; Victorino, John Noel C.
2017-09-01
Mobile devices have been at the forefront of intelligent farming because of their ubiquity. Precision-farming applications have been developed for smartphones to allow small farms to monitor the environmental parameters surrounding crops. In most of these applications, mobile devices collect data that are sent to the cloud for storage, analysis, modeling and visualization. However, because connectivity is weak and intermittent in geographically challenged areas of the Philippines, the solution is to perform the analysis on the phone itself, so that the farmer gets a real-time response after submitting data. Although machine learning is promising, the hardware constraints of mobile devices limit their computational capabilities, making on-phone model development restricted and challenging. This study discusses the development of a machine-learning-based mobile application using the OpenCV libraries. The objective is to enable the detection of Fusarium oxysporum cubense (Foc) in juvenile and asymptomatic bananas using images of plant parts and microscopic samples as input. Image datasets of attached, unattached, dorsal and ventral views of leaves were acquired through sampling protocols. Images were also taken of specimens from the soil surrounding the plant and of sap from the plant, yielding stained and unstained samples, respectively. Segmentation and feature extraction techniques were applied to all images. Initial findings show no significant differences among the feature extraction techniques. For differentiating infected from non-infected leaves, KNN yields the highest average accuracy, compared with Naive Bayes and SVM. For microscopic images using MSER feature extraction, KNN achieved higher accuracy than SVM or Naive Bayes.
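The abstract reports that KNN outperformed SVM and Naive Bayes but gives no implementation details (the study itself used OpenCV). The following is a generic, dependency-light sketch of the k-nearest-neighbour rule in numpy, assuming Euclidean distance and k = 3; the two-dimensional "leaf descriptors" are hypothetical stand-ins for whatever features the app extracts.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict labels by majority vote among the k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distance from every query to every training sample
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training samples, nearest first
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    predictions = []
    for idx in nearest:
        # Counter.most_common breaks ties by first occurrence, i.e. the nearer neighbour's class
        votes = Counter(y_train[idx].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Hypothetical 2-D leaf descriptors (e.g. colour and texture scores), not study data
X_train = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.8], [0.85, 0.75]]
y_train = ["healthy", "healthy", "healthy", "infected", "infected", "infected"]
print(knn_predict(X_train, y_train, [[0.2, 0.2], [0.7, 0.8]], k=3))
```

KNN has no training phase: it stores the labelled samples and does all its work at query time, so on a phone the main costs are memory for the stored features and the distance computations per query.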
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-01-01
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels. PMID:25372618
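The two-step procedure above (exhaustive search over feature subsets, each scored by leave-one-out cross-validation) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM used in the study; the data and the cap of two features are hypothetical (the study reports subsets of at most seven features). Note that the number of subsets grows combinatorially with the number of candidate features, which is why capping the subset size matters.

```python
import itertools

import numpy as np


def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (not an SVM): assign x to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((centroids - x) ** 2).sum(axis=1))]


def loo_accuracy(X, y):
    """Leave-one-out accuracy: train on n-1 samples, test on the held-out one, n times."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        correct += int(nearest_centroid_predict(X[train], y[train], X[i]) == y[i])
    return correct / n


def exhaustive_feature_search(X, y, max_features):
    """Score every feature subset of size 1..max_features and keep the best one."""
    best_subset, best_acc = None, -1.0
    for size in range(1, max_features + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            acc = loo_accuracy(X[:, list(subset)], y)
            if acc > best_acc:  # strict '>' keeps the smallest subset among ties
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Hypothetical data: 20 vibration records, 4 extracted features, only feature 2 informative
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 4))
X[:, 2] += 10.0 * y
print(exhaustive_feature_search(X, y, max_features=2))
```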
Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G
2018-03-01
The feasibility of applying chemometric techniques combined with multi-element analysis to classify grape seeds according to the soil of their vineyard of origin was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used to determine twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least squares discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and random forest (RF) were applied to construct classification/discrimination rules. The results indicated that the nonlinear methods, RF and SVM, performed best, with accuracy rates of up to 98% and 93%, respectively, and are therefore excellent tools for classifying grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Kong, Xianyu; Sun, Yuyan; Su, Rongguo; Shi, Xiaoyong
2017-06-15
The development of techniques for real-time monitoring of the eutrophication status of coastal waters is of great importance for realizing potential cost savings in coastal monitoring programs and for providing timely advice for marine health management. In this study, a GS-optimized SVM was proposed to model the relationships between six easily measured parameters (DO, Chl-a, C1, C2, C3 and C4) and the TRIX index, in order to rapidly assess the eutrophication state of coastal waters. The good predictive performance of the method was indicated by the R² between measured and predicted values (0.92 for the training dataset and 0.91 for the validation dataset) at a 95% confidence level. The classification accuracy of the eutrophication status was 86.5% for the training dataset and 85.6% for the validation dataset. The results indicated that it is feasible to use an SVM technique for timely evaluation of eutrophication status from easily measured parameters. Copyright © 2017. Published by Elsevier Ltd.
Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM
Ganapathy, S.; Yogesh, P.; Kannan, A.
2012-01-01
Intrusion detection systems have long been used, together with various techniques, to detect network intrusions. However, most of these systems detect intruders only at the cost of a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks that combines attribute selection, outlier detection and enhanced multiclass SVM classification. For this purpose, an effective preprocessing technique is proposed that improves detection accuracy and reduces processing time. Moreover, two new algorithms, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. Experimental results on the KDD Cup 99 data set show that the proposed system detects anomalies with a low false alarm rate and a high detection rate. PMID:23056036
NASA Astrophysics Data System (ADS)
Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin
2012-04-01
Most herbal medicines can be processed to meet different therapeutic requirements. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near-infrared (NIR) spectra. Least squares support vector machine (LS-SVM) and random forest (RF) models were employed for full-spectrum classification. Three kernel types (linear, polynomial and radial basis function (RBF)) were tested to optimize the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was built, with the successive projections algorithm (SPA) executed beforehand to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs, and 50 runs of 10-fold cross-validation were used to evaluate each model's efficiency. The LS-SVM performed better with the RBF kernel (RBF LS-SVM) than with the other two kernels. RF, RBF LS-SVM and SPA-LDA all classified every test sample correctly. The mean error rates over the 50 runs of 10-fold cross-validation were 1.35% for RBF LS-SVM, 2.87% for RF and 2.50% for SPA-LDA. The best classification results were obtained with the RBF LS-SVM, while RF was fast in both training and prediction.
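The evaluation protocol above (50 runs of 10-fold cross-validation, reporting the mean error rate) can be sketched as a reusable function that takes any `fit_predict(X_train, y_train, X_test)` callable. The 1-nearest-neighbour classifier and the synthetic 80-sample dataset are hypothetical stand-ins, not the study's LS-SVM, RF or SPA-LDA models or its NIR spectra; the folds are also assumed to be plain random splits rather than stratified ones.

```python
import numpy as np


def repeated_kfold_error(X, y, fit_predict, k=10, runs=50, seed=0):
    """Mean and standard deviation of the error rate over `runs` reshuffled k-fold CVs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    run_errors = []
    for _ in range(runs):
        # A fresh random partition into k roughly equal folds for every run
        folds = np.array_split(rng.permutation(n), k)
        wrong = 0
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            y_pred = fit_predict(X[train], y[train], X[test_idx])
            wrong += int(np.sum(y_pred != y[test_idx]))
        run_errors.append(wrong / n)
    return float(np.mean(run_errors)), float(np.std(run_errors))


def one_nearest_neighbour(X_train, y_train, X_test):
    """Simple stand-in classifier (not LS-SVM, RF or SPA-LDA)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]


# Hypothetical stand-in for 40 raw and 40 processed samples, reduced to 5 features each
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 5)) + 3.0 * y[:, None]
print(repeated_kfold_error(X, y, one_nearest_neighbour))
```

Repeating the split 50 times averages out the luck of any single partition, which matters with only 80 samples.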
Lamb wave based damage detection using Matching Pursuit and Support Vector Machine classifier
NASA Astrophysics Data System (ADS)
Agarwal, Sushant; Mitra, Mira
2014-03-01
In this paper, the suitability of Matching Pursuit (MP) and the Support Vector Machine (SVM) for damage detection from the Lamb wave response of a thin aluminium plate is explored. The Lamb wave response of the plate, with and without damage, is simulated using the finite element method. Simulations are carried out at different frequencies for various kinds of damage. The procedure is divided into two parts: signal processing and machine learning. First, MP is used for denoising and to maintain the sparsity of the dataset. In this study, MP is extended by using a combination of time-frequency functions as the dictionary and is deployed in two stages. Selecting a particular type of atom leads to the extraction of important features while maintaining the sparsity of the waveform. The resulting waveform is then passed as input data to the SVM classifier, which detects the location of potential damage from the reduced data. The study demonstrates that SVM is a robust classifier in the presence of noise and is more efficient than an Artificial Neural Network (ANN). Out-of-sample data are used to validate the trained and tested classifier. The trained classifiers detect damage with a detection rate above 95%.
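Matching pursuit itself is a short greedy loop: at each step, find the dictionary atom most correlated with the current residual, record its coefficient and subtract its contribution. The sketch below is plain single-stage MP with a dictionary of unit-norm Gabor atoms (Gaussian-windowed cosines); the paper's two-stage variant and its specific combination of time-frequency functions are not reproduced, and the dictionary parameters and test signal are hypothetical.

```python
import numpy as np


def gabor_dictionary(n_samples, centres, freqs, width):
    """Unit-norm Gaussian-windowed cosines (Gabor atoms) at given centres and frequencies."""
    t = np.arange(n_samples)
    atoms = []
    for c in centres:
        for f in freqs:
            atom = np.exp(-0.5 * ((t - c) / width) ** 2) * np.cos(2 * np.pi * f * (t - c))
            atoms.append(atom / np.linalg.norm(atom))
    return np.array(atoms)


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP: repeatedly subtract the atom most correlated with the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    coefs = np.zeros(len(dictionary))
    for _ in range(n_iter):
        correlations = dictionary @ residual          # inner product with every atom
        j = int(np.argmax(np.abs(correlations)))     # best-matching atom
        coefs[j] += correlations[j]
        residual -= correlations[j] * dictionary[j]  # valid because atoms are unit-norm
    return coefs, residual


D = gabor_dictionary(256, centres=range(0, 256, 16), freqs=[0.05, 0.1, 0.2], width=8.0)
rng = np.random.default_rng(0)
# Hypothetical noisy "wave packets" built from two atoms plus noise
signal = 2.0 * D[10] - 1.5 * D[25] + 0.05 * rng.normal(size=256)
coefs, residual = matching_pursuit(signal, D, n_iter=5)
print(np.flatnonzero(np.abs(coefs) > 0.5))
```

The sparse coefficient vector (a handful of large entries) is the kind of reduced representation that would then be fed to a classifier.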
An improved conjugate gradient scheme to the solution of least squares SVM.
Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya
2005-03-01
The least squares support vector machine (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solution have been proposed in the literature. In this letter, we propose an improved method for the numerical solution of LS-SVM and show that the problem can be solved using a single reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results obtained with the proposed method are provided for comparison with other existing algorithms.
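To make the "linear system" concrete: in the function-estimation form of LS-SVM, training solves [[0, 1^T], [1, H]] [b; alpha] = [0; y] with H = K + I/gamma. The bordered matrix is indefinite, so conjugate gradient (CG) cannot be applied to it directly. A classic workaround, sketched below, eliminates b and runs CG twice on the positive-definite H: H eta = 1, H nu = y, then b = 1^T nu / 1^T eta and alpha = nu - b * eta. This is the two-solve baseline, not the single reduced system proposed in the letter. The RBF kernel, gamma = 10, sigma = 1 and the synthetic +/-1 labels are assumptions for illustration.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter or 10 * len(b)):
        Ap = A @ p
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM (function-estimation form) via two CG solves with H = K + I/gamma."""
    H = rbf_kernel(X, X, sigma) + np.eye(len(y)) / gamma    # symmetric positive definite
    eta = conjugate_gradient(H, np.ones(len(y)))            # H eta = 1
    nu = conjugate_gradient(H, np.asarray(y, dtype=float))  # H nu = y
    b = nu.sum() / eta.sum()                                # enforces sum(alpha) = 0
    alpha = nu - b * eta
    return alpha, b


def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # hypothetical +/-1 labels
alpha, b = lssvm_fit(X, y)
print(np.mean(np.sign(lssvm_predict(X, alpha, b, X)) == y))
```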
Yao, Sen; Li, Tao; Liu, HongGao; Li, JieQing; Wang, YuanZhong
2018-04-01
Boletaceae mushrooms are wild edible mushrooms distributed in Yunnan Province, China, with high nutritional value, delicious flavor and great economic value. Traceability is important for the authentication and quality assessment of Boletaceae mushrooms. In this study, UV-visible and Fourier transform infrared (FTIR) spectroscopy were combined with chemometrics for the traceability of 247 Boletaceae mushroom samples. Compared with either spectroscopic technique alone, a data fusion strategy clearly improved classification performance in partial least squares discriminant analysis (PLS-DA) and grid-search support vector machine (GS-SVM) models, for both species and geographical origin traceability. In addition, both PLS-DA and GS-SVM models provided 100.00% accuracy for species traceability, with reliable evaluation parameters. For geographical origin traceability, the prediction accuracy of the PLS-DA model with data fusion was only 64.63%, whereas that of the GS-SVM model with data fusion was 100.00%. The results demonstrate that fusing UV-visible and FTIR data and combining them with GS-SVM provides a stronger synergistic effect for the traceability of Boletaceae mushrooms and generalizes well, making the approach suitable for comprehensive quality control and evaluation of similar foods. © 2017 Society of Chemical Industry.
Support Vector Machine Model for Automatic Detection and Classification of Seismic Events
NASA Astrophysics Data System (ADS)
Barros, Vesna; Barros, Lucas
2016-04-01
The automated processing of multiple seismic signals to detect, localize and classify seismic events is a central tool in both natural hazard monitoring and nuclear treaty verification. However, false and missed detections caused by station noise and incorrect classification of arrivals remain an issue, and events are often unclassified or poorly classified. Machine learning techniques can therefore be used in automatic processing to classify the huge database of seismic recordings and provide more confidence in the final output. In the context of the International Monitoring System (IMS), a global sensor network developed for the Comprehensive Nuclear-Test-Ban Treaty (CTBT), we propose a fully automatic method for seismic event detection and classification based on a supervised pattern recognition technique, the Support Vector Machine (SVM). According to Kortström et al. (2015), the advantages of SVM are its ability to handle a large number of features and its effectiveness in high-dimensional spaces. Our objective is to detect seismic events from one IMS seismic station located in an area of high seismicity and mining activity and to classify them as earthquakes or quarry blasts. The aim is a flexible and easily adjustable SVM method that can be applied to different regions and datasets. Going a step further, accurate results for seismic stations could lead to a modification of the model and its parameters to make it applicable to other waveform technologies used to monitor nuclear explosions, such as infrasound and hydroacoustic waveforms. As an authorized user, we have direct access to all IMS data and bulletins through a secure signatory account. A set of significant seismic waveforms containing different types of events (e.g. earthquakes, quarry blasts) and noise is being analysed to train the model and learn the typical signal patterns of these events.
Moreover, comparing the performance of the support-vector network to various classical learning algorithms used before in seismic detection and classification is an essential final step to analyze the advantages and disadvantages of the model.
NASA Astrophysics Data System (ADS)
Laib dit Leksir, Y.; Mansour, M.; Moussaoui, A.
2018-03-01
Analysing and processing databases obtained from infrared thermal inspections of electrical installations requires new tools that extract more information than visual inspection provides. Methods based on the capture of thermal images therefore show great potential and are increasingly employed in this field. However, effective techniques are still needed to analyse these databases and extract significant information about the state of the infrastructure. This paper explains how this approach can be implemented and proposes a system that helps detect faults in thermal images of electrical installations. The proposed method classifies and identifies the region of interest (ROI), with the identification performed by a support vector machine (SVM). The aim is to capture faults present in electrical equipment during an inspection of several machines using an A40 FLIR camera. Binarization techniques are then employed to select the region of interest. Finally, the misclassification errors of the proposed method are compared with those of fuzzy c-means and Otsu thresholding.
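Otsu's method, one of the baselines named above, picks the grey-level threshold that maximises the between-class variance sigma_B^2(k) = (mu_T * w0(k) - mu(k))^2 / (w0(k) * (1 - w0(k))) of the image histogram. The numpy sketch below shows that baseline only, not the authors' SVM pipeline; the synthetic "thermal image" (a 30-degree background with an 80-degree hot spot) is hypothetical.

```python
import numpy as np


def otsu_threshold(image, n_bins=256):
    """Threshold maximising the between-class variance of the grey-level histogram."""
    values = np.asarray(image, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centres = (edges[:-1] + edges[1:]) / 2.0
    p = hist / hist.sum()
    w0 = np.cumsum(p)              # weight of the class at or below each bin
    w1 = 1.0 - w0                  # weight of the class above it
    mu = np.cumsum(p * centres)    # cumulative first moment
    mu_total = mu[-1]
    between = np.zeros(n_bins)
    valid = (w0 > 1e-12) & (w1 > 1e-12)  # skip splits that leave a class empty
    between[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(between))
    return edges[k + 1]            # pixels >= this value form the foreground


# Hypothetical thermal image (deg C): ~30 background with a 5x5 hot spot at ~80
rng = np.random.default_rng(0)
image = 30.0 + rng.normal(scale=1.0, size=(20, 20))
image[5:10, 5:10] += 50.0
threshold = otsu_threshold(image)
hot = image >= threshold
print(threshold, hot.sum())
```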
Prediction of mortality after radical cystectomy for bladder cancer by machine learning techniques.
Wang, Guanjin; Lam, Kin-Man; Deng, Zhaohong; Choi, Kup-Sze
2015-08-01
Bladder cancer is a common genitourinary malignancy. For muscle-invasive bladder cancer, surgical removal of the bladder (radical cystectomy) is in general the definitive treatment, but it unfortunately carries significant morbidity and mortality. Accurate prediction of mortality after radical cystectomy is therefore needed. Statistical methods have conventionally been used for this purpose, despite the complex interactions in high-dimensional medical data. Machine learning has emerged as a promising technique for handling high-dimensional data, with increasing application in clinical decision support, e.g. cancer prediction and prognosis. Its ability to reveal hidden nonlinear interactions and interpretable rules between dependent and independent variables favours the construction of models that generalize well. In this paper, seven machine learning methods are used to predict 5-year mortality after radical cystectomy: back-propagation neural network (BPN), radial basis function network (RBFN), extreme learning machine (ELM), regularized ELM (RELM), support vector machine (SVM), naive Bayes (NB) classifier and k-nearest neighbour (KNN). They are applied to a clinicopathological dataset of 117 patients from the urology unit of a hospital in Hong Kong. The experimental results indicate that RELM achieved the highest average prediction accuracy, 0.8, with fast learning. The findings demonstrate the potential of machine learning techniques to support clinical decision making. Copyright © 2015 Elsevier Ltd. All rights reserved.
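The fast learning reported for RELM follows from its structure: the hidden-layer weights are drawn at random and never trained, and only the output weights are fitted, in closed form, by ridge regression, beta = (H^T H + I/C)^-1 H^T T. The sketch below is one common RELM formulation, assuming a tanh activation, one-hot targets and hypothetical values for the hidden-layer size and C; the data are synthetic and only mimic the study's sample size of 117.

```python
import numpy as np


class RegularizedELM:
    """Single-hidden-layer network with random, untrained input weights and a ridge readout."""

    def __init__(self, n_hidden=50, C=10.0, seed=0):
        self.n_hidden = n_hidden
        self.C = C  # larger C means weaker regularisation
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.bias)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.bias = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = (H^T H + I / C)^-1 H^T T (closed form, no back-propagation)
        A = H.T @ H + np.eye(self.n_hidden) / self.C
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Hypothetical standardised features for 117 patients; label 1 = died within 5 years
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 1.0, size=(60, 4)), rng.normal(1.0, 1.0, size=(57, 4))])
y = np.array([0] * 60 + [1] * 57)
model = RegularizedELM(n_hidden=40, C=10.0).fit(X, y)
print(np.mean(model.predict(X) == y))
```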
Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM
NASA Astrophysics Data System (ADS)
Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin
Intrusion detection systems (IDSs) have played an important role in defending networks against cyber attacks. However, since they cannot detect unknown attacks (0-day attacks), the ultimate challenge in the intrusion detection field is how to identify such attacks accurately and automatically. Over the past few years, several studies have addressed these problems through anomaly detection using unsupervised learning techniques such as clustering and the one-class support vector machine (SVM). Although these techniques allow intrusion detection models to be built at low cost and effort and can detect unforeseen attacks, they still suffer from two main problems: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVMs that improves the detection rate while maintaining a low false positive rate. We evaluated our method on the KDD Cup 1999 data set. The evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in detecting unknown attacks.
Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael
2011-01-01
Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However, these approaches focus on finding group differences and are not applicable to situations where one is interested in assessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate whether patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers relative to patterns of healthy control subjects. We defined features based on whole-brain voxels and on anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD) scores, i.e. the more depressed the patients were, the more of an outlier they were. In addition, the OC-SVM split the patient group into two subgroups whose membership was associated with future response to treatment. When applied to region-based features, the OC-SVM classified 52% of patients as outliers. However, among the patients classified as outliers, 70% did not respond to treatment, and among those classified as non-outliers, 89% responded to treatment. In addition, 89% of the healthy controls were classified as non-outliers. PMID:21723950
USDA-ARS's Scientific Manuscript database
Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...
Comparison of water extraction methods in Tibet based on GF-1 data
NASA Astrophysics Data System (ADS)
Jia, Lingjun; Shang, Kun; Liu, Jing; Sun, Zhongqing
2018-03-01
In this study, we compared four water extraction methods applied to GF-1 data for different water types in Tibet: Support Vector Machine (SVM), Principal Component Analysis (PCA), Decision Tree Classifier based on False Normalized Difference Water Index (FNDWI-DTC), and PCA-SVM. The results show that all four methods can extract large water bodies, but only SVM and PCA-SVM give satisfactory results for small water bodies. The methods were evaluated by overall accuracy (OAA) and the Kappa coefficient (KC). The OAAs of PCA-SVM, SVM, FNDWI-DTC and PCA are 96.68%, 94.23%, 93.99% and 93.01%, and the KCs are 0.9308, 0.8995, 0.8962 and 0.8842, respectively, consistent with visual inspection. In summary, SVM is better for extracting narrow rivers, and PCA-SVM is suitable for extracting water of various types. For dark blue lakes, the PCA-based methods extract water more quickly and accurately.
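Both evaluation measures above come from the confusion matrix: OAA is the fraction of pixels on the diagonal, and the Kappa coefficient corrects it for chance agreement, KC = (p_o - p_e) / (1 - p_e), where p_e is computed from the row and column totals. The confusion matrix below is hypothetical, not from the study.

```python
import numpy as np


def overall_accuracy_and_kappa(confusion):
    """OAA and Cohen's kappa from a confusion matrix (rows: reference, columns: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n                             # overall accuracy
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2  # chance agreement
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Hypothetical water / non-water confusion matrix (pixel counts, not study data)
cm = [[90, 10],
      [5, 95]]
oaa, kc = overall_accuracy_and_kappa(cm)
print(f"OAA = {oaa:.2%}, KC = {kc:.4f}")  # OAA = 92.50%, KC = 0.8500
```

Here the classifier agrees with the reference on 92.5% of pixels, but half of that agreement would be expected by chance from the class totals alone, so kappa is lower than the raw accuracy.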
Sørensen, Lauge; Nielsen, Mads
2018-05-15
The International Challenge for Automated Prediction of MCI from MRI data offered an independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI) and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, so it was valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% on the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), placing third in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of two hippocampal subregions: left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy to up to 59.1%. The ensemble SVM consistently outperformed single SVM classifications on the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification, and yielded competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
Detection of Genetically Modified Sugarcane by Using Terahertz Spectroscopy and Chemometrics
NASA Astrophysics Data System (ADS)
Liu, J.; Xie, H.; Zha, B.; Ding, W.; Luo, J.; Hu, C.
2018-03-01
A methodology is proposed to distinguish genetically modified from non-genetically modified sugarcane using terahertz spectroscopy and chemometric techniques, including linear discriminant analysis (LDA), support vector machine-discriminant analysis (SVM-DA) and partial least squares-discriminant analysis (PLS-DA). The classification rates of these methods are compared, and different types of preprocessing are considered. According to the experimental results, the best option is PLS-DA, with an identification rate of 98%. The results indicate that THz spectroscopy combined with chemometric techniques is a powerful tool for identifying genetically modified and non-genetically modified sugarcane.
Spatial-spectral blood cell classification with microscopic hyperspectral imagery
NASA Astrophysics Data System (ADS)
Ran, Qiong; Chang, Lan; Li, Wei; Xu, Xiaofeng
2017-10-01
Microscopic hyperspectral images provide a new way to examine blood cells, and hyperspectral imagery can greatly facilitate the classification of different blood cells. In this paper, microscopic hyperspectral images are acquired by connecting a microscope to a hyperspectral imager and are then used for blood cell classification. To combine the spectral and spatial information provided by hyperspectral images, a spatial-spectral classification method is derived from the classical extreme learning machine (ELM) by integrating spatial context into the classification task with a Markov random field (MRF) model. ELM, ELM-MRF, support vector machine (SVM) and SVM-MRF methods are compared. The results show that the spatial-spectral methods (ELM-MRF, SVM-MRF) perform better than the pixel-based methods (ELM, SVM), and that the proposed ELM-MRF achieves higher precision and locates cells more accurately.
Classification of fMRI resting-state maps using machine learning techniques: A comparative study
NASA Astrophysics Data System (ADS)
Gallos, Ioannis; Siettos, Constantinos
2017-11-01
We compare the efficiency of principal component analysis (PCA) and the nonlinear manifold learning algorithms ISOMAP and diffusion maps for classifying brain maps of schizophrenia patients and healthy controls obtained from fMRI scans during a resting-state experiment. After a standard preprocessing pipeline, we applied spatial independent component analysis (ICA) to reduce (a) noise and (b) the spatio-temporal dimensionality of the fMRI maps. On the cross-correlation matrix of the ICA components, we applied PCA, ISOMAP and diffusion maps to find an embedded low-dimensional space. Finally, support vector machine (SVM) and k-NN classifiers were used to evaluate how well each algorithm's embedding separates the two groups.
Identification of eggs from different production systems based on hyperspectra and CS-SVM.
Sun, J; Cong, S L; Mao, H P; Zhou, X; Wu, X H; Zhang, X D
2017-06-01
1. To identify the origin of table eggs more accurately, a method based on hyperspectral imaging technology was studied. 2. Hyperspectral data were collected from 200 samples of eggs from intensive and extensive production systems. The standard normal variate (SNV) transformation combined with Savitzky-Golay smoothing was used to eliminate noise, and stepwise regression (SWR) was then used for feature selection. The grid search (GS), genetic algorithm (GA), particle swarm optimisation (PSO) and cuckoo search (CS) algorithms were each used to find the optimal support vector machine (SVM) parameters and so establish an SVM identification model. Either the full spectrum or the data after feature selection served as the model input, and the egg category was the output. 3. The SWR-CS-SVM model performed better than the other models, including SWR-GS-SVM, SWR-GA-SVM, SWR-PSO-SVM and the models based on full spectral data. The training and test classification accuracies of the SWR-CS-SVM model were 99.3% and 96%, respectively. 4. SWR-CS-SVM proved effective for identifying egg varieties and could also be useful for the non-destructive identification of other types of egg.
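The two spectral preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract gives neither the Savitzky-Golay window nor the polynomial order, so a 5-point quadratic filter is assumed, as is the order of the two steps. All function names are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean, scale by its own SD."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sd for x in spectrum]


def savgol5(spectrum):
    """Savitzky-Golay smoothing with an assumed 5-point window and quadratic fit.

    Uses the classic convolution weights (-3, 12, 17, 12, -3) / 35. The two
    points at each end are left unchanged for simplicity.
    """
    coeffs = (-3, 12, 17, 12, -3)
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(coeffs, window)) / 35
    return out


def preprocess(spectrum):
    # Smoothing first, then SNV: this order is an assumption, not stated in the abstract.
    return snv(savgol5(spectrum))
```

Because SNV normalises each spectrum by its own statistics, it removes per-sample scatter and intensity offsets before feature selection; the Savitzky-Golay weights reproduce any locally linear or quadratic trend exactly while attenuating high-frequency noise.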
Xu, Jingting; Hu, Hong; Dai, Yang
The identification of enhancers is a challenging task. Various types of epigenetic information, including histone modification, have been utilized to construct enhancer prediction models based on a diverse panel of machine learning schemes. However, DNA methylation profiles generated from whole genome bisulfite sequencing (WGBS) have not been fully explored for their potential in enhancer prediction, despite the fact that low methylated regions (LMRs) have been suggested to be distal active regulatory regions. In this work, we propose a prediction framework, LMethyR-SVM, using LMRs identified from cell-type-specific WGBS DNA methylation profiles and a weighted support vector machine learning framework. In LMethyR-SVM, the set of cell-type-specific LMRs is further divided into three sets: reliable positive, likely positive and likely negative, according to their resemblance to a small set of experimentally validated enhancers in the VISTA database, based on an estimated non-parametric density distribution. Then, the prediction model is obtained by solving a weighted support vector machine. We demonstrate the performance of LMethyR-SVM by using the WGBS DNA methylation profiles derived from the human embryonic stem cell type (H1) and the fetal lung fibroblast cell type (IMR90). The predicted enhancers are highly conserved and have a reasonable validation rate based on a set of commonly used positive markers, including transcription factors, p300 binding and DNase-I hypersensitive sites. In addition, we show evidence that a large fraction of the enhancers predicted by LMethyR-SVM in the H1 cell type are not predicted by ChromHMM, and that they are more enriched for FANTOM5 enhancers. Our work suggests that low methylated regions detected from WGBS data are useful as complementary resources to histone modification marks in developing models for the prediction of cell-type-specific enhancers.
Lu, Xinjiang; Liu, Wenbo; Zhou, Chuang; Huang, Minghui
2017-06-13
The least-squares support vector machine (LS-SVM) is a popular data-driven modeling method and has been successfully applied to a wide range of applications. However, it has some disadvantages: it is ineffective at handling non-Gaussian noise, and it is sensitive to outliers. In this paper, a robust LS-SVM method is proposed and is shown to have more reliable performance when modeling a nonlinear system in the presence of Gaussian or non-Gaussian noise. The new objective function reduces the mean of the modeling error and also minimizes its variance, and it does not constrain the mean of the modeling error to zero. This differs from the traditional LS-SVM, which uses a worst-case scenario approach to minimize the modeling error and constrains the mean of the modeling error to zero. In doing so, the proposed method takes the modeling error distribution into consideration and is thus less conservative and more robust with regard to random noise. A solving method is then developed to determine the optimal parameters of the proposed robust LS-SVM. An additional analysis indicates that the proposed LS-SVM gives a smaller weight to a large-error training sample and a larger weight to a small-error training sample, and is thus more robust than the traditional LS-SVM. The effectiveness of the proposed robust LS-SVM is demonstrated using both artificial and real-life cases.
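For reference, the traditional LS-SVM regression that the robust variant modifies can be fit by solving a single linear system, because its equality constraints replace the usual SVM quadratic program. The sketch below shows only this standard formulation with an assumed RBF kernel; it is not the robust method proposed in the paper, and the names `lssvm_fit`, `gamma` and `sigma` are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Standard LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                    # enforces sum(alpha) = 0
    A[1:, 0] = 1.0                    # bias term b
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]            # bias b, dual coefficients alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

In this formulation each training residual equals `alpha_i / gamma`, so the squared-error loss weights every sample equally; that is the sensitivity to large errors and outliers the abstract's robust LS-SVM is designed to reduce.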
Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong
2017-08-07
DNase I hypersensitive sites (DHSs) are accessible chromatin regions that are hypersensitive to cleavage by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation, so recognizing DHSs in a genome helps to discover CREs. Developing cost-effective computational methods to identify DHSs is an important complement that can accelerate this investigation. However, tools for identifying DHSs in plant genomes are lacking. Here we present pDHS-SVM, a computational predictor to identify plant DHSs. To integrate global sequence-order information and local DNA properties, reverse complement k-mer and dinucleotide-based auto covariance features of DNA sequences were used to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used, and a support vector machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties that are informative for DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results on Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies of up to 87.00% and 85.79%, respectively. The results indicate the effectiveness of the proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genomes. Our implementation of pDHS-SVM is freely available as source code at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.
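A reverse complement k-mer feature merges each k-mer with its reverse complement into one bin, since both describe the same double-stranded site. The sketch below is a minimal illustration under the assumption that the features are normalised frequencies; the exact normalisation used by pDHS-SVM is not stated here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rc_kmer_features(seq, k=2):
    """Normalised counts of canonical k-mers (a k-mer and its reverse complement share one bin)."""
    seq = seq.upper()
    # Canonical representative: the lexicographically smaller of a k-mer and its reverse complement.
    bins = sorted({min(km, revcomp(km))
                   for km in ("".join(p) for p in product("ACGT", repeat=k))})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for very short inputs
    return {b: counts[b] / total for b in bins}
```

For k = 2 this yields 10 features instead of 16: the four palindromic dinucleotides (AT, TA, CG, GC) keep their own bins, and the remaining twelve collapse into six pairs.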
Probabilistic topic modeling for the analysis and classification of genomic sequences
2015-01-01
Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper, a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on a k-mer representation and text mining techniques. Methods The presented method is based on probabilistic topic modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find, in a document corpus, the topics (recurrent themes) that characterize classes of documents. Applied to DNA sequences, which play the role of documents, this technique exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We classified over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository by training probabilistic topic models. The proposed method is compared with the RDP tool and the support vector machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). For complete sequences, our method achieves results very similar to those of the RDP classifier and SVM. The most interesting results are obtained when short sequence snippets are considered. In these conditions, the proposed method outperforms RDP and SVM on ultra-short sequences, and its performance decreases smoothly at every taxonomic level as the sequence length is reduced. PMID:25916734
Balabin, Roman M; Lomakina, Ekaterina I
2011-06-28
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the multiple linear regression (MLR) model are widely used in quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve the accuracy of QC methods (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs), least squares support vector machine (LS-SVM) regression, is used. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. The QC + SVM methodology is thus an alternative to the QC + ANN one. The aim of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for LS-SVM training, cross-validation, and testing. The MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used to build the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to that of the ANN.
The extrapolation and interpolation results show that LS-SVM is superior to the ANN method by almost an order of magnitude in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller sample set to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application of the LS-SVM calibration approach. This journal is © the Owner Societies 2011
Orrù, Graziella; Pettersson-Yeo, William; Marquand, Andre F; Sartori, Giuseppe; Mechelli, Andrea
2012-04-01
Standard univariate analysis of neuroimaging data has revealed a host of neuroanatomical and functional differences between healthy individuals and patients suffering from a wide range of neurological and psychiatric disorders. Because these findings are significant only at the group level, however, they have had limited clinical translation, and recent attention has turned toward alternative forms of analysis, including the support vector machine (SVM). A type of machine learning, SVM allows an individual's previously unseen data to be categorised into a predefined group using a classification algorithm developed on a training data set. In recent years, SVM has been successfully applied to disease diagnosis, transition prediction and treatment prognosis, using both structural and functional neuroimaging data. Here we provide a brief overview of the method and review the studies that applied it to the investigation of Alzheimer's disease, schizophrenia, major depression, bipolar disorder, presymptomatic Huntington's disease, Parkinson's disease and autistic spectrum disorder. We conclude by discussing the main theoretical and practical challenges associated with implementing this method in the clinic and possible future directions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin
2016-11-01
Protein-protein interactions play essential roles in many biological processes. Knowledge of the residue-residue contacts between two interacting proteins is not only helpful in annotating protein functions but also critical for structure-based drug design. Predicting the residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. Interaction profile Hidden Markov Models were first used to extract Fisher score features from protein sequences, and stacked autoencoders were then deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over a traditional machine learning model, support vector machines (SVM), with overall accuracy increasing by about 15 percentage points, from 65.40% to 80.82%. We showed that stacked autoencoders could extract novel features from the Fisher score features, which can be utilized by deep neural networks and other classifiers to enhance learning. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model depends on the covariates only through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other obtains a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). The data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
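The kernelization argument can be made concrete. The Cox partial likelihood sees the covariates only through the risk scores eta_i, so a kernel expansion eta_i = sum_j alpha_j k(x_i, x_j) can replace the linear predictor. The sketch below computes the negative log partial likelihood (no tie correction assumed) for such scores; it illustrates only the dense kernelized model, not either of the paper's sparse methods, and the function names are hypothetical.

```python
import math


def neg_log_partial_likelihood(eta, times, events):
    """Cox negative log partial likelihood for risk scores eta (no tie correction)."""
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # only observed failures contribute terms; censored cases enter risk sets
            risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
            m = max(risk)  # log-sum-exp shift for numerical stability
            nll -= eta[i] - (m + math.log(sum(math.exp(r - m) for r in risk)))
    return nll


def kernel_risk_scores(alpha, X, kernel):
    """Kernelised linear predictor: eta_i = sum_j alpha_j * k(x_i, x_j)."""
    return [sum(a * kernel(xi, xj) for a, xj in zip(alpha, X)) for xi in X]
```

Minimising this loss over `alpha` generally leaves every coefficient non-zero, which is the density problem the abstract describes; the proposed margin-based and IVM-style methods aim to keep only a small subset of observations in the expansion.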
Object recognition of ladar with support vector machine
NASA Astrophysics Data System (ADS)
Sun, Jian-Feng; Li, Qi; Wang, Qi
2005-01-01
Intensity, range and Doppler images can be obtained by using laser radar (ladar). Laser radar can capture much more object information than other sensors, such as passive infrared imaging and synthetic aperture radar (SAR), so it is well suited as a sensor for object recognition. The traditional method of laser radar object recognition extracts target features, which can be affected by noise. In this paper, a laser radar recognition method based on the support vector machine (SVM) is introduced. SVM has become a new focus of recognition research after neural networks, and it performs well on handwritten digit and face recognition. Two series of SVM experiments, on preprocessed and non-preprocessed samples, are performed with real laser radar images, and the experimental results are compared.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.
Accurate identification of peptides is a current challenge in mass spectrometry (MS)-based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are one approach to this false positive problem and have led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is represented as a vector, and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
NASA Astrophysics Data System (ADS)
Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher
2012-10-01
Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study, a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. RF feature importance statistics were used as a feature-selection strategy for the SVM, to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. Feature selection through RF increased SVM classification performance. Further experimental results indicate that the hybrid classifier improves overall classification accuracy, as well as user's and producer's accuracy, in comparison to the single classifiers.
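The abstract does not spell out its combination rules, so the sketch below is only a hypothetical example of how posterior probabilities and their entropy can drive a per-pixel decision: it keeps the prediction of whichever classifier is more certain (lower entropy) and reports that classifier's normalised entropy as an uncertainty value that could be mapped spatially. The function `combine` and its rule are assumptions, not the authors' method.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a class-probability vector."""
    return sum(-q * math.log(q) for q in p if q > 0)


def combine(p_rf, p_svm):
    """Hypothetical rule: trust whichever classifier is more certain (lower entropy).

    Returns the chosen class index and a normalised entropy in [0, 1]
    usable as a per-pixel map-uncertainty value.
    """
    chosen = p_rf if entropy(p_rf) <= entropy(p_svm) else p_svm
    label = max(range(len(chosen)), key=chosen.__getitem__)
    uncertainty = entropy(chosen) / math.log(len(chosen))  # 1.0 means uniform posterior
    return label, uncertainty
```

Normalising by log(K), with K classes, makes uncertainty values comparable across maps with different numbers of crop classes.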
Lei, Tailong; Sun, Huiyong; Kang, Yu; Zhu, Feng; Liu, Hui; Zhou, Wenfang; Wang, Zhe; Li, Dan; Li, Youyong; Hou, Tingjun
2017-11-06
Xenobiotic chemicals and their metabolites are mainly excreted from the body through the urinary tract in the urine. Chemical-induced urinary tract toxicity is one of the main causes of failure during drug development, and it is a common adverse event for medications, natural supplements, and environmental chemicals. Despite its importance, there are only a few in silico models for assessing urinary tract toxicity for a large number of compounds with diverse chemical structures. Here, we developed a series of qualitative and quantitative structure-activity relationship (QSAR) models for predicting urinary tract toxicity. In our study, the recursive feature elimination method incorporated with random forests (RFE-RF) was used for dimension reduction, and then eight machine learning approaches were used for QSAR modeling, i.e., relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), C5.0 trees, eXtreme gradient boosting (XGBoost), AdaBoost.M1, SVM boosting (SVMBoost), and RVM boosting (RVMBoost). For building classification models, the synthetic minority oversampling technique (SMOTE) was used to handle the imbalanced data set problem. Among all the machine learning approaches, SVMBoost based on the RBF kernel achieves both the best quantitative (q²ext = 0.845) and qualitative predictions for the test set (MCC of 0.787, AUC of 0.893, sensitivity of 89.6%, specificity of 94.1%, and global accuracy of 90.8%). The application domains were then analyzed, and all of the tested chemicals fall within the application domain coverage. We also examined the structural features of the chemicals with large prediction errors. In brief, both the regression and classification models developed by the SVMBoost approach have reliable prediction capability for assessing chemical-induced urinary tract toxicity.
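The synthetic minority oversampling step mentioned above balances the classes by creating new minority samples rather than duplicating existing ones. The sketch below shows the basic SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours) in plain Python; the parameter defaults are assumptions, and the abstract does not say which implementation or settings the authors used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class neighbours.

    minority: list of feature vectors (lists of floats) from the minority class;
    at least two samples are required.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment base -> partner
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region already occupied by the minority class; it should be applied to the training split only, so that test metrics such as MCC and AUC are not inflated.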
Schnyer, David M; Clasen, Peter C; Gonzalez, Christopher; Beevers, Christopher G
2017-06-30
Using MRI to diagnose mental disorders has been a long-term goal. Despite this, the vast majority of prior neuroimaging work has been descriptive rather than predictive. The current study applies support vector machine (SVM) learning to MRI measures of brain white matter to classify adults with Major Depressive Disorder (MDD) and healthy controls. In a precisely matched group of individuals with MDD (n = 25) and healthy controls (n = 25), SVM learning accurately (74%) classified patients and controls across a brain map of white matter fractional anisotropy (FA) values. The study revealed three main findings: 1) SVM applied to DTI-derived FA maps can accurately classify MDD vs. healthy controls; 2) prediction is strongest when only right hemisphere white matter is examined; and 3) removing FA values from a region identified by univariate contrast as significantly different between MDD and healthy controls does not change the SVM accuracy. These results indicate that SVM learning applied to neuroimaging data can classify the presence versus absence of MDD and that predictive information is distributed across brain networks rather than being highly localized. Finally, MDD group differences revealed through typical univariate contrasts do not necessarily reveal patterns that provide accurate predictive information. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
Predicting metabolic syndrome using decision tree and support vector machine methods.
Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh
2016-05-01
Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence-based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques, namely decision tree and support vector machine (SVM), to predict the 7-year incidence of metabolic syndrome. This is a practical study using data from 2107 participants of the Isfahan Cohort Study. Subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high-density lipoprotein cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on the ATPIII criteria, and the decision tree and SVM methods were selected to predict it. The two methods were validated and compared using the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome.
According to this study, in cases where only the final result of the decision is regarded as significant, the SVM method can be used with acceptable accuracy in medical decision making. This method has not been implemented in previous research.
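The three validation criteria used above follow directly from the confusion matrix of a binary classifier. A minimal sketch, with hypothetical function names and toy labels (not study data):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sens_spec_acc(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + fp + tn + fn)
```

For example, with true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 0, 1]`, there are two true positives, one false negative, one true negative and one false positive, giving a sensitivity of 2/3, a specificity of 0.5 and an accuracy of 0.6.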
Jiang, Rou; You, Rui; Pei, Xiao-Qing; Zou, Xiong; Zhang, Meng-Xia; Wang, Tong-Min; Sun, Rui; Luo, Dong-Hua; Huang, Pei-Yu; Chen, Qiu-Yan; Hua, Yi-Jun; Tang, Lin-Quan; Guo, Ling; Mo, Hao-Yuan; Qian, Chao-Nan; Mai, Hai-Qiang; Hong, Ming-Huang; Cai, Hong-Min; Chen, Ming-Yuan
2016-01-19
The aim of this study was to develop a prognostic classifier and to subdivide the M1 stage for nasopharyngeal carcinoma patients with synchronous metastases (mNPC). A retrospective cohort of 347 mNPC patients was recruited between January 2000 and December 2010. Thirty hematological markers and 11 clinical characteristics were collected, and the association of these factors with overall survival (OS) was evaluated. An advanced machine learning scheme based on the support vector machine (SVM) was used to select a subset of highly informative factors and to construct a prognostic model (mNPC-SVM). The mNPC-SVM classifier identified ten informative variables, including three clinical indexes and seven hematological markers. The median survival time for low-risk patients (M1a) as identified by the mNPC-SVM classifier was 38.0 months, and survival time was dramatically reduced to 13.8 months for high-risk patients (M1b) (P < 0.001). Multivariate adjustment using prognostic factors revealed that the mNPC-SVM classifier remained a powerful predictor of OS (M1a vs. M1b, hazard ratio, 3.45; 95% CI, 2.59 to 4.60, P < 0.001). Moreover, combination treatment with systemic chemotherapy and loco-regional radiotherapy was associated with significantly better survival outcomes than chemotherapy alone (5-year OS, 47.0% vs. 10.0%, P < 0.001) in the M1a subgroup but not in the M1b subgroup (12.0% vs. 3.0%, P = 0.101). These findings were validated in a separate cohort. In conclusion, the newly developed mNPC-SVM classifier led to more precise risk definitions that offer a promising subdivision of the M1 stage and individualized selection of future therapeutic regimens for mNPC patients.
An SVM model with hybrid kernels for hydrological time series
NASA Astrophysics Data System (ADS)
Wang, C.; Wang, H.; Zhao, X.; Xie, Q.
2017-12-01
Support Vector Machine (SVM) models have been widely applied to forecasting climate/weather and its impact on other environmental variables, such as the hydrologic response to climate/weather. When using SVM, the choice of the kernel function plays a key role. Conventional SVM models mostly use a single type of kernel function, e.g., the radial basis kernel function. Given that several kernel functions are available, each with its own advantages and drawbacks, a combination of these kernel functions may give the SVM approach more flexibility and robustness, making it suitable for a wide range of application scenarios. This paper presents such a linear combination of the radial basis kernel and the polynomial kernel for forecasting monthly flowrate at two gaging stations using the SVM approach. The results indicate a significant improvement in the accuracy of the predicted series compared with the approach using either individual kernel function, demonstrating the feasibility and advantages of such a hybrid kernel approach for SVM applications.
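A linear combination of an RBF and a polynomial kernel can be written in a few lines. In the sketch below, the mixing weight, `gamma`, `degree` and `coef0` are illustrative assumptions (the abstract does not report the values used), and restricting the weight to [0, 1] keeps the combination a valid positive semi-definite kernel, since a non-negative sum of valid kernels is itself valid.

```python
import math


def rbf(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (<x, z> + coef0)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def hybrid_kernel(x, z, weight=0.5, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of RBF and polynomial kernels (valid for 0 <= weight <= 1)."""
    return weight * rbf(x, z, gamma) + (1 - weight) * poly(x, z, degree, coef0)


def gram(X, kernel):
    """Gram (kernel) matrix for a list of feature vectors."""
    return [[kernel(a, b) for b in X] for a in X]
```

The RBF part captures local, short-range similarity between input patterns, while the polynomial part preserves global trends. A Gram matrix like this can be passed to an SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVR(kernel="precomputed")`), with `weight` tuned by cross-validation.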
NASA Astrophysics Data System (ADS)
Yadav, Basant; Ch, Sudheer; Mathur, Shashi; Adamowski, Jan
2016-12-01
In-situ bioremediation is the most common groundwater remediation procedure used for treating organically contaminated sites. A simulation-optimization approach, which incorporates a simulation model for groundwater flow and transport processes within an optimization program, could help engineers design a remediation system that best satisfies management objectives as well as regulatory constraints. In-situ bioremediation is a highly complex, non-linear process, and modelling such a complex system requires significant computational effort. Soft computing techniques have a flexible mathematical structure that can generalize complex nonlinear processes. In in-situ bioremediation management, a physically-based model is used for the simulation, and the simulated data are utilized by the optimization model to optimize the remediation cost. Repeatedly calling the simulator to satisfy the constraints is an extremely tedious and time-consuming process, so a simulator that reduces the computational burden is needed. This study presents a simulation-optimization approach to achieve an accurate and cost-effective in-situ bioremediation system design for groundwater contaminated with BTEX (Benzene, Toluene, Ethylbenzene, and Xylenes) compounds. In this study, the Extreme Learning Machine (ELM) is used as a proxy simulator to replace BIOPLUME III for the simulation. ELM was selected through a comparative analysis with the Artificial Neural Network (ANN) and the Support Vector Machine (SVM), both of which were successfully used in previous studies of in-situ bioremediation system design. Further, a single-objective optimization problem is solved by a coupled Extreme Learning Machine (ELM)-Particle Swarm Optimization (PSO) technique to achieve the minimum cost for the in-situ bioremediation system design. The results indicate that ELM is a faster and more accurate proxy simulator than ANN and SVM.
The total cost obtained by the ELM-PSO approach is held to a minimum while successfully satisfying all the regulatory constraints of the contaminated site.
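Part of what makes an ELM a fast proxy simulator is that only its output layer is fitted: the hidden-layer weights are drawn at random and never trained, so fitting reduces to one least-squares solve. The sketch below shows a minimal single-hidden-layer ELM regressor with an assumed tanh activation and Gaussian random weights. It is trained on a toy function rather than BIOPLUME III output, and the class name and defaults are hypothetical.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # tanh activation of a random affine projection (assumed activation choice)
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are random and stay fixed; only beta is learned.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y  # Moore-Penrose least-squares output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

In a simulation-optimization loop of the kind described above, the optimizer (PSO in the abstract) would call `predict` on candidate designs in place of the physically based simulator, which is much cheaper once the ELM has been trained on simulator runs.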
NASA Astrophysics Data System (ADS)
Rienow, A.; Menz, G.
2015-12-01
Since the beginning of the millennium, artificial intelligence techniques such as cellular automata (CA) and multi-agent systems (MAS) have been incorporated into land-system simulations to address the complex challenges of transitions in urban areas as open, dynamic systems. The study presents a hybrid approach for modeling the two antagonistic processes of urban sprawl and urban decline at once. The simulation power of support vector machines (SVM), cellular automata (CA) and multi-agent systems (MAS) is integrated into one modeling framework and applied to the largest agglomeration of Central Europe: the Ruhr. A modified version of SLEUTH (short for Slope, Land-use, Exclusion, Urban, Transport, and Hillshade) functions as the CA component. SLEUTH makes use of historic urban land-use data sets and growth coefficients to model physical urban expansion. The SVM machine learning algorithm is applied to enhance SLEUTH. This reduces the stochastic variability of the CA and incorporates information about the human and ecological forces driving the local suitability of urban sprawl. Subsequently, the supported CA is coupled with the MAS ReHoSh (Residential Mobility and the Housing Market of Shrinking City Systems). The MAS models population patterns, housing prices, and housing demand in shrinking regions based on interactions between household and city agents. Semi-explicit urban weights are introduced as a way of modeling from and to the pixel simultaneously. Three scenarios of changing housing preferences reveal the urban development of the region in terms of quantity and location. They reflect the dissemination of sustainable thinking among stakeholders versus the persistent dream of owning a house in suburban and exurban areas. Additionally, the outcomes are transferred into a digital Petri dish reflecting a synthetic environment with perfect conditions of growth.
Hence, the generic growth elements affecting the future face of post-industrial cities are revealed. Finally, the advantages and limitations of linking pixels and people by combining AI and machine learning techniques in a multi-scale geosimulation approach are discussed.
Process service quality evaluation based on Dempster-Shafer theory and support vector machine.
Pei, Feng-Que; Li, Dong-Bo; Tong, Yi-Fei; He, Fei
2017-01-01
Human involvement influences traditional service quality evaluations, which leads to low accuracy, poor reliability and weak predictability. This paper proposes a method, called SVMs-DS, that employs support vector machines (SVMs) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sample data set. Features that can affect production quality are extracted by a large number of sensors. Preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, basic probability assignments (BPAs) are constructed, which support the evaluation in both a qualitative and a quantitative way. The process service quality evaluation results are validated using Dempster's rule of combination; the decision threshold for resolving conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
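Dempster's rule of combination, which the abstract uses to fuse the evidence from the three SVMs, can be sketched as follows. This is a minimal illustration, assuming each BPA is stored as a dictionary from focal sets (frozensets of hypothetical quality labels "good" and "bad") to masses; the mass values are invented for the example, and how the authors map SVM outputs to BPAs is not described in the abstract.

```python
from functools import reduce


def dempster_combine(m1, m2):
    """Fuse two basic probability assignments with Dempster's rule.

    Each BPA maps a focal set (frozenset) to its mass; masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by 1 - K, where K is the total conflicting mass.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


GOOD, BAD = frozenset({"good"}), frozenset({"bad"})
THETA = GOOD | BAD  # whole frame of discernment ("don't know")

# Hypothetical BPAs derived from three SVM outputs.
bpas = [
    {GOOD: 0.7, BAD: 0.2, THETA: 0.1},
    {GOOD: 0.6, BAD: 0.3, THETA: 0.1},
    {GOOD: 0.5, BAD: 0.4, THETA: 0.1},
]
fused = reduce(dempster_combine, bpas)
```

A quick sanity check is that combining any BPA with the vacuous assignment `{THETA: 1.0}` returns it unchanged, since no mass conflicts.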
Cao, Hongliang; Xin, Ya; Yuan, Qiaoxia
2016-02-01
To conveniently predict the biochar yield from cattle manure pyrolysis, an intelligent modeling approach was introduced in this research. A traditional artificial neural network (ANN) model and a novel least squares support vector machine (LS-SVM) model were developed. To identify the models and evaluate their predictions, a data set of 33 experimental samples, obtained with a laboratory-scale fixed-bed reaction system, was used. The results demonstrated that the intelligent modeling approach is convenient and effective for predicting biochar yield. In particular, the novel LS-SVM model shows more satisfactory predictive performance and better robustness than the traditional ANN model. The LS-SVM modeling method provides a successful example and a good reference for modeling the cattle manure pyrolysis process, and potentially other similar processes. Copyright © 2015 Elsevier Ltd. All rights reserved.
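For readers unfamiliar with LS-SVM regression: instead of the quadratic program of a standard SVM, LS-SVM solves one linear system, [[0, 1ᵀ], [1, Ω + I/γ]] [b; α] = [0; y] with Ω_ij = K(x_i, x_j), and predicts f(x) = Σ α_i K(x, x_i) + b. The sketch below implements that standard formulation with an RBF kernel in plain Python; the hyperparameters `gamma` (regularisation) and `sigma` (kernel width) and the toy data are assumptions, not the settings used for the biochar data.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Fit LS-SVM regression and return a prediction function f(x)."""
    n = len(X)
    # Bordered system: [[0, 1^T], [1, Omega + I / gamma]] [b; alpha] = [0; y]
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    b, alpha = solution[0], solution[1:]
    return lambda x: sum(a * rbf_kernel(x, xi, sigma) for a, xi in zip(alpha, X)) + b


# Toy usage: fit y = 2x on six points with weak regularisation.
xs = [[float(v)] for v in range(6)]
ys = [2.0 * v for v in range(6)]
model = lssvm_fit(xs, ys, gamma=1e6, sigma=1.0)
```

Because every training point contributes an error term to the equality constraints, all α_i are generally non-zero, so LS-SVM trades the sparsity of a standard SVM for a simpler linear-algebra fit.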
Han, Bucong; Ma, Xiaohua; Zhao, Ruiying; Zhang, Jingxian; Wei, Xiaona; Liu, Xianghui; Liu, Xin; Zhang, Cunlong; Tan, Chunyan; Jiang, Yuyang; Chen, Yuzong
2012-11-23
Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical use and trials for the treatment of leukemia and other cancers. These successes, together with the appearance of drug resistance in some patients, have raised significant interest and effort in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested on 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53% to 95.01% of inhibitors and 99.81% to 99.90% of non-inhibitors in 5-fold cross-validation studies. SVM trained on 1,703 inhibitors reported before 2011 and 63,318 putative non-inhibitors correctly identified 70.45% of the 44 inhibitors reported since 2011, and predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem compounds, 1,496 (0.89%) of 168K MDDR compounds, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. SVM showed comparable yield and reduced false-hit rates in searching large compound libraries compared with similarity-based and other machine-learning virtual screening (VS) methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold, taken from in-house chemical libraries and not previously reported as Src inhibitors; one of them showed moderate activity. SVM may potentially be explored for searching Src inhibitors from large compound libraries at low false-hit rates.
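With 1,703 inhibitors against 63,318 non-inhibitors, the classes are highly imbalanced, so the way folds are drawn matters in 5-fold cross-validation. Below is a minimal sketch of stratified fold assignment, which keeps the class ratio roughly constant across folds; the abstract does not say whether the authors stratified their folds, and the scaled-down label counts are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Scaled-down, hypothetical version of the 1,703 : 63,318 imbalance.
labels = [1] * 17 + [0] * 633
folds = stratified_kfold(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then serves once as the test set while the other four are used for training, so every test fold contains inhibitors.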
Predicting distant failure in early stage NSCLC treated with SBRT using clinical parameters.
Zhou, Zhiguo; Folkert, Michael; Cannon, Nathan; Iyengar, Puneeth; Westover, Kenneth; Zhang, Yuanyuan; Choy, Hak; Timmerman, Robert; Yan, Jingsheng; Xie, Xian-J; Jiang, Steve; Wang, Jing
2016-06-01
The aim of this study is to predict early distant failure in early stage non-small cell lung cancer (NSCLC) treated with stereotactic body radiation therapy (SBRT) from clinical parameters using machine learning algorithms. The dataset used in this work includes 81 early stage NSCLC patients with at least 6 months of follow-up who underwent SBRT between 2006 and 2012 at a single institution. The clinical parameters (n=18) for each patient include demographic parameters, tumor characteristics, treatment fraction schemes, and pretreatment medications. Three predictive models were constructed based on different machine learning algorithms: (1) artificial neural network (ANN), (2) logistic regression (LR) and (3) support vector machine (SVM). Furthermore, to select an optimal clinical parameter set for model construction, three strategies were adopted: (1) a clonal selection algorithm (CSA) based selection strategy; (2) the sequential forward selection (SFS) method; and (3) a statistical analysis (SA) based strategy. Five-fold cross-validation was used to validate the performance of each predictive model. Accuracy was assessed by the area under the receiver operating characteristic (ROC) curve (AUC); the sensitivity and specificity of the system were also evaluated. The AUCs for ANN, LR and SVM were 0.75, 0.73, and 0.80, respectively. The sensitivity values for ANN, LR and SVM were 71.2%, 72.9% and 83.1%, while the specificity values were 59.1%, 63.6% and 63.6%, respectively. Meanwhile, the CSA based strategy outperformed SFS and SA in terms of AUC, sensitivity and specificity. Based on clinical parameters, the SVM with the CSA optimal parameter set selection strategy achieves better performance than the other strategies for predicting distant failure in lung SBRT patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
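The AUC, sensitivity and specificity reported above can be computed from classifier scores and thresholded predictions. This is a minimal sketch, assuming binary labels with 1 = distant failure and a hypothetical decision threshold. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count half), which equals the area under the empirical ROC curve.

```python
def auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(predictions, labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for four patients (1 = distant failure).
labels = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.3, 0.1]
predictions = [1 if s >= 0.25 else 0 for s in scores]  # hypothetical threshold
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why studies usually report both.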
Guo, Canyong; Luo, Xuefang; Zhou, Xiaohua; Shi, Beijia; Wang, Juanjuan; Zhao, Jinqi; Zhang, Xiaoxia
2017-06-05
Vibrational spectroscopic techniques such as infrared, near-infrared and Raman spectroscopy have become popular for detecting and quantifying polymorphism in pharmaceuticals because they are fast and non-destructive. This study assessed the ability of three vibrational spectroscopic techniques, combined with multivariate analysis, to quantify a low-content undesired polymorph within a binary polymorphic mixture. Partial least squares (PLS) regression and support vector machine (SVM) regression were employed to build quantitative models. Fusidic acid, a steroidal antibiotic, was used as the model compound. PLS regression performed slightly better than SVM regression for all three spectroscopic techniques. Root mean square errors of prediction (RMSEP) ranged from 0.48% to 1.17% for diffuse reflectance FTIR spectroscopy, 1.60% to 1.93% for diffuse reflectance FT-NIR spectroscopy and 1.62% to 2.31% for Raman spectroscopy. The results indicate that diffuse reflectance FTIR spectroscopy offers significant advantages in providing accurate measurement of polymorphic content in the fusidic acid binary mixtures, while Raman spectroscopy is the least accurate technique for quantitative analysis of polymorphs. Copyright © 2017 Elsevier B.V. All rights reserved.
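RMSEP is the root mean square of the prediction errors on an independent test set, RMSEP = sqrt((1/n) Σ (y_i - ŷ_i)²), expressed here in the same units as the polymorph content. A minimal sketch, with hypothetical reference and predicted contents in % w/w:

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an independent test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reference vs. predicted polymorph contents (% w/w).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.4, 3.8]
error = rmsep(reference, predicted)
```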
Hummel, A D; Maciel, R F; Sousa, F S; Cohrs, F M; Falcão, A E J; Teixeira, F; Baptista, R; Mancini, F; da Costa, T M; Alves, D; Rodrigues, R G D S; Miranda, R; Pisa, I T
2011-05-01
The gold standard for diagnosing nephrotoxicity and acute cellular rejection (ACR) is a biopsy, an invasive and expensive procedure. More efficient strategies for selecting patients for biopsy are important from both clinical and financial points of view. The aim of this study was to evaluate various artificial intelligence techniques to screen for the need for a biopsy among patients suspected of nephrotoxicity or ACR during the first year after renal transplantation. We used classifiers such as artificial neural networks (ANN), support vector machines (SVM), and Bayesian inference (BI) to indicate whether the clinical course of the event suggested the need for a biopsy. Each classifier was evaluated by its sensitivity and area under the ROC curve (AUC). The technique that showed the best sensitivity as an indicator for biopsy was SVM, with an AUC of 0.79 and an accuracy rate of 79.86%. The results were better than those described in previous work. The accuracy of biopsy screening was sufficient to be useful in clinical practice. Copyright © 2011 Elsevier Inc. All rights reserved.
Verma, Ruchi; Varshney, Grish C; Raghava, G P S
2010-06-01
The rate of human death due to malaria is increasing day by day. Thus, the malaria-causing parasite Plasmodium falciparum (PF) remains a cause of concern. With the wealth of data now available, it is imperative to understand protein localization in order to gain deeper insight into protein functional roles. In this manuscript, an attempt has been made to develop a prediction method for the localization of mitochondrial proteins. We describe a method for predicting mitochondrial proteins of the malaria parasite using a machine-learning technique. All models were trained and tested on 175 proteins (40 mitochondrial and 135 non-mitochondrial proteins) and evaluated using five-fold cross-validation. We developed a Support Vector Machine (SVM) model for predicting mitochondrial proteins of P. falciparum using amino acid and dipeptide composition, achieving maximum MCCs of 0.38 and 0.51, respectively. In this study, split amino acid composition (SAAC) is used, where the composition of the N-terminus, the C-terminus, and the rest of the protein is computed separately. The performance of the SVM model improved significantly, from MCC 0.38 to 0.73, when SAAC instead of simple amino acid composition was used as input. In addition, an SVM model was developed using the composition of the PSSM profile, with MCC 0.75 and accuracy 91.38%. We achieved a maximum MCC of 0.81 with accuracy 92% using a hybrid model that combines the PSSM profile and SAAC. When evaluated on an independent dataset, our method performs better than existing methods. A web server, PFMpred, has been developed for predicting mitochondrial proteins of malaria parasites (http://www.imtech.res.in/raghava/pfmpred/).
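The split amino acid composition described above, and the Matthews correlation coefficient (MCC) used to score the models, can be sketched as follows. The terminal segment lengths (25 residues each by default) are a hypothetical choice for illustration; the abstract does not state the lengths used in PFMpred.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in a segment."""
    length = len(segment) or 1  # avoid division by zero for an empty segment
    return [segment.count(aa) / length for aa in AMINO_ACIDS]


def split_aa_composition(seq, n_len=25, c_len=25):
    """SAAC: compositions of the N-terminus, the middle and the C-terminus (60 values)."""
    if len(seq) < n_len + c_len:
        raise ValueError("sequence is shorter than the two terminal segments")
    n_term = seq[:n_len]
    middle = seq[n_len:len(seq) - c_len]
    c_term = seq[len(seq) - c_len:]
    return composition(n_term) + composition(middle) + composition(c_term)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (returns 0 when undefined)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```

Unlike plain composition, SAAC lets a classifier see that a residue is enriched specifically near one terminus, which matters for targeting signals.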
Accuracy of dementia diagnosis: a direct comparison between radiologists and a computerized method.
Klöppel, Stefan; Stonnington, Cynthia M; Barnes, Josephine; Chen, Frederick; Chu, Carlton; Good, Catriona D; Mader, Irina; Mitchell, L Anne; Patel, Ameet C; Roberts, Catherine C; Fox, Nick C; Jack, Clifford R; Ashburner, John; Frackowiak, Richard S J
2008-11-01
There has been recent interest in the application of machine learning techniques to neuroimaging-based diagnosis. These methods promise fully automated, standard PC-based clinical decisions, unbiased by variable radiological expertise. We recently used support vector machines (SVMs) to separate sporadic Alzheimer's disease from normal ageing and from fronto-temporal lobar degeneration (FTLD). In this study, we compare the results to those obtained by radiologists. A binary diagnostic classification was made by six radiologists with different levels of experience on the same scans and information that had been previously analysed with SVM. SVMs correctly classified 95% (sensitivity/specificity: 95/95) of sporadic Alzheimer's disease and controls into their respective groups. Radiologists correctly classified 65-95% (median 89%; sensitivity/specificity: 88/90) of scans. SVM correctly classified another set of sporadic Alzheimer's disease in 93% (sensitivity/specificity: 100/86) of cases, whereas radiologists ranged between 80% and 90% (median 83%; sensitivity/specificity: 80/85). SVMs were better at separating patients with sporadic Alzheimer's disease from those with FTLD (SVM 89%; sensitivity/specificity: 83/95; compared to radiological range from 63% to 83%; median 71%; sensitivity/specificity: 64/76). Radiologists were always accurate when they reported a high degree of diagnostic confidence. The results show that well-trained neuroradiologists classify typical Alzheimer's disease-associated scans comparably to SVMs. However, SVMs require no expert knowledge, and trained SVMs can readily be exchanged between centres for use in diagnostic classification. These results are encouraging and indicate a role for computerized diagnostic methods in clinical practice. PMID:18835868
Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar
2016-01-01
Purpose: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods: In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results: The best model, which used 30 genes and the RBF kernel, outperformed other combinations; it had an accuracy of 84% for both tenfold and leave-one-out cross-validation in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved prediction accuracies of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion: This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
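The mRMR step greedily adds the feature whose relevance to the class (mutual information with the label) minus its mean redundancy with already-selected features is largest. Below is a minimal sketch of this difference-based criterion, assuming features have already been discretised (for example, expression levels binned into low/high); the gene names and toy values are hypothetical, and the abstract does not state which mRMR variant or discretisation the authors used.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values.
    """
    relevance = {name: mutual_information(v, labels) for name, v in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        scores = {}
        for name in sorted(remaining):
            redundancy = (
                sum(mutual_information(features[name], features[s]) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            scores[name] = relevance[name] - redundancy
        best = max(sorted(remaining), key=scores.get)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretised expression levels (0 = low, 1 = high).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of gene_a (redundant)
    "gene_c": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
}
chosen = mrmr(genes, labels, k=2)
```

Here `chosen` is `['gene_a', 'gene_c']`: the exact copy `gene_b` is skipped because its redundancy with `gene_a` outweighs its relevance, which is the behaviour that distinguishes mRMR from ranking genes by relevance alone.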
Application of Machine Learning Approaches for Protein-protein Interactions Prediction.
Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing
2017-01-01
Proteomics endeavors to study the structures, functions and interactions of proteins. Information on protein-protein interactions (PPIs) helps to improve our knowledge of the functions and 3D structures of proteins. Thus, determining PPIs is essential for proteomics. In this review, to study the application of machine learning to PPI prediction, several machine learning approaches, such as support vector machines (SVM), artificial neural networks (ANNs) and random forests (RF), were selected, and examples of their applications to PPIs are listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches to PPI prediction. Many successful examples of identification and prediction in the area of PPI prediction are discussed, and PPI research is still in progress. Copyright © Bentham Science Publishers.
Automated Classification and Removal of EEG Artifacts With SVM and Wavelet-ICA.
Sai, Chong Yeh; Mokhtar, Norrima; Arof, Hamzah; Cumming, Paul; Iwahashi, Masahiro
2018-05-01
Brain electrical activity recordings by electroencephalography (EEG) are often contaminated with signal artifacts. Procedures for automated removal of EEG artifacts are frequently sought for clinical diagnostics and brain-computer interface applications. In recent years, a combination of independent component analysis (ICA) and the discrete wavelet transform has been introduced as a standard technique for EEG artifact removal. However, the wavelet-ICA procedure may require visual inspection or arbitrary thresholding to identify artifactual components in the EEG signal. We now propose a novel approach for identifying artifactual components separated by wavelet-ICA using a pretrained support vector machine (SVM). Our method presents a robust and extendable system that enables fully automated identification and removal of artifacts from EEG signals, without applying any arbitrary thresholding. Using test data contaminated by eye blink artifacts, we show that our method performed better in identifying artifactual components than existing thresholding methods. Furthermore, wavelet-ICA in conjunction with SVM successfully removed target artifacts, while largely retaining the EEG source signals of interest. We propose a set of features, including kurtosis, variance, Shannon's entropy, and amplitude range, as SVM training and test inputs for identifying eye blink artifacts in EEG signals. This combinatorial method is also extendable to accommodate multiple types of artifacts present in multichannel EEG. We envision future research exploring other descriptive features corresponding to other types of artifactual components.
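The four features named above can be computed per independent component as in this sketch. Assumptions: kurtosis is the non-excess (Pearson) form, and Shannon entropy is estimated from a 10-bin amplitude histogram; the abstract does not specify these estimator choices, and the toy spike signal is hypothetical.

```python
import math
from collections import Counter


def component_features(x, bins=10):
    """Kurtosis, variance, Shannon entropy (bits) and amplitude range of one component."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    # Pearson (non-excess) kurtosis; a Gaussian signal gives about 3.
    kurtosis = sum((v - mean) ** 4 for v in x) / n / variance ** 2 if variance else 0.0
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # constant signal: use a dummy bin width
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in x)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return {"kurtosis": kurtosis, "variance": variance, "entropy": entropy, "range": hi - lo}


# Hypothetical blink-like component: flat baseline with one large deflection.
features = component_features([0.0, 0.0, 0.0, 0.0, 10.0])
```

Blink components tend to be peaky (high kurtosis) and concentrated in few amplitude bins (low entropy), which is why these statistics can separate them from ongoing EEG.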
Uncertainty Management for Diagnostics and Prognostics of Batteries using Bayesian Techniques
NASA Technical Reports Server (NTRS)
Saha, Bhaskar; Goebel, Kai
2007-01-01
Uncertainty management has always been the key hurdle faced by diagnostics and prognostics algorithms. A Bayesian treatment of this problem provides an elegant and theoretically sound approach to the modern Condition-Based Maintenance (CBM)/Prognostic Health Management (PHM) paradigm. The application of Bayesian techniques to regression and classification in the form of the Relevance Vector Machine (RVM), and to state estimation in the form of Particle Filters (PF), provides a powerful tool for integrating the diagnosis and prognosis of battery health. The RVM, which is a Bayesian treatment of the Support Vector Machine (SVM), is used for model identification, while the PF framework uses the learnt model, statistical estimates of noise and anticipated operational conditions to provide estimates of remaining useful life (RUL) in the form of a probability density function (PDF). This type of prognostics adds significant value to the management of any operation involving electrical systems.
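To make the particle-filter step concrete, the sketch below runs a generic bootstrap particle filter on capacity measurements and extrapolates each particle to a failure threshold, giving an empirical RUL distribution rather than a single number. It is not the authors' RVM-based model: the exponential capacity-fade model, the noise levels and the prior on the fade rate are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter_rul(measurements, threshold, n_particles=500, seed=0):
    """Bootstrap particle filter on an assumed fade model C[k+1] = C[k] * exp(-r[k]).

    Returns one remaining-useful-life sample (in cycles) per particle, i.e. an
    empirical approximation of the RUL distribution.
    """
    rng = random.Random(seed)
    meas_sd, cap_sd, rate_sd = 0.01, 0.002, 0.001  # assumed noise levels
    # Each particle is [capacity, fade rate]; rates start from a broad prior.
    particles = [[measurements[0], rng.uniform(0.001, 0.05)] for _ in range(n_particles)]
    for z in measurements[1:]:
        for p in particles:  # propagate through the state model with process noise
            p[1] = max(1e-6, p[1] + rng.gauss(0.0, rate_sd))
            p[0] = p[0] * math.exp(-p[1]) + rng.gauss(0.0, cap_sd)
        # Gaussian likelihood of the new measurement under each particle.
        weights = [math.exp(-0.5 * ((z - p[0]) / meas_sd) ** 2) for p in particles]
        if sum(weights) == 0.0:  # degenerate case: fall back to uniform weights
            weights = [1.0] * n_particles
        # Multinomial resampling proportional to the weights.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n_particles)]
    ruls = []
    for cap, rate in particles:  # extrapolate each particle to the failure threshold
        ruls.append(0 if cap <= threshold else math.ceil(math.log(cap / threshold) / rate))
    return ruls


# Synthetic, noise-free capacity fade with a true rate of 0.02 per cycle.
observed = [math.exp(-0.02 * k) for k in range(21)]
rul_samples = particle_filter_rul(observed, threshold=0.5)
median_rul = sorted(rul_samples)[len(rul_samples) // 2]
```

For this synthetic series the true remaining life is about 15 cycles; the spread of `rul_samples` around it is what the abstract refers to as the RUL probability density.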
Acoustic Biometric System Based on Preprocessing Techniques and Linear Support Vector Machines
del Val, Lara; Izquierdo-Fuente, Alberto; Villacorta, Juan J.; Raboso, Mariano
2015-01-01
Drawing on the results of an acoustic biometric system based on an MSE classifier, a new biometric system has been implemented. This new system preprocesses acoustic images, extracts several parameters and finally classifies them using a Support Vector Machine (SVM). The preprocessing techniques used are spatial filtering; segmentation, based on a Gaussian Mixture Model (GMM), to separate the person from the background; masking, to reduce the dimensions of images; and binarization, to reduce the size of each image. An analysis of classification error and a study of the sensitivity of the error versus the computational burden of each implemented algorithm are presented. This allows the most relevant algorithms to be selected according to the system's requirements. A significant improvement of the biometric system has been achieved by reducing the classification error, the computational burden and the storage requirements. PMID:26091392
Rhombic micro-displacement amplifier for piezoelectric actuator and its linear and hybrid model
NASA Astrophysics Data System (ADS)
Chen, Jinglong; Zhang, Chunlin; Xu, Minglong; Zi, Yanyang; Zhang, Xinong
2015-01-01
This paper proposes a rhombic micro-displacement amplifier (RMDA) for piezoelectric actuators (PA). First, the geometric amplification relations are analyzed, and a linear model is built to analyze the mechanical and electrical properties of the amplifier. Next, an accurate modeling method for the amplifier is studied, which is important for precise servo control applications. The classical Preisach model (CPM) is generally implemented using a numerical technique based on first-order reversal curves (FORCs). The accuracy of the CPM depends mainly on the number of FORCs; however, it is generally difficult to obtain enough FORCs in practice. Therefore, a Support Vector Machine (SVM) is employed to circumvent this deficiency of the CPM. A hybrid model based on the discrete CPM and SVM is then developed to account for hysteresis and dynamic effects. Finally, experimental validation is carried out. The results show that the amplifier, together with the hybrid model, is suitable for control applications.
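For readers new to the Preisach model, the sketch below discretises the classical model as a triangular grid of relay hysterons, each switching up at a threshold α and down at a threshold β ≤ α, with the output a weighted sum of relay states. Uniform weights are an assumption made here for illustration; in practice the weight distribution is identified from FORC data, and the paper's SVM-based compensation is not reproduced.

```python
class DiscretePreisach:
    """Classical Preisach model discretised as a triangular grid of relay hysterons."""

    def __init__(self, n=10, u_min=-1.0, u_max=1.0, weight=lambda alpha, beta: 1.0):
        step = (u_max - u_min) / n
        # Each hysteron has an "up" switching threshold alpha >= "down" threshold beta.
        self.hysterons = [(u_min + i * step, u_min + j * step)
                          for i in range(n + 1) for j in range(i + 1)]
        self.weights = [weight(a, b) for a, b in self.hysterons]
        self.states = [-1] * len(self.hysterons)  # start from negative saturation

    def update(self, u):
        """Apply input u and return the weighted, normalised output."""
        for k, (alpha, beta) in enumerate(self.hysterons):
            if u >= alpha:
                self.states[k] = 1
            elif u <= beta:
                self.states[k] = -1
            # otherwise the relay keeps its previous state (memory)
        return sum(w * s for w, s in zip(self.weights, self.states)) / sum(self.weights)


# The same input (0.0) gives different outputs on the ascending and descending branches.
model = DiscretePreisach()
for u in (-1.0, -0.5, 0.0):
    ascending = model.update(u)
for u in (0.5, 1.0, 0.5, 0.0):
    descending = model.update(u)
```

The number of distinct reversal curves available limits how finely the weights can be identified, which is the FORC shortage the abstract mentions.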
USDA-ARS's Scientific Manuscript database
It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurements in order to classify crops and weeds accurately. In this study, the Support Vector Machine (SVM) method was evaluated and compared with two other methods, Decision ...
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach, the key idea is to rephrase the task as a ranking problem via the concordance index, a problem that can be solved efficiently in the context of structural risk minimization and convex optimization techniques. In the second approach, one uses regression, dealing with censoring by means of inequality constraints. The goal of this paper is twofold: (i) to introduce a new model combining the ranking and regression strategies, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) to compare the three techniques on 6 clinical and 3 high-dimensional datasets and discuss the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, on regression constraints, and on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate significantly better performance for models including regression constraints than for models based only on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints.
Our experiments show comparable performance for methods including only regression constraints or both regression and ranking constraints on clinical data. On high-dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
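The concordance index used above can be computed directly from observed times, event indicators and model outputs. Below is a minimal sketch of Harrell's estimator, assuming the model output is on a survival-time scale (higher means longer predicted survival). A pair is comparable only if the subject with the shorter time had an observed event, and ties in the prediction count one half; the toy data are hypothetical.

```python
def concordance_index(times, events, predicted):
    """Harrell's C for right-censored data.

    times: observed times; events: 1 if the event was observed, 0 if censored;
    predicted: model output on a survival-time scale (higher = longer survival).
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: the third subject is censored.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
```

A value of 1.0 means every comparable pair is ordered correctly and 0.5 corresponds to random ordering; the ranking-based SVM models optimise a convex surrogate of this quantity.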
Paiva, Joana S; Cardoso, João; Pereira, Tânia
2018-01-01
The main goal of this study was to develop an automatic method, based on supervised learning, able to distinguish healthy from pathologic arterial pulse waves (APW), and both of those from noisy waveforms (non-relevant segments of the signal), in data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic features, time-domain statistics, cross-correlation features and wavelet features. The multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed to evaluate the performance of two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significantly better performance for this problem, with an average accuracy of 0.9917±0.0024 and an F-measure of 0.9925±0.0019, compared with ANN, which reached 0.9847±0.0032 and 0.9852±0.0031 for accuracy and F-measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using different numbers of features from the original set. The comparison between SVM and ANN reasserted the higher performance of SVM. The results obtained in this study show the potential of the proposed method to differentiate these three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with the clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.
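SVM-RFE repeatedly trains a linear SVM and removes the feature with the smallest squared weight until none remain; the reverse elimination order is the feature ranking. The sketch below is binary rather than multiclass and trains the SVM with a deterministic Pegasos-style subgradient method without a bias term; the solver, the regularisation value and the toy data are assumptions, not the authors' setup.

```python
def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Deterministic Pegasos-style subgradient training of a linear SVM (no bias)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularisation
            if margin < 1.0:  # hinge loss active: step towards the sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, **svm_kwargs):
    """Rank feature indices from most to least important by recursive elimination."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[f] for f in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Toy data: feature 0 separates the classes, feature 1 is symmetric noise,
# feature 2 is constant.
X = [[2.0, 0.1, 0.0], [1.5, -0.1, 0.0], [2.5, 0.1, 0.0],
     [-2.0, 0.1, 0.0], [-1.5, -0.1, 0.0], [-2.5, 0.1, 0.0]]
y = [1, 1, 1, -1, -1, -1]
ranking = svm_rfe(X, y)
```

Retraining after every removal matters: once a feature is dropped, the weights of correlated features can change, so a single-pass ranking by |w| can differ from the RFE ranking.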
Xiao, Chuncai; Hao, Kuangrong; Ding, Yongsheng
2014-12-30
This paper creates a bi-directional prediction model to predict the performance of carbon fiber and the productive parameters based on a support vector machine (SVM) and an improved particle swarm optimization (IPSO) algorithm (SVM-IPSO). The predictive accuracy of SVM depends mainly on its parameters, so selecting them is crucial; IPSO is therefore proposed to seek the optimal SVM parameters and improve prediction capability, and the resulting SVM-IPSO model is applied to bi-directional prediction in carbon fiber production. Inspired by a cell communication mechanism, we propose IPSO by incorporating information about the global best solution into the search strategy to improve exploitation, and we employ IPSO to establish the bi-directional prediction model: in the forward direction, productive parameters are the input and property indexes the output; in the backward direction, property indexes are the input and productive parameters the output, and in this case the model becomes a scheme design for novel carbon fibers. The results on a set of experimental data show that the proposed model outperforms the radial basis function neural network (RNN), the basic particle swarm optimization (PSO) method and the hybrid genetic algorithm and improved particle swarm optimization (GA-IPSO) method in most of the experiments. In other words, simulation results demonstrate the effectiveness and advantages of the SVM-IPSO model for forecasting.
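The parameter search that PSO performs for the SVM can be sketched with plain global-best PSO. This is the basic algorithm, not the authors' IPSO; in the SVM setting the objective `f` would typically be a validation error as a function of hyperparameters such as the penalty C and the kernel width (an assumption here), whereas the example below minimises a toy quadratic. The inertia and acceleration coefficients are common textbook values, also assumed.

```python
import random


def pso_minimize(f, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation of f over a box given by bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy objective with its minimum at (1, -2).
best, best_val = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                              [(-5.0, 5.0), (-5.0, 5.0)])
```

Because PSO only needs objective values, it can tune SVM hyperparameters without gradients; the cost is one model training and validation per particle per iteration.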
NASA Astrophysics Data System (ADS)
Cubillas, J. E.; Japitana, M.
2016-06-01
This study demonstrates the application of CIELAB, Color Intensity, and One Dimensional Scalar Constancy as features for image recognition and for classifying benthic habitats in an image, with the coastal areas of Hinatuan, Surigao Del Sur, Philippines as the study area. The study area comprises four datasets, namely: (a) Blk66L005, (b) Blk66L021, (c) Blk66L024, and (d) Blk66L0114. SVM optimization was performed in Matlab® software with the Parallel Computing Toolbox to speed up SVM computation. The image used for collecting samples for the SVM procedure was Blk66L0114, from which a total of 134,516 sample objects of mangrove, possible coral existence with rocks, sand, sea, fish pens and sea grasses were collected and processed. The collected samples were then used as training sets for the supervised learning algorithm and for the creation of class definitions. The learned hyperplanes separating one class from another in the multi-dimensional feature space can be thought of as a super feature, which is then used in developing the C (classifier) rule set in eCognition® software. The classification of the sampling site yielded an accuracy of 98.85%, which confirms the reliability of the remote sensing techniques and analyses applied to the orthophotos, such as CIELAB, Color Intensity and One Dimensional Scalar Constancy, and of the SVM classification algorithm for classifying benthic habitats.
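CIELAB features are usually obtained by converting RGB pixel values through CIE XYZ. Below is a minimal sketch of that standard conversion, assuming 8-bit sRGB input and the D65 reference white; the abstract does not state the conversion settings used, and the Color Intensity and One Dimensional Scalar Constancy features are not reproduced.

```python
def srgb_to_lab(r, g, b):
    """Convert an 8-bit sRGB pixel to CIELAB (D65 white point, 2-degree observer)."""
    def linearize(c):
        c /= 255.0  # undo the sRGB gamma encoding
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# White maps to roughly L* = 100, a* = b* = 0; black maps to L* = 0.
white = srgb_to_lab(255, 255, 255)
black = srgb_to_lab(0, 0, 0)
```

CIELAB separates lightness (L*) from the two chroma axes (a*, b*), so features such as water depth, which mostly change brightness, interfere less with the colour cues that distinguish habitat classes.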
Automated classification of neurological disorders of gait using spatio-temporal gait parameters.
Pradhan, Cauchy; Wuehr, Max; Akrami, Farhoud; Neuhaeusser, Maximilian; Huth, Sabrina; Brandt, Thomas; Jahn, Klaus; Schniepp, Roman
2015-04-01
Automated pattern recognition systems have been used for accurate identification of neurological conditions as well as for evaluating treatment outcomes. This study aims to determine the accuracy of diagnoses of (oto-)neurological gait disorders using different types of automated pattern recognition techniques. Clinically confirmed cases of phobic postural vertigo (N = 30), cerebellar ataxia (N = 30), progressive supranuclear palsy (N = 30), bilateral vestibulopathy (N = 30), as well as healthy subjects (N = 30) were recruited for the study. Eight measurements with 136 variables were obtained from each subject using a GAITRite® sensor carpet. Subjects were randomly divided into two groups (training cases and validation cases). The sensitivity and specificity of k-nearest neighbor (KNN), naive Bayes (NB), artificial neural network (ANN), and support vector machine (SVM) classifiers on the validation cases were calculated. ANN and SVM had the highest overall sensitivity, 90.6% and 92.0% respectively, followed by NB (76.0%) and KNN (73.3%). SVM and ANN showed high false negative rates for bilateral vestibulopathy cases (20.0% and 26.0%), while KNN and NB had high false negative rates for progressive supranuclear palsy cases (76.7% and 40.0%). Automated pattern recognition systems are able to identify pathological gait patterns and establish a clinical diagnosis with good accuracy. SVM and ANN in particular differentiate the gait patterns of several distinct oto-neurological disorders with higher sensitivity and specificity than KNN and NB. Both SVM and ANN appear to be reliable diagnostic and management tools for gait disorders. Copyright © 2015 Elsevier Ltd. All rights reserved.
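The per-disorder false negative rates quoted above are one-vs-rest quantities: each diagnosis is treated in turn as the positive class. A minimal sketch of how such rates can be computed from true and predicted labels follows; the labels and toy data are hypothetical and not taken from the study.

```python
from collections import Counter

def per_class_rates(y_true, y_pred):
    """One-vs-rest sensitivity, false-negative rate and specificity per class."""
    support = Counter(y_true)                                    # TP + FN per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP per class
    rates = {}
    for c in sorted(support):
        sensitivity = hits[c] / support[c]                       # TP / (TP + FN)
        # Specificity: among samples whose true class is not c,
        # the fraction that were also not predicted as c.
        negatives = [p for t, p in zip(y_true, y_pred) if t != c]
        specificity = (sum(p != c for p in negatives) / len(negatives)
                       if negatives else float("nan"))
        rates[c] = {
            "sensitivity": sensitivity,
            "false_negative_rate": 1.0 - sensitivity,
            "specificity": specificity,
        }
    return rates

# Toy labels (hypothetical): BVP = bilateral vestibulopathy,
# CA = cerebellar ataxia, HC = healthy control.
y_true = ["BVP", "BVP", "CA", "CA", "HC"]
y_pred = ["BVP", "HC", "CA", "CA", "HC"]
rates = per_class_rates(y_true, y_pred)
```

Here one of the two BVP cases is misclassified as healthy, so BVP gets a sensitivity of 0.5 and a false negative rate of 0.5, while HC's specificity drops to 0.75.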
Parkinson's disease detection based on dysphonia measurements
NASA Astrophysics Data System (ADS)
Lahmiri, Salim
2017-04-01
Assessing dysphonic symptoms is a noninvasive and effective approach to detecting Parkinson's disease (PD) in patients. The main purpose of this study is to investigate the effect of different dysphonia measurements on PD detection by a support vector machine (SVM). Seven categories of dysphonia measurements are considered. Experimental results from ten-fold cross-validation demonstrate that vocal fundamental frequency statistics yield the highest accuracy among the individual categories, 88% ± 0.04. When all dysphonia measurements are employed, the SVM classifier achieves 94% ± 0.03 accuracy. Refining the original pattern space by removing dysphonia measurements with similar variation across healthy and PD subjects raises the accuracy to 97.03% ± 0.03. This is higher than the accuracy reported in the literature for the same dataset with ten-fold cross-validation. Finally, measures of the ratio of noise to tonal components in the voice were found to be the most suitable dysphonic symptoms for detecting PD subjects, as they achieve 99.64% ± 0.01 specificity. This finding is highly promising for understanding PD symptoms.
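Results such as "88% ± 0.04" are typically a mean and spread over the ten cross-validation folds. The sketch below shows shuffled k-fold index generation and a mean ± standard deviation summary; the abstract does not say whether the sample or population standard deviation was reported, so the use of the sample version here is an assumption.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread samples evenly: the first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size

def mean_std(fold_scores):
    """Summarise per-fold scores as (mean, sample standard deviation)."""
    return mean(fold_scores), stdev(fold_scores)
```

In use, a classifier would be trained on `train_idx` and scored on `test_idx` for each of the k folds, and the k scores passed to `mean_std`. Every sample lands in exactly one test fold.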
Computer-based detection of diabetes retinopathy stages using digital fundus images.
Acharya, U R; Lim, C M; Ng, E Y K; Chee, C; Tamura, T
2009-07-01
Diabetes mellitus is a heterogeneous clinical syndrome characterized by hyperglycaemia; its long-term complications are retinopathy, neuropathy, nephropathy, and cardiomyopathy. It is a leading cause of blindness. Diabetic retinopathy comprises progressive pathological alterations in the retinal microvasculature, leading to areas of retinal nonperfusion, increased vascular permeability, and pathological proliferation of retinal vessels. Hence, regular cost-effective eye screening is beneficial for diabetic subjects. Currently, the different stages of diabetic retinopathy are detected by senior ophthalmologists through retinal examination using indirect biomicroscopy. In this work, morphological image processing and support vector machine (SVM) techniques were used for the automatic diagnosis of eye health. In this study, 331 fundus images were analysed. Five groups were identified: normal retina, mild non-proliferative diabetic retinopathy, moderate non-proliferative diabetic retinopathy, severe non-proliferative diabetic retinopathy, and proliferative diabetic retinopathy. Four salient features (blood vessels, microaneurysms, exudates, and haemorrhages) were extracted from the raw images using image-processing techniques and fed to the SVM for classification. The developed system demonstrated a sensitivity of more than 82 per cent and a specificity of 86 per cent.
Al-Qazzaz, Noor Kamal; Ali, Sawal; Ahmad, Siti Anom; Escudero, Javier
2017-07-01
The aim of the present study was to discriminate the electroencephalogram (EEG) of 5 patients with vascular dementia (VaD), 15 patients with stroke-related mild cognitive impairment (MCI), and 15 normal control subjects during a working memory (WM) task. We used independent component analysis (ICA) and wavelet transform (WT) as a hybrid preprocessing approach for EEG artifact removal. Three features were extracted from the cleaned EEG signals: spectral entropy (SpecEn), permutation entropy (PerEn) and Tsallis entropy (TsEn). Two classifiers, support vector machine (SVM) and k-nearest neighbors (kNN), were applied, with fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) as a dimensionality reduction technique. FNPAQR increased the classification accuracy from 82.22% to 90.37% for SVM and from 82.6% to 86.67% for kNN. These results suggest that FNPAQR consistently improves the discrimination of VaD patients, MCI patients, and normal control subjects, and that it could be a useful feature reduction step to help identify patients with VaD and MCI.
Terahertz spectroscopic investigation of human gastric normal and tumor tissues
NASA Astrophysics Data System (ADS)
Hou, Dibo; Li, Xian; Cai, Jinhui; Ma, Yehao; Kang, Xusheng; Huang, Pingjie; Zhang, Guangxin
2014-09-01
Dehydrated human normal and cancerous gastric tissues were measured using transmission time-domain terahertz spectroscopy. Based on the obtained terahertz absorption spectra, the contrast between the two kinds of tissue was investigated, and techniques for automatically identifying cancerous tissue were studied. Distinct differences in both the shape and amplitude of the absorption spectra were demonstrated between normal and tumor tissue. Additionally, some spectral features in the ranges 0.2–0.5 THz and 1–1.5 THz were revealed in all cancerous gastric tissues. To identify gastric cancer systematically, principal component analysis combined with a t-test was used to extract the information that best distinguishes the two types. Two approaches, K-means clustering and support vector machine (SVM) classification, were then used to classify the processed terahertz data into normal and cancerous groups. SVM gave a satisfactory result with fewer misclassified cases. These results indicate the potential of the terahertz technique for detecting gastric cancer, and the data analysis methodology may also be applied to the automatic discrimination of terahertz spectra in other applications.
Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Ma, Xiaoqi
2015-01-01
A novel method is proposed to construct a pancreatic cancer classifier. First, the quantum concept and the fruit fly optimization algorithm (FOA) are introduced. FOA is then improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is built from the optimized SVM. To verify the effectiveness of the proposed method, SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method improves classifier performance and requires less time. PMID:26543867
Hasan, Mehedi; Kotov, Alexander; Carcone, April; Dong, Ming; Naar, Sylvie; Hartlieb, Kathryn Brogan
2016-08-01
This study examines the effectiveness of state-of-the-art supervised machine learning methods, combined with different feature types, for automatically annotating fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, manually annotated by two human coders as the gold standard, and experimented with state-of-the-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN), in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found that, when the number of classes is large, CNN and CRF perform worse than SVM. When only lexical features were used, SVM annotated the interview transcripts with the highest classification accuracy among all classifiers: 70.8%, 61% and 53.7% for codebooks of 17, 20 and 41 codes, respectively. Adding contextual features, semantic features, and their combination to the lexical ones improved SVM accuracy on the 17-class codebook to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of machine learning methods combined with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
Support Vector Machine-Based Endmember Extraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Filippi, Anthony M; Archibald, Richard K
This paper introduces the use of Support Vector Machines (SVMs) to automatically perform endmember extraction from hyperspectral data. The strengths of SVM are exploited to provide a fast and accurate representation of high-dimensional data sets that may consist of multiple distributions. Once this representation is computed, the number of distributions can be determined without prior knowledge. For each distribution, an optimal transform can be determined that preserves informational content while reducing the data dimensionality, and hence the computational cost. Finally, endmember extraction for the whole data set is accomplished. Results indicate that this Support Vector Machine-Based Endmember Extraction (SVM-BEE) algorithm can autonomously determine endmembers from multiple clusters with computational speed and accuracy, while maintaining robust tolerance to noise.
NASA Astrophysics Data System (ADS)
Khawaja, Taimoor Saleem
A high-belief, low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear non-Gaussian systems. The methodology assumes the availability of real-time process measurements, the definition of a set of fault indicators, and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful LS-SVM algorithm, set within a Bayesian inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVMs are founded on the principle of Structural Risk Minimization (SRM), which seeks a good trade-off between low empirical risk and small capacity. The key features of SVMs are the use of non-linear kernels, the absence of local minima, the sparseness of the solution, and the capacity control obtained by optimizing the margin. The Bayesian inference framework linked with LS-SVMs allows a probabilistic interpretation of the diagnosis and prognosis results. Additional levels of inference provide the much-coveted adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector based on LS-SVMs is proposed.
The proposed scheme uses only baseline data to construct a one-class LS-SVM which, when presented with online data, is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate "possibly" non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds, and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines, (c) uncertainty representation and management using Bayesian inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of the diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.
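The "quick optimization problem" of LS-SVR comes from replacing the SVM's quadratic program with a single linear system. A minimal sketch of plain LS-SVM regression with an RBF kernel follows, assuming fixed hyperparameters `gamma` (regularization) and `sigma` (kernel width); the thesis's Bayesian inference, weighting, incremental updates and GMM layers are not reproduced here. The solved system is

[0, 1ᵀ; 1, K + I/γ] [b; α] = [0; y],  with prediction f(x) = b + Σₖ αₖ K(x, xₖ).

```python
import math

def rbf(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting (dense, small systems only)."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Plain LS-SVM regression: returns bias b and dual weights alpha."""
    n = len(X)
    A = [[0.0] + [1.0] * n]                  # first row enforces sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma            # ridge term from the squared-error loss
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))
```

Unlike a standard SVM, every training point generally receives a nonzero αₖ, so LS-SVM trades sparseness for a problem that is solved by linear algebra alone.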
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardenas, C; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Wong, A
Purpose: To develop and test population-based machine learning algorithms for delineating high-dose clinical target volumes (CTVs) in H&N tumors. Automating and standardizing the contouring of CTVs can reduce both physician contouring time and inter-physician variability, which is one of the largest sources of uncertainty in H&N radiotherapy. Methods: Twenty-five node-negative patients treated with definitive radiotherapy were selected (6 right base of tongue, 11 left and 9 right tonsil). All patients had GTV and CTVs manually contoured by an experienced radiation oncologist prior to treatment. This contouring process, which is driven by anatomical, pathological, and patient-specific information, typically results in non-uniform margin expansions about the GTV. Therefore, we tested two methods to delineate high-dose CTV given a manually contoured GTV: (1) regression support vector machines (SVM) and (2) classification SVM. These models were trained and tested on each patient group using leave-one-out cross-validation. The volume difference (VD) and Dice similarity coefficient (DSC) between the manual and auto-contoured CTV were calculated to evaluate the results. GTV-to-CTV distances were computed about each patient's GTV, and these distances, in addition to distances from the GTV to surrounding anatomy in the expansion direction, were utilized in the regression-SVM method. The classification-SVM method used categorical voxel information (GTV, selected anatomical structures, else) from a 3×3×3 cm³ ROI centered about the voxel to classify voxels as CTV. Results: Volumes for the auto-contoured CTVs ranged from 17.1 to 149.1 cc and 17.4 to 151.9 cc; the average (range) VD between manual and auto-contoured CTV were 0.93 (0.48–1.59) and 1.16 (0.48–1.97), while the average (range) DSC values were 0.75 (0.59–0.88) and 0.74 (0.59–0.81) for the regression-SVM and classification-SVM methods, respectively.
Conclusion: We developed two novel machine learning methods to delineate high-dose CTV for H&N patients. Both methods showed promising results that point toward standardizing the contouring of clinical target volumes. Varian Medical Systems grant.
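The DSC used above measures voxel overlap between the manual and the automatic contour, DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch treating each contour as a set of voxel coordinates follows (the abstract does not define its VD metric precisely, so only DSC is shown); the two cubes are made-up toy volumes.

```python
def dice(a, b):
    """Dice similarity coefficient between two voxel sets: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Toy volumes: a 4x4x4 "manual" cube and an "auto" cube shifted by one voxel in x.
manual = {(x, y, z) for x in range(0, 4) for y in range(4) for z in range(4)}
auto = {(x, y, z) for x in range(1, 5) for y in range(4) for z in range(4)}
score = dice(manual, auto)  # 48 shared voxels out of 64 + 64
```

A one-voxel shift of a 4-voxel-wide cube already lowers DSC to 0.75, which gives a sense of scale for the 0.74 to 0.75 averages reported.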
Machine learning algorithms to classify spinal muscular atrophy subtypes.
Srivastava, Tuhin; Darras, Basil T; Wu, Jim S; Rutkove, Seward B
2012-07-24
The development of better biomarkers for disease assessment remains an ongoing effort across the spectrum of neurologic illnesses. One approach for refining biomarkers is based on the concept of machine learning, in which individual, unrelated biomarkers are simultaneously evaluated. In this cross-sectional study, we assess the possibility of using machine learning, incorporating both quantitative muscle ultrasound (QMU) and electrical impedance myography (EIM) data, for classification of muscles affected by spinal muscular atrophy (SMA). Twenty-one normal subjects, 15 subjects with SMA type 2, and 10 subjects with SMA type 3 underwent EIM and QMU measurements of unilateral biceps, wrist extensors, quadriceps, and tibialis anterior. EIM and QMU parameters were then applied in combination using a support vector machine (SVM), a type of machine learning, in an attempt to accurately categorize 165 individual muscles. For all 3 classification problems, normal vs SMA, normal vs SMA 3, and SMA 2 vs SMA 3, use of SVM provided the greatest accuracy in discrimination, surpassing both EIM and QMU individually. For example, the accuracy, as measured by the receiver operating characteristic area under the curve (ROC-AUC) for the SVM discriminating SMA 2 muscles from SMA 3 muscles was 0.928; in comparison, the ROC-AUCs for EIM and QMU parameters alone were only 0.877 (p < 0.05) and 0.627 (p < 0.05), respectively. Combining EIM and QMU data categorizes individual SMA-affected muscles with very high accuracy. Further investigation of this approach for classifying and for following the progression of neuromuscular illness is warranted.
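The ROC-AUC values above (0.928 vs 0.877 and 0.627) can be read as the probability that a randomly chosen positive muscle receives a higher classifier score than a randomly chosen negative one. A minimal pairwise (Mann-Whitney) sketch of that computation follows, with made-up scores.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as P(random positive outranks random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up decision scores, e.g. for SMA 2 (positive) vs SMA 3 (negative) muscles.
auc = roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])  # 8 of 9 pairs ranked correctly
```

The double loop is O(n·m); a rank-based formula gives the same value faster for large samples, but the pairwise form shows directly what the number means.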
Motor Oil Classification using Color Histograms and Pattern Recognition Techniques.
Ahmadi, Shiva; Mani-Varnosfaderani, Ahmad; Habibi, Biuck
2018-04-20
Motor oil classification is important for quality control and the identification of oil adulteration. In this work, we propose a simple, rapid, inexpensive and nondestructive approach based on image analysis and pattern recognition techniques for classifying nine different types of motor oil according to their color histograms. To this end, we computed color histograms in different color spaces, such as red green blue (RGB), grayscale, and hue saturation intensity (HSI), to extract features that can support the classification procedure. These color histograms and their combinations were used as inputs for model development and were modeled and statistically evaluated using linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) techniques. Two common solutions to the multiclass classification problem were applied: (1) transformation into binary classification problems using a one-against-all (OAA) approach and (2) extension from binary classifiers to a single globally optimized multilabel classification model. With the OAA strategy, LDA, QDA, and SVM reached up to 97% in terms of accuracy, sensitivity, and specificity for both the training and test sets. With the extension from the binary case, despite good performance by the SVM model, QDA and LDA provided better results: up to 92% for the RGB-grayscale-HSI color histograms and up to 93% for the HSI color map, respectively. To reduce the number of independent variables for modeling, principal component analysis was used. Our results suggest that the proposed method is promising for identifying and classifying different types of motor oil.
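The one-against-all strategy trains one binary model per class (that class vs all others) and predicts the class whose model scores highest. The sketch below illustrates only the OAA wrapper: as a simplifying assumption, the binary learner is a nearest-centroid linear rule standing in for LDA, QDA, or SVM, and the three-bin "histogram" features are invented toy data.

```python
def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_binary(pos, neg):
    """Linear scorer from a nearest-centroid rule: score > 0 when x is
    closer to the positive centroid than to the negative one."""
    mp, mn = centroid(pos), centroid(neg)
    w = [a - b for a, b in zip(mp, mn)]
    bias = -0.5 * (sum(a * a for a in mp) - sum(b * b for b in mn))
    return w, bias

def fit_one_against_all(X, y):
    """One binary model per class: that class (positive) vs all others."""
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, t in zip(X, y) if t == c]
        neg = [x for x, t in zip(X, y) if t != c]
        models[c] = fit_binary(pos, neg)
    return models

def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    def score(model):
        w, b = model
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))

# Toy 3-bin colour-histogram features for three hypothetical oil types.
X = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
     [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
y = ["A", "A", "B", "B", "C", "C"]
models = fit_one_against_all(X, y)
```

For nine oil types this yields nine binary models, whereas the second strategy in the abstract fits a single multiclass model directly.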
Use of data mining techniques to determine and predict length of stay of cardiac patients.
Hachesu, Peyman Rezaei; Ahmadi, Maryam; Alizadeh, Somayyeh; Sadoughi, Farahnaz
2013-06-01
Predicting the length of stay (LOS) of hospital patients is important for providing them with better services and higher satisfaction, and for helping hospital management plan and manage hospital resources as carefully as possible. We propose applying data mining techniques to extract useful knowledge and build an accurate model to predict the LOS of heart patients. Data were collected from patients with coronary artery disease (CAD); the records of 4,948 patients who had suffered CAD were included in the analysis. Classification was performed with three algorithms: decision tree, support vector machines (SVM), and artificial neural network (ANN). LOS is the target variable, and 36 input variables are used for prediction. A confusion matrix was obtained to calculate sensitivity, specificity, and accuracy. The overall accuracy of SVM was 96.4% on the training set. Most single patients (64.3%) had an LOS ≤5 days, whereas 41.2% of married patients had an LOS >10 days. Moreover, the study showed that comorbidities, such as lung disorders and hemorrhage with drug consumption, have an impact on long LOS. The presence of comorbidities, an ejection fraction <2, being a current smoker, and having social-security-type insurance led to longer LOS in coronary artery patients than in other subjects. All three algorithms are able to predict LOS with varying degrees of accuracy. The findings demonstrated that SVM was the best fit. There was a significant tendency for LOS to be longer in patients with lung or respiratory disorders and high blood pressure.
Feature generation using genetic programming with application to fault classification.
Guo, Hong; Jack, Lindsay B; Nandi, Asoke K
2005-02-01
One of the major challenges in pattern recognition problems is feature extraction, which derives new features from existing features, or directly from raw data, in order to reduce the cost of computation during classification while improving classifier efficiency. Most current feature extraction techniques transform the original pattern vector into a new vector with increased discrimination capability but lower dimensionality. This is conducted within a predefined feature space and thus has limited searching power. Genetic programming (GP) can generate new features from the original dataset without prior knowledge of the probabilistic distribution. In this paper, a GP-based approach is developed for feature extraction from raw vibration data recorded from a rotating machine under six different conditions. The created features are then used as inputs to a neural classifier for identifying the six bearing conditions. Experimental results demonstrate the ability of GP to automatically discover the different bearing conditions using features expressed in the form of nonlinear functions. Furthermore, four sets of results were obtained: GP-extracted features with artificial neural networks (ANN) and with support vector machines (SVM), and traditional features with ANN and with SVM. This GP-based approach is used for bearing fault classification for the first time and exhibits superior searching power over other techniques. Additionally, it significantly reduces computation time compared with a genetic algorithm (GA), which makes the solution more practical to realize.
Combined data mining/NIR spectroscopy for purity assessment of lime juice
NASA Astrophysics Data System (ADS)
Shafiee, Sahameh; Minaei, Saeid
2018-06-01
This paper reports a data mining study of the NIR spectra of lime juice samples to determine their purity (natural or synthetic). NIR spectra of 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA. Different data mining techniques were employed for feature selection (Genetic Algorithm (GA)) and classification (the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree). SVM proved to be the most accurate classifier, achieving the highest accuracy (97%) using the raw spectral information. Its accuracy dropped to 93% when the feature vector selected by the GA search was used as the classifier input, indicating that feature selection removes some relevant features that give good performance with the SVM classifier. Also, spectra reduced using PCA did not show acceptable performance (total accuracy of 66% with the RBF network), which indicates that dimensionality reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of combining data mining with near-infrared spectroscopy to monitor lime juice quality in terms of natural or synthetic origin.
NASA Astrophysics Data System (ADS)
Yang, Yue; Wu, Yongjiang; Li, Weili; Liu, Xuesong; Zheng, Jiyu; Zhang, Wentao; Chen, Yong
2018-02-01
In this work, near infrared (NIR) spectroscopy coupled with chemometrics was used to discriminate the geographical origin of Herba Epimedii. Four classification models, namely discriminant analysis (DA), back propagation neural network (BPNN), K-nearest neighbor (KNN), and support vector machine (SVM), were constructed, and their recognition accuracies were compared. The results indicated that the SVM model was superior to the other models in identifying the geographical origin of Herba Epimedii. The recognition rates of the optimum SVM model were 100% for the calibration set and 94.44% for the prediction set. In addition, the feasibility of NIR spectroscopy with a CARS-PLSR calibration model for predicting the icariin content of Herba Epimedii was investigated. The determination coefficient (RP2) and root-mean-square error (RMSEP) for the prediction set were 0.9269 and 0.0480, respectively. It can be concluded that NIR spectroscopy combined with chemometrics has great potential for determining the geographical origin and icariin content of Herba Epimedii. This study can provide a valuable reference for rapid quality control of food products.
Carbon dioxide emission prediction using support vector machine
NASA Astrophysics Data System (ADS)
Saleh, Chairul; Rachman Dzakiyullah, Nur; Bayu Nugroho, Jonathan
2016-02-01
In this paper, an SVM model is proposed to predict carbon dioxide (CO2) emissions. The model uses energy consumption, namely electrical energy and coal burning, as input variables, since these directly drive increases in CO2 emissions. Our objective is to monitor CO2 emissions based on the electrical energy and coal used in the production process. Data on electrical energy and coal usage were obtained from the alcohol industry to train and test the models, and were divided by cross-validation into 90% training data and 10% testing data. The optimal SVM parameters were found by trial and error, adjusting the C and epsilon parameters; the best model used C = 0.1 and epsilon = 0. Model error was measured by the root mean square error (RMSE), which was 0.004; a smaller error indicates more accurate prediction. In practice, this work can help executive managers make effective operational decisions by monitoring CO2 emissions.
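Two quantities in this abstract are easy to make concrete: the ε-insensitive loss that SVM regression minimizes (errors inside the ε tube cost nothing) and the RMSE used for evaluation. A minimal sketch with made-up numbers follows. With ε = 0, as selected above, the tube vanishes and every deviation is penalized by its absolute size.

```python
import math

def epsilon_insensitive(y_true, y_pred, epsilon=0.0):
    """Per-sample SVR training loss: max(0, |y - f(x)| - epsilon)."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

def rmse(y_true, y_pred):
    """Root mean square error used to evaluate the fitted model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
loss_tube = epsilon_insensitive(y_true, y_pred, epsilon=0.5)  # small errors ignored
loss_abs = epsilon_insensitive(y_true, y_pred, epsilon=0.0)   # plain absolute error
error = rmse(y_true, y_pred)
```

Note that training (ε-insensitive, linear in the error) and evaluation (RMSE, quadratic in the error) weight large deviations differently.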
Application of GA-SVM method with parameter optimization for landslide development prediction
NASA Astrophysics Data System (ADS)
Li, X. Z.; Kong, J. M.
2013-10-01
Predicting the development of landslides has long been a key issue in landslide research, and many methods for predicting landslide displacement series have been proposed. The support vector machine (SVM) has proved to be a novel algorithm with good performance; however, its performance strongly depends on the right choice of the SVM parameters (C and γ). In this study, we present an application of the GA-SVM method, with parameter optimization, to landslide displacement rate prediction. A typical large-scale landslide in a hydroelectric engineering area of Southwest China was selected as a case study. Based on an analysis of the basic characteristics and monitoring data of the landslide, a single-factor GA-SVM model and a multi-factor GA-SVM model were built and compared with single-factor and multi-factor SVM models. The results show that all four models have high prediction accuracy, but the GA-SVM models are slightly more accurate than the SVM models, and the multi-factor models are slightly more accurate than the single-factor models. The multi-factor GA-SVM model is the most accurate, with the smallest RMSE (0.0009) and the largest RI (0.9992).
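The abstract does not name the kernel, but γ is conventionally the width parameter of the RBF kernel K(x, z) = exp(−γ‖x − z‖²), which is assumed in the sketch below. It shows why the choice of γ matters: for a fixed pair of points, a small γ makes them look almost identical to the model, while a large γ makes them look unrelated.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [0.0, 0.0], [1.0, 1.0]          # squared distance = 2
gammas = [0.01, 0.1, 1.0, 10.0]
similarities = [rbf_kernel(x, z, g) for g in gammas]
```

The similarity falls from about 0.98 at γ = 0.01 to almost zero at γ = 10, which is why γ (together with the penalty C) is the usual target of GA or PSO parameter searches.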
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm that has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, that separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset to find the boundary, which is sub-optimal for performance. In this paper, we propose a novel distributed memory algorithm to eliminate the samples that do not contribute to the boundary definition in SVM. We propose several heuristics, ranging from early (aggressive) to late (conservative) elimination of samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures so that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as the UCI HIGGS Boson dataset and the Offending URL dataset.
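The elimination idea rests on the fact that samples lying well outside the margin do not define the boundary. The sketch below is not the paper's distributed algorithm; it is a single-machine illustration, assuming a linear SVM with current weights `w` and bias `b`, that flags samples whose functional margin yᵢ(w·xᵢ + b) exceeds 1 + `tol`. A small `tol` loosely corresponds to aggressive elimination and a large `tol` to conservative elimination.

```python
def shrink_candidates(X, y, w, b, tol=0.1):
    """Split sample indices into (keep, drop), where drop holds samples safely
    outside the margin: y_i * (w . x_i + b) > 1 + tol."""
    keep, drop = [], []
    for i, (x, t) in enumerate(zip(X, y)):
        margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        (drop if margin > 1 + tol else keep).append(i)
    return keep, drop

# Toy 2-D data with a hypothetical current solution w = (1, 0), b = 0.
X = [[3.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-4.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1, 1]
keep, drop = shrink_candidates(X, y, w=[1.0, 0.0], b=0.0)
```

Because `w` and `b` change during training, a sample dropped early can later need to return; that is the pre-emptive shrinking risk the abstract addresses with synchronization.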
Assessing the druggability of protein-protein interactions by a supervised machine-learning method.
Sugaya, Nobuyoshi; Ikeda, Kazuyoshi
2009-08-25
Protein-protein interactions (PPIs) are challenging but attractive targets of small molecule drugs for therapeutic interventions of human diseases. In this era of rapid accumulation of PPI data, there is great need for a methodology that can efficiently select drug target PPIs by holistically assessing the druggability of PPIs. To address this need, we propose here a novel approach based on a supervised machine-learning method, support vector machine (SVM). To assess the druggability of the PPIs, 69 attributes were selected to cover a wide range of structural, drug and chemical, and functional information on the PPIs. These attributes were used as feature vectors in the SVM-based method. Thirty PPIs known to be druggable were carefully selected from previous studies; these were used as positive instances. Our approach was applied to 1,295 human PPIs with tertiary structures of their protein complexes already solved. The best SVM model constructed discriminated the already-known target PPIs from others at an accuracy of 81% (sensitivity, 82%; specificity, 79%) in cross-validation. Among the attributes, the two with the greatest discriminative power in the best SVM model were the number of interacting proteins and the number of pathways. Using the model, we predicted several promising candidates for druggable PPIs, such as SMAD4/SKI. As more PPI data are accumulated in the near future, our method will have increased ability to accelerate the discovery of druggable PPIs.
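Accuracy, sensitivity and specificity are all derived from the confusion matrix of a binary classifier. A minimal sketch, using hypothetical labels (1 = druggable PPI, 0 = other PPI):

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true positives that are recovered
    specificity = tn / (tn + fp)  # fraction of true negatives that are rejected
    return accuracy, sensitivity, specificity


acc, sens, spec = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1])
```

Accuracy depends on the class proportions in the evaluation set, whereas sensitivity and specificity are computed within each class separately.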
NASA Astrophysics Data System (ADS)
Wang, Hongjin; Hsieh, Sheng-Jen; Peng, Bo; Zhou, Xunfei
2016-07-01
A method that requires no knowledge of the thermal properties of the coating or the substrate would be of interest for industrial applications, and supervised machine learning regression may provide a solution. This paper compares the performance of two regression models, artificial neural networks (ANN) and support vector machines for regression (SVM), in estimating coating thickness from surface temperature increments collected via time-resolved thermography. We describe the role of SVM in coating thickness prediction. Non-dimensional analyses are conducted to illustrate the effects of coating thickness and various other factors on surface temperature increments, showing that it is theoretically possible to correlate coating thickness with the surface temperature increment. Based on these analyses, the laser power is selected so that, during heating, the temperature increment is high enough to resolve coating thickness variations but low enough to avoid surface melting. Sixty-one paint-coated samples with coating thicknesses ranging from 63.5 μm to 571 μm are used to train the models, whose hyper-parameters are optimized by 10-fold cross-validation. Another 28 sets of data are then collected to test the performance of the models. The study shows that SVM provides reliable predictions on unseen data owing to its deterministic characteristics, and that it works well with small input datasets. The SVM model generates more accurate coating thickness estimates than the ANN model.
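Hyper-parameter tuning by 10-fold cross-validation partitions the training samples into ten disjoint folds, each used once for validation while the model is fitted on the other nine. A minimal index splitter, assuming a random shuffle with a fixed seed (the study's exact fold assignment is not stated):

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(61, k=10))
```

With 61 training samples and k = 10, one fold holds 7 samples and the other nine hold 6.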
SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor
Vidovic, Marina M. -C.; Görnitz, Nico; Müller, Klaus-Robert; Rätsch, Gunnar; Kloft, Marius
2015-01-01
Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, owing to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although they are a major step towards explaining trained SVM models, their size grows exponentially with the length of the motif, which makes manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework considers the motifs as free parameters in a probabilistic model, a task that can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set. PMID:26690911
Yan, Jianjun; Shen, Xiaojing; Wang, Yiqin; Li, Fufeng; Xia, Chunming; Guo, Rui; Chen, Chunfeng; Shen, Qingwei
2010-01-01
This study aims to use the Wavelet Packet Transform (WPT) and the Support Vector Machine (SVM) algorithm for objective, quantitative analysis of auscultation in Traditional Chinese Medicine (TCM) diagnosis. First, Wavelet Packet Decomposition (WPD) at level 6 was employed to split the auscultation signals into finer frequency bands. Statistical analysis was then performed on the Wavelet Packet Energy (WPE) features extracted from the WPD coefficients. Next, SVM-based pattern recognition was used to distinguish the statistical feature values of the mixed subjects in the sample groups. The experimental results showed high classification accuracies.
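Wavelet packet energy features can be sketched with the Haar wavelet, chosen here only because it needs no external library; the wavelet actually used in the study is not stated. A level-6 decomposition splits the signal into 2^6 = 64 terminal bands, and the energy of each band's coefficients is one feature.

```python
import math


def haar_packet_energies(signal, level):
    """Energy of each terminal node of a Haar wavelet packet tree.

    The signal length must be divisible by 2**level. Nodes are returned in
    natural (tree) order, not sorted by frequency.
    """
    if len(signal) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(signal)]
    s = 1 / math.sqrt(2)
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            pairs = list(zip(node[0::2], node[1::2]))
            next_nodes.append([(a + b) * s for a, b in pairs])  # low-pass branch
            next_nodes.append([(a - b) * s for a, b in pairs])  # high-pass branch
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]


signal = [math.sin(0.3 * i) for i in range(256)]
energies = haar_packet_energies(signal, 6)  # 64 band energies
```

Because the Haar transform is orthonormal, the 64 band energies sum to the energy of the original signal, so dividing by that total gives relative energy features if needed.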
Human action recognition with group lasso regularized-support vector machine
NASA Astrophysics Data System (ADS)
Luo, Huiwu; Lu, Huanzhang; Wu, Yabei; Zhao, Fei
2016-05-01
The bag-of-visual-words (BOVW) and Fisher kernel are two popular models in human action recognition, and the support vector machine (SVM) is the most commonly used classifier for both. We show two kinds of group structures in the feature representations constructed by BOVW and the Fisher kernel, respectively; the structural information of a feature representation can serve as a prior for the classifier and has been shown in several areas to improve classifier performance. However, the standard SVM employs L2-norm regularization in its learning procedure, which penalizes each variable individually and cannot express the structural information of the feature representation. We replace the L2-norm regularization in the standard SVM with group lasso regularization, yielding the proposed group lasso regularized-support vector machine (GLRSVM), and embed the group structural information of the feature representation into GLRSVM. Finally, we introduce an algorithm that solves the GLRSVM optimization problem by the alternating direction method of multipliers (ADMM). Experiments on the KTH, YouTube, and Hollywood2 datasets show that our method achieves promising results and improves on state-of-the-art methods on the KTH and YouTube datasets.
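In ADMM schemes for group-lasso-regularized problems, one of the sub-steps is typically the proximal operator of the group lasso penalty, which shrinks each group of coefficients as a block. A minimal sketch of that single step, assuming the groups partition the coefficient vector; it is not the full GLRSVM solver.

```python
import math


def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (block soft-thresholding).

    `groups` is a list of index lists that partition the coefficient vector.
    A group whose Euclidean norm is at most `lam` is set exactly to zero;
    otherwise it is scaled towards zero as a block.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


shrunk = group_soft_threshold([3.0, 4.0, 0.1, 0.1], groups=[[0, 1], [2, 3]], lam=1.0)
```

Unlike a plain L2 penalty, this operator sets entire groups exactly to zero, which is how group lasso encodes group structure in the feature representation.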
NASA Astrophysics Data System (ADS)
Jegadeeshwaran, R.; Sugumaran, V.
2015-02-01
Hydraulic brakes are important components for the safety of automobile passengers and are therefore a good subject for condition monitoring. The condition of the brake components can be monitored using their vibration characteristics. This paper proposes on-line condition monitoring using a machine learning approach as a possible solution to this problem. Vibration signals for both good and faulty brake conditions were acquired from a hydraulic brake test setup with a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals, and feature selection was carried out using the C4.5 decision tree algorithm. Because there is no specific method for finding the right number of features for a given classification problem, an extensive study is needed to find the optimum number; the effect of the number of features was therefore studied using both the decision tree and Support Vector Machines (SVM). The selected features were classified using C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusions of the study are presented.
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair
2018-04-01
The present study employs a machine learning algorithm, the support vector machine (SVM), to classify high- and low-potential archers from a collection of bio-physiological variables, using several SVM variants. Fifty youth archers (mean age 17.0 ± 0.056 years) from various archery programmes completed a one-end shooting score test. The bio-physiological variables, namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, and calorie intake, were measured prior to the shooting test. k-means cluster analysis was applied to cluster the archers based on their scores on the assessed variables, grouping them into high-potential (HPA) and low-potential archers (LPA). SVM models with linear, quadratic and cubic kernel functions were trained on the aforementioned variables. The linear SVM exhibited good accuracy, with a classification accuracy of 94%, compared with the other tested models. These findings can help coaches and sports managers identify high-potential athletes from the bio-physiological variables examined.
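The k-means step that splits the archers into two groups can be sketched with Lloyd's algorithm. This is a minimal illustration on hypothetical one-dimensional scores; initializing the centroids from the first k points is an assumption made so that the example is deterministic, and the study's initialization is not stated.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Initial centroids are the first k points (a simplification for determinism).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


scores = [(1.0,), (1.2,), (0.8,), (9.0,), (9.5,), (8.7,)]
labels, centroids = kmeans(scores, k=2)
```

In practice, k-means is usually restarted from several random initializations and the variables are standardized first, because the result depends on both.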
Vidyasagar, Mathukumalli
2015-01-01
This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (prediction analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
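The LASSO mentioned above can be solved by cyclic coordinate descent, in which each coefficient is updated in turn by soft-thresholding its partial-residual correlation. A minimal sketch, assuming the objective (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 and a tiny hypothetical design matrix:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - X beta||^2 + lam * ||beta||_1 by cyclic coordinate descent.

    X is a list of rows; no column may be all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * beta[m] for m in range(p) if m != j))
                for i in range(n)
            ) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 3.0, 2.0, 3.0]
```

With lam = 0 the updates reduce to ordinary least squares on this orthogonal design, while a sufficiently large lam drives every coefficient to exactly zero, which is the sparsity property that makes LASSO useful for selecting a few predictive features.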
NASA Astrophysics Data System (ADS)
Effati, Meysam; Thill, Jean-Claude; Shabani, Shahin
2015-04-01
The contention of this paper is that many social science research problems are too "wicked" to be suitably studied using conventional statistical and regression-based methods of data analysis. This paper argues that an integrated geospatial approach based on methods of machine learning is well suited to this purpose. Recognizing the intrinsic wickedness of traffic safety issues, we use such an approach to unravel the complexity of traffic crash severity on highway corridors as an example of such problems. The support vector machine (SVM) and coactive neuro-fuzzy inference system (CANFIS) algorithms are tested as inferential engines to predict crash severity and uncover spatial and non-spatial factors that systematically relate to crash severity, while a sensitivity analysis is conducted to determine the relative influence of crash severity factors. Different specifications of the two methods are implemented, trained, and evaluated against crash events recorded over a 4-year period on a regional highway corridor in Northern Iran. Overall, the SVM model outperforms CANFIS by a notable margin. The combined use of spatial analysis and artificial intelligence is effective at identifying leading factors of crash severity, while explicitly accounting for spatial dependence and spatial heterogeneity effects. Owing to the demonstrated effectiveness of the sensitivity analysis, this approach produces comprehensive results that are consistent with existing traffic safety theories and supports the prioritization of effective safety measures that are geographically targeted and behaviorally sound on regional highway corridors.
Im, JongOne; Sen, Suman; Lindsay, Stuart; Zhang, Peiming
2018-06-28
In the present study, we demonstrate a tunneling nanogap technique to identify individual RNA nucleotides, which can be used as a mechanism to read the nucleobases for direct sequencing of RNA in a solid-state nanopore. The tunneling nanogap is composed of two electrodes separated by a distance of <3 nm and functionalized with a recognition molecule. When a chemical entity is captured in the gap, it generates electron tunneling currents, a process we call recognition tunneling (RT). Using RT nanogaps created in a scanning tunneling microscope (STM), we acquired the electron tunneling signals for the canonical and two modified RNA nucleotides. To call the individual RNA nucleotides from the RT data, we adopted a machine learning algorithm, support vector machine (SVM), for the data analysis. Through the SVM, we were able to identify the individual RNA nucleotides and distinguish them from their DNA counterparts with reasonably high accuracy. Since each RNA nucleoside contains a hydroxyl group at the 2'-position of its sugar ring in an RNA strand, it allows for the formation of a tunneling junction at a larger nanogap compared to the DNA nucleoside in a DNA strand, which lacks the 2' hydroxyl group. It also proves advantageous for the manufacture of RT devices. This study is a proof-of-principle demonstration for the development of an RT nanopore device for directly sequencing single RNA molecules, including those bearing modifications.
A Semisupervised Support Vector Machines Algorithm for BCI Systems
Qin, Jianzhao; Li, Yuanqing; Sun, Wei
2007-01-01
As an emerging technology, brain-computer interfaces (BCIs) provide new communication interfaces that translate brain activity into control signals for devices such as computers and robots. In this study, we propose a semisupervised support vector machine (SVM) algorithm for BCI systems, aiming to reduce the time-consuming training process. In this algorithm, a semisupervised SVM translates the features extracted from electrical recordings of the brain into control signals. This SVM classifier is built from a small labeled data set and a large unlabeled data set. To reduce the time needed to train the semisupervised SVM, we propose a batch-mode incremental learning method, which can also be easily applied to online BCI systems. Additionally, many studies suggest that the common spatial pattern (CSP) is very effective in discriminating two different brain states. However, CSP needs a sufficiently large labeled data set. To overcome this drawback of CSP, we suggest a two-stage feature extraction method for the semisupervised learning algorithm. We apply our algorithm to two BCI experimental data sets. The offline data analysis results demonstrate the effectiveness of our algorithm. PMID:18368141
Extraction and classification of 3D objects from volumetric CT data
NASA Astrophysics Data System (ADS)
Song, Samuel M.; Kwon, Junghyun; Ely, Austin; Enyeart, John; Johnson, Chad; Lee, Jongkyu; Kim, Namho; Boyd, Douglas P.
2016-05-01
We propose an Automatic Threat Detection (ATD) algorithm for Explosive Detection Systems (EDS) that uses our multi-stage Segmentation and Carving (SC) followed by a Support Vector Machine (SVM) classifier. The multi-stage SC step extracts all suspect 3-D objects. A feature vector is then constructed for each extracted object and classified by an SVM previously trained on a set of ground-truth threat and benign objects. The trained SVM classifier has been shown to be effective in classifying different types of threat materials. The proposed ATD algorithm robustly handles CT data that are prone to artifacts due to scatter, beam hardening and other systematic idiosyncrasies of CT data. Furthermore, the proposed ATD algorithm can readily incorporate newly emerging threat materials and accommodate data from newly developing sensor technologies. The efficacy of the proposed ATD algorithm with the SVM classifier is demonstrated by the Receiver Operating Characteristic (ROC) curve, which plots the Probability of Detection (PD) as a function of the Probability of False Alarm (PFA). Tests performed using CT data of passenger bags show excellent performance characteristics.
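An ROC curve such as the one used above is obtained by sweeping a decision threshold over the classifier's scores and recording (PFA, PD) at each step. A minimal sketch with hypothetical scores (1 = threat, 0 = benign), plus the area under the curve by the trapezoidal rule:

```python
def roc_points(scores, labels):
    """(PFA, PD) points from sweeping a threshold down through the scores.

    labels: 1 = threat, 0 = benign; an object is declared a threat when its
    score is >= the threshold. Tied scores are passed in a single step.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        is_last_of_tie = rank + 1 == len(order) or scores[order[rank + 1]] != scores[i]
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here `curve` is [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)] and its area is 0.75, matching the fraction of threat/benign pairs that the scores rank correctly.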
Igne, Benoît; Drennen, James K; Anderson, Carl A
2014-01-01
Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences, which can in turn come from differences in particle size or density. This study evaluated the usefulness of support vector machine (SVM) regression for handling nonlinearity and improving the robustness of calibration models in scenarios where the calibration set did not include all the variability present in the test set. Compared with partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences, and the linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work remains to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
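The loss function is one visible difference between the two regressions: SVM regression minimizes an epsilon-insensitive loss, which ignores errors smaller than epsilon and grows only linearly beyond it, whereas least-squares criteria grow quadratically. A minimal comparison on hypothetical values; it illustrates the loss alone, while the nonlinearity handling discussed above comes mainly from the SVM kernel.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors within epsilon cost nothing,
    larger errors cost linearly."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_loss(y_true, y_pred):
    """Mean squared error, the criterion behind least-squares fitting."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.05, 2.5, 1.0]  # the last prediction is far off
eps_loss = epsilon_insensitive_loss(y_true, y_pred)
sq_loss = squared_loss(y_true, y_pred)
```

The single large error dominates the squared loss (4.0 of the 4.2525 total) much more than the epsilon-insensitive loss (1.9 of 2.3).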
Dong, Ni; Huang, Helai; Zheng, Liang
2015-09-01
In zone-level crash prediction, accounting for spatial dependence has become an extensively studied topic. This study proposes a Support Vector Machine (SVM) model to address complex, large and multi-dimensional spatial data in crash prediction. A Correlation-based Feature Selector (CFS) was applied to evaluate candidate factors possibly related to zonal crash frequency in handling high-dimensional spatial data. To demonstrate the proposed approaches and compare them with the Bayesian spatial model with a conditional autoregressive prior (i.e., CAR), a dataset from Hillsborough County, Florida, was employed. The results showed that SVM models accounting for spatial proximity outperform the non-spatial model in terms of model fitting and predictive performance, which indicates the reasonableness of considering cross-zonal spatial correlations. The best relative predictive capability is associated with the model that considers proximity by centroid distance, uses the RBF kernel, and sets aside 10% of the whole dataset as testing data, which further demonstrates the capacity of SVM models to handle comparatively complex spatial data in regional crash prediction modeling. Moreover, SVM models exhibit better goodness-of-fit than CAR models when the whole dataset is used as the sample. A sensitivity analysis of the centroid-distance-based spatial SVM models was conducted to capture the impacts of explanatory variables on the mean predicted probabilities of crash occurrence. The results conform to the coefficient estimates in the CAR models, which supports the use of the SVM model as an alternative in regional safety modeling. Copyright © 2015 Elsevier Ltd. All rights reserved.
A structural SVM approach for reference parsing.
Zhang, Xiaoli; Zou, Jie; Le, Daniel X; Thoma, George R
2011-06-09
Automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of large citation databases. References, typically appearing at the end of journal articles, can also provide valuable information for extracting other bibliographic data. Therefore, parsing individual references to extract author, title, journal, year, etc. is sometimes a necessary preprocessing step in building citation-indexing systems. The regular structure of references enables us to treat reference parsing as a sequence learning problem and to study the structural Support Vector Machine (structural SVM), a newly developed structured learning algorithm, for parsing references. In this study, we implemented structural SVM and used two types of contextual features to compare structural SVM with conventional SVM. Both methods achieve above 98% token classification accuracy and above 95% overall chunk-level accuracy for reference parsing. We also compared SVM and structural SVM to Conditional Random Fields (CRF). The experimental results show that structural SVM and CRF achieve similar accuracies at the token and chunk levels. When only basic observation features are used for each token, structural SVM achieves higher performance than SVM because it utilizes the contextual label features. However, when contextual observation features from neighboring tokens are added, SVM performance improves greatly and approaches that of structural SVM once second-order contextual observation features are included. Comparing these two methods with CRF using the same set of binary features shows that both structural SVM and CRF perform better than SVM, indicating their stronger sequence learning ability in reference parsing.
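Structural SVMs and CRFs for sequence labeling both need to find the highest-scoring label sequence, which for chain-structured models is done with the Viterbi algorithm. A minimal sketch, assuming the per-token and transition scores have already been computed from the learned weights; the scores below are hypothetical, with label 0 = AUTHOR and 1 = TITLE.

```python
def viterbi(emission, transition):
    """Highest-scoring label sequence under additive emission + transition scores.

    emission[t][y]: score of label y at token t (e.g. w . phi(x_t, y)).
    transition[a][b]: score of label a followed by label b.
    Returns (best_score, best_labels).
    """
    n_labels = len(emission[0])
    score = list(emission[0])
    backptr = []
    for t in range(1, len(emission)):
        new_score, ptrs = [], []
        for b in range(n_labels):
            # Best previous label to transition from into label b.
            best_prev = max(range(n_labels), key=lambda a: score[a] + transition[a][b])
            new_score.append(score[best_prev] + transition[best_prev][b] + emission[t][b])
            ptrs.append(best_prev)
        score = new_score
        backptr.append(ptrs)
    last = max(range(n_labels), key=lambda lab: score[lab])
    labels = [last]
    for ptrs in reversed(backptr):  # follow back-pointers from the end
        labels.append(ptrs[labels[-1]])
    labels.reverse()
    return score[last], labels


emission = [[2.0, 0.0], [1.0, 1.2], [0.0, 2.0]]
transition = [[0.5, 0.0], [-1.0, 0.5]]
best_score, best_labels = viterbi(emission, transition)
```

During structural SVM training the same dynamic program is typically run with the loss added to the scores (loss-augmented inference); that step is not shown here.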
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Based on blood pressure, plasma lipid, Glu and UA data from physical examinations, a Support Vector Machine (SVM) was applied to distinguish coronary heart disease (CHD) patients from non-CHD individuals in a southern Chinese population, to guide further prevention and treatment of the disease. First, SVM classifiers were built using the radial basis kernel function, the linear kernel function and the polynomial kernel function, respectively. Second, the SVM penalty factor C and the kernel parameter sigma were optimized by particle swarm optimization (PSO), and the optimized models were then used to diagnose and predict CHD. Compared with an artificial neural network with back propagation (BP), linear discriminant analysis, logistic regression and non-optimized SVM, the results indicated that the optimized RBF-SVM model could outperform the other classifiers, with accuracy, sensitivity and specificity of 94.51%, 92.31% and 96.67%, respectively. It is therefore concluded that SVM can serve as a valid method for assisting the diagnosis of CHD.
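The three kernel functions compared above can be written directly. This sketch assumes the Gaussian RBF parameterization exp(-||x - z||^2 / (2 sigma^2)); some libraries instead use exp(-gamma * ||x - z||^2) with gamma = 1/(2 sigma^2), and the polynomial degree and offset shown are illustrative defaults, not values from the study.

```python
import math


def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """k(x, z) = (x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF: k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

A smaller sigma makes the RBF kernel more local, which is why sigma is tuned jointly with the penalty factor C rather than fixed in advance.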
"When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.
Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit
2016-10-24
To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using the Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1, which used short URLs, and Approach 2, in which URLs were expanded and included in the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores.
One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
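How the per-class F-scores were averaged is not specified above; a common choice for multiclass tasks is the macro-average, the unweighted mean of per-class F1 scores. A minimal sketch under that assumption, with hypothetical sentiment labels:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


score = macro_f1(["pos", "neg", "neu", "pos"], ["pos", "neg", "pos", "pos"])
```

Macro-averaging gives each class equal weight, so a poorly recognized minority class (here `neu`, with F1 = 0) lowers the score as much as a majority class would.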
Solution Path for Pin-SVM Classifiers With Positive and Negative τ Values.
Huang, Xiaolin; Shi, Lei; Suykens, Johan A K
2017-07-01
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ, whose value is related to the quantile level; different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ. We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then used as the starting point of the solution path. The proposed method efficiently traverses τ values along the solution path and then achieves good performance by selecting a suitable τ. In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
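The pinball loss can be written in terms of u = 1 - y f(x): it equals u when u >= 0 and -τu otherwise, so τ = 0 recovers the hinge loss of C-SVM and τ = -1 gives the linear loss L(u) = u. A minimal sketch of the loss and of a linear pin-SVM primal objective, with hypothetical data; this evaluates the objective only and is not the solution-path algorithm.

```python
def pinball_loss(u, tau):
    """L_tau(u) = u if u >= 0 else -tau * u, evaluated at u = 1 - y f(x)."""
    return u if u >= 0 else -tau * u


def pin_svm_objective(w, b, X, y, tau, C=1.0):
    """Linear pin-SVM primal: 0.5 * ||w||^2 + C * sum_i L_tau(1 - y_i (w . x_i + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    risk = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        risk += pinball_loss(1 - margin, tau)
    return reg + C * risk


value = pin_svm_objective([1.0], 0.0, [[2.0], [0.5]], [1, 1], tau=0.5)
```

With τ > 0, correctly classified points beyond the margin (u < 0) are still penalized, unlike with the hinge loss, where they contribute nothing.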
Araki, Tadashi; Ikeda, Nobutaka; Shukla, Devarshi; Jain, Pankaj K; Londhe, Narendra D; Shrivastava, Vimal K; Banchhor, Sumit K; Saba, Luca; Nicolaides, Andrew; Shafique, Shoaib; Laird, John R; Suri, Jasjit S
2016-05-01
Percutaneous coronary interventional procedures need advance planning prior to stenting or an endarterectomy. Cardiologists use intravascular ultrasound (IVUS) for screening, risk assessment and stratification of coronary artery disease (CAD). We hypothesize that plaque components are vulnerable to rupture due to plaque progression. Currently, there are no standard grayscale IVUS tools for risk assessment of plaque rupture. This paper presents a novel strategy for risk stratification based on plaque morphology, embedded with principal component analysis (PCA) for plaque feature dimensionality reduction and a dominant feature selection technique. The risk assessment utilizes 56 grayscale coronary features in a machine learning framework while linking information from carotid and coronary plaque burdens due to their common genetic makeup. The system consists of a machine learning paradigm that uses a support vector machine (SVM) combined with PCA for optimal and dominant coronary artery morphological feature extraction. The proven carotid artery intima-media thickness (cIMT) biomarker is adopted as the gold standard during the training phase of the machine learning system. For performance evaluation, a K-fold cross-validation protocol is adopted with 20 trials per fold. To choose the dominant features out of the 56 grayscale features, a PCA polling strategy is adopted in which the original values of the features are unaltered. Different protocols are designed to establish the stability and reliability criteria of the coronary risk assessment system (cRAS). Using the PCA-based machine learning paradigm and cross-validation protocol, a classification accuracy of 98.43% (AUC 0.98) was achieved with K=10 folds using an SVM radial basis function (RBF) kernel. A reliability index of 97.32% and machine learning stability criteria of 5% were met for the cRAS.
This is the first computer-aided diagnosis (CADx) system of its kind that is able to demonstrate coronary risk assessment and stratification while demonstrating a successful design of the machine learning system based on our assumptions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Support vector machine for the diagnosis of malignant mesothelioma
NASA Astrophysics Data System (ADS)
Ushasukhanya, S.; Nithyakalyani, A.; Sivakumar, V.
2018-04-01
Malignant mesothelioma is a disease in which malignant cells form in the lining of the chest or abdomen. Exposure to asbestos can affect the risk of malignant mesothelioma. Signs and symptoms of malignant mesothelioma include shortness of breath and pain under the rib cage. Tests that examine the inside of the chest and abdomen are used to detect (find) and diagnose malignant mesothelioma. Certain factors affect prognosis (chance of recovery) and treatment options. In this study, support vector machine (SVM) classifiers were used for mesothelioma disease diagnosis, and the SVM results were compared with findings of studies on mesothelioma disease that used the same data set. The SVM algorithm achieves 92.5% precision, obtained via 3-fold cross-validation. The mesothelioma disease data set was taken from an organization's reports from Turkey.
Failure prediction using machine learning and time series in optical network.
Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo
2017-08-07
In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.
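The abstract does not detail how DES and SVM are coupled. One plausible reading, sketched here as an assumption, is that Holt's double exponential smoothing forecasts the next value of a monitored performance indicator, and the forecast is then passed to an SVM classifier. Only the smoothing step is shown; `alpha`, `beta` and the initialization are illustrative choices, not values from the paper.

```python
def double_exponential_smoothing(series, alpha=0.5, beta=0.5, horizon=1):
    """Holt's linear (double) exponential smoothing; returns the forecast
    `horizon` steps beyond the last observation."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Simple initialization: first value as level, first difference as trend
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)      # smoothed level
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed slope
    return level + horizon * trend
```

On a perfectly linear series the level and trend track the data exactly, so the forecast simply extends the line; on noisy indicator data, smaller `alpha` and `beta` give smoother but slower-reacting forecasts.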
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu Xiaoying; Ho, Shirley; Trac, Hy
We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, σ_v, v_max, half-mass radius, and spin. For Millennium, our predicted N_gal values have a mean-squared error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.
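To make the kNN regression idea concrete, the sketch below predicts a target (here standing in for N_gal) as the mean target of the k nearest training halos in a standardized feature space (here standing in for the six halo properties), and reports the MSE. It is a minimal NumPy illustration; the function names, the choice of k and the standardization step are assumptions, not the configuration used in the study.

```python
import numpy as np

def knn_regress(X_train, y_train, X_query, k=5):
    """Predict each query target as the mean of its k nearest training targets
    (Euclidean distance on features standardized with training statistics)."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    A = (X_train - mu) / sigma
    B = (X_query - mu) / sigma
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def mse(y_true, y_pred):
    """Mean-squared error between true and predicted targets."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```

Standardizing matters because the halo properties span very different scales (particle counts versus spin), and unscaled Euclidean distance would be dominated by the largest-valued property.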
Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua
2017-02-01
Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transportation, organelle localization, and function, and therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates the PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews correlation coefficient of 0.9773 on a benchmark dataset. The result indicates the efficiency and accuracy of our method in predicting palmitoylation sites from protein sequences.
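The Matthews correlation coefficient reported above is computed from the confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from −1 (total disagreement) through 0 (chance) to +1 (perfect prediction). A minimal Python sketch follows; returning 0 when a marginal sum is zero is a common convention, assumed here.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, MCC stays informative on imbalanced site-prediction data, where a classifier that always predicts "non-site" can still score a high accuracy.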
Mirsky, Simcha K; Barnea, Itay; Levi, Mattan; Greenspan, Hayit; Shaked, Natan T
2017-09-01
Currently, the delicate process of selecting sperm cells to be used for in vitro fertilization (IVF) is still based on the subjective, qualitative analysis of experienced clinicians using non-quantitative optical microscopy techniques. In this work, a method was developed for the automated analysis of sperm cells based on the quantitative phase maps acquired through the use of interferometric phase microscopy (IPM). Over 1,400 human sperm cells from 8 donors were imaged using IPM, and an algorithm was designed to digitally isolate sperm cell heads from the quantitative phase maps, taking into consideration both the cell 3D morphology and contents, and to extract features describing sperm head morphology. A subset of these features was used to train a support vector machine (SVM) classifier to automatically classify sperm of good and bad morphology. The SVM achieves an area under the receiver operating characteristic curve of 88.59% and an area under the precision-recall curve of 88.67%, as well as precisions of 90% or higher. We believe that our automatic analysis can become the basis for objective and automatic sperm cell selection in IVF. © 2017 International Society for Advancement of Cytometry.
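The area under the ROC curve quoted above has a useful interpretation: it equals the probability that a randomly chosen positive example (here, a good-morphology cell) receives a higher classifier score than a randomly chosen negative one. The minimal Python sketch below computes it directly from that definition, counting ties as one half; it is an illustration, not the evaluation code used in the study, and the precision-recall area is not shown.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The double loop is O(n_pos · n_neg), which is fine for a few thousand cells; a rank-based formulation gives the same value in O(n log n) for larger sets.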
Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.
Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing
2015-01-01
Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of an SVM classifier depend strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset, the Wisconsin Diagnostic Breast Cancer (WDBC), revealed how the accuracy of SVM classification varies with the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called SVM-RFE(C) with a correlation filter was implemented to quantify feature and parameter information based on cross-validation. The proposed algorithm was first shown to outperform SVM-RFE on WDBC. Then the best accuracy of 94.643% and sensitivity of 94.595% were achieved when using SVM-RFE(C) to test 57 new PAT data samples from 19 patients. The experimental results show that the classifier constructed with the SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
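The core RFE loop is easiest to see with a linear SVM, where each feature's ranking criterion is its squared weight. The sketch below is the classic linear SVM-RFE, not the authors' RBF-kernel SVM-RFE(C) with a correlation filter; the training routine is a simple full-batch subgradient solver written for illustration, and all names and hyperparameters are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Linear soft-margin SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(hinge loss); labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_rfe(X, y, n_keep=1):
    """Recursive feature elimination: retrain, then drop the feature whose
    squared weight is smallest, until n_keep features remain."""
    remaining = list(range(X.shape[1]))
    eliminated = []                            # in order of removal
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(X[:, remaining], y)
        worst = int(np.argmin(w ** 2))
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated
```

With a nonlinear RBF kernel there is no explicit weight vector, so kernel variants of RFE instead measure how much the dual objective changes when a feature is removed; wrapping each elimination step in cross-validation, as the abstract describes, guards against over-fitting the selected subset.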
Sriwastava, Brijesh Kumar; Basu, Subhadip; Maulik, Ujjwal
2015-10-01
Protein-protein interaction (PPI) site prediction helps to identify the interface residues that participate in interaction processes. A fuzzy support vector machine (F-SVM) is proposed as an effective method to solve this problem, and we show that the performance of the classical SVM can be enhanced with the help of an interaction-affinity-based fuzzy membership function. We evaluate the performance of both SVM and F-SVM on the PPI databases of the Homo sapiens and E. coli organisms and estimate the statistical significance of the developed method over the classical SVM and other fuzzy membership-based SVM methods available in the literature. Our membership function uses the residue-level interaction affinity scores for each pair of positive and negative sequence fragments. The average AUC scores in the 10-fold cross-validation experiments are 79.94% and 80.48% for the Homo sapiens and E. coli organisms, respectively. On the independent test datasets, AUC scores of 76.59% and 80.17% are obtained for the two organisms, respectively. In almost all cases, the developed F-SVM method improves on the performance of the corresponding classical SVM and the other classifiers available in the literature.
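For reference, the standard fuzzy SVM formulation scales each training sample's slack penalty by a membership value s_i in (0, 1], so that samples with low membership (less trusted or less typical fragments) influence the decision boundary less. How the interaction-affinity scores are mapped to s_i in this paper is not given in the abstract; the objective below is the generic form, and setting every s_i = 1 recovers the classical soft-margin SVM.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} s_i \, \xi_i
\quad \text{s.t.} \quad
y_i \bigl( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \bigr) \ge 1 - \xi_i, \qquad
\xi_i \ge 0, \qquad 0 < s_i \le 1, \qquad i = 1, \dots, n .
```

In the dual, the only change from the classical SVM is the box constraint on each multiplier, which becomes 0 ≤ α_i ≤ s_i C instead of 0 ≤ α_i ≤ C.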
NASA Astrophysics Data System (ADS)
Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng
2013-10-01
A quick, accurate and reliable technique for discriminating cocoa beans according to geographical origin is essential for quality control and traceability management. This study presents the application of Near Infrared Spectroscopy and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data, and this gave visible cluster trends. The performance of four multivariate classification methods, Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM), was compared. The models were optimized by cross-validation. The results revealed that the SVM model was superior to all the other methods, with a discrimination rate of 100% in both the training and prediction sets after preprocessing with mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for the prediction set, while the LDA model had 96.15% and 90.63% for the training and prediction sets, respectively. The KNN model had 75.01% for the training set and 72.31% for the prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR spectroscopy coupled with an SVM model could be used successfully to discriminate cocoa beans according to their geographical origin for effective quality assurance.
NASA Astrophysics Data System (ADS)
Luna, A. S.; Paredes, M. L. L.; de Oliveira, G. C. G.; Corrêa, S. M.
2014-12-01
It is well known that air quality is a complex function of emissions, meteorology and topography, and statistical tools provide a sound framework for relating these variables. The observed data were the contents of nitrogen dioxide (NO2), nitrogen monoxide (NO), nitrogen oxides (NOx), carbon monoxide (CO) and ozone (O3), together with scalar wind speed (SWS), global solar radiation (GSR), temperature (TEM) and moisture content in the air (HUM), collected by a mobile automatic monitoring station at two places in the metropolitan area of Rio de Janeiro City during 2011 and 2012. The aims of this study were: (1) to analyze the behavior of the variables, using PCA for exploratory data analysis; (2) to propose forecasts of O3 levels from primary pollutants and meteorological factors, using nonlinear regression methods such as ANN and SVM. The PCA technique showed that for the first dataset, the variables NO, NOx and SWS have a greater impact on the concentration of O3, while for the other dataset TEM and GSR were the most influential variables. The results obtained from the nonlinear regression techniques ANN and SVM were remarkably close and acceptable for one dataset, with validation coefficients of determination of 0.9122 and 0.9152 and root mean square errors of 7.66 and 7.85, respectively. For these datasets, PCA, SVM and ANN demonstrated their robustness as useful tools for evaluating and forecasting air quality scenarios.
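The two validation metrics quoted above can be computed as follows. This plain-Python sketch assumes the usual definition R² = 1 − SS_res/SS_tot; the abstract does not say which definition was used, and some studies instead report the squared Pearson correlation between observed and predicted values, which can differ for biased predictions.

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res/SS_tot and root mean
    square error for paired observed and predicted values."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)
```

RMSE is expressed in the units of the O3 concentration, whereas R² is unitless, which is why the abstract reports both.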
Li, Wu; Hu, Bing; Wang, Ming-wei
2014-12-01
In the present paper, a terahertz time-domain spectroscopy (THz-TDS) identification model of borneol based on principal component analysis (PCA) and support vector machine (SVM) was established. As a common Chinese medicinal agent, borneol needs a rapid, simple and accurate detection and identification method, because it comes from different sources and is easily confused in pharmaceutical and trade channels. To assure the quality of borneol products and protect consumers' rights, quick, efficient and correct identification of borneol is significant for its production and trade. Terahertz time-domain spectroscopy is a new spectroscopic approach that characterizes materials using terahertz pulses. The terahertz absorption spectra of blumea camphor, borneol camphor and synthetic borneol were measured in the range of 0.2 to 2 THz with transmission THz-TDS. PCA score plots in 2D (PC1 × PC2) and 3D (PC1 × PC2 × PC3) were obtained for the three kinds of borneol samples, and both show good clustering of the three kinds of borneol. The matrix of values of the first 10 principal components (PCs) was used to replace the original spectral data; 60 samples of the three kinds of borneol were used for training, and then 60 unknown samples were identified. Four SVM models with different kernel functions were set up in this way. The results show that the SVM with the RBF kernel function identifies and classifies the three kinds of borneol with 100% accuracy, so the SVM with the radial basis kernel function was selected to establish the borneol identification model. In addition, in the noisy case, the classification accuracy rates of all four SVM kernel functions are above 85%, which indicates that SVM has strong generalization ability.
This study shows that the PCA-SVM method applied to terahertz spectra of borneol achieves good classification and identification performance, and provides a new method for species identification of borneol in Chinese medicine.
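The two preprocessing ingredients described above, replacing each spectrum by its scores on the first 10 principal components and comparing samples with a radial basis function kernel, can be sketched in NumPy as follows. The function names and the `gamma` value are hypothetical; in practice `gamma` and the SVM penalty would be tuned by cross-validation.

```python
import numpy as np

def pca_scores(X, n_components=10):
    """Project mean-centred spectra (rows of X) onto their leading principal
    components; also return the mean and axes for projecting new samples."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]             # principal axes, one per row
    return (X - mean) @ components.T, mean, components

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

# Usage: unknown samples must be projected with the *training* mean and axes
# train_scores, mean, comps = pca_scores(X_train)
# test_scores = (X_test - mean) @ comps.T
# K_test = rbf_kernel(test_scores, train_scores)
```

Reducing a few hundred spectral points to 10 scores removes much of the measurement noise before the kernel is applied, which is consistent with the robustness to noise reported above.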
Progressive Classification Using Support Vector Machines
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Kocurek, Michael
2009-01-01
An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time system) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) a slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified.
The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.
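The two-pass procedure above can be sketched as a Python generator that first yields the fast SVM's baseline labels and then yields an updated labelling after each reclassification, so the caller can stop iterating whenever its time budget runs out. The sketch assumes that the confidence index is the magnitude of the fast SVM's decision value (distance from its boundary); the article does not specify the index, and `fast_decision` and `slow_predict` are hypothetical callables standing in for the two SVMs.

```python
def progressive_classify(points, fast_decision, slow_predict):
    """Yield progressively refined label lists.

    fast_decision(x) returns the fast SVM's signed decision value; its sign is
    the label and its magnitude is used as the confidence index (an assumption).
    slow_predict(x) returns the slow, accurate SVM's label in {-1, +1}.
    """
    scores = [fast_decision(x) for x in points]
    labels = [1 if s >= 0 else -1 for s in scores]
    yield list(labels)                         # baseline approximate classification
    # Least confident first: smallest distance from the fast SVM's boundary
    order = sorted(range(len(points)), key=lambda i: abs(scores[i]))
    for i in order:
        labels[i] = slow_predict(points[i])
        yield list(labels)                     # snapshot after each refinement
```

Halting is simply a matter of breaking out of the loop over the generator; the most recent snapshot is then the best labelling available for the computation spent.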
NASA Astrophysics Data System (ADS)
Lu, Shengzi; Dong, Meirong; Huang, Jianwei; Li, Wenbing; Lu, Jidong; Li, Jun
2018-02-01
T91 steel is a representative martensitic heat-resistant steel widely used in high-temperature compression components of industrial equipment. During the service period, the operational safety and service life of the equipment are affected by changes in the structure and mechanical properties of the steel components, a process called material aging. To develop a rapid in-situ aging estimation technology for the surface of high-temperature compression components, laser-induced breakdown spectroscopy (LIBS) coupled with support vector machine (SVM) was employed in this paper. The spectral characteristics of 10 T91 steel specimens with different aging grades were analyzed. Line intensities and line intensity ratios (ionic/atomic and alloying element/matrix element) that indicate the change of metallographic structure were used to establish SVM models, and the results using different variable sets were compared. The model was optimized by comparing different pulse numbers for practical effectiveness, and the robustness of the model in dealing with the inhomogeneity of steel composition was investigated. The results show that the estimation model obtained the best performance using line intensities and line intensity ratios averaged over the 31st-60th laser pulses as input variables. The estimation accuracy on the validation set was greatly improved, from 75.8% to 95.3%. In addition, the model showed an outstanding capacity for handling the fluctuations of spectral signals between measuring points (spots), which indicates that aging estimation based on a few measuring points is feasible. The studies presented here demonstrate that LIBS coupled with SVM is a useful new technique for the aging estimation of steel and would be well suited for fast safety assessment in the industrial field.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: Arrhythmia, Breast cancer, and Hepatitis. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses K-means clustering with ANOVA statistics to preprocess the data and select the significant features, and Support Vector Machines in the classification process. To compare and evaluate performance, we chose three classification algorithms, decision tree, Naïve Bayes and Support Vector Machines, and applied them directly to the medical datasets. Our methodology gave much better classification accuracy: 98% on the Arrhythmia dataset, 92% on the Breast cancer dataset and 88% on the Hepatitis dataset, compared with applying decision tree, Naïve Bayes and Support Vector Machines directly to the medical data. K-ANOVA-SVM also achieved better ROC curve and precision results than the other algorithms.
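The ANOVA part of the feature selection step can be illustrated with the one-way F statistic, which compares between-class to within-class variance for each feature; features with larger F separate the classes better. This NumPy sketch ranks features by F only; how the K-means clustering is combined with it in K-ANOVA-SVM is not specified in the abstract, and the function names are hypothetical.

```python
import numpy as np

def anova_f(feature, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    grand_mean = feature.mean()
    groups = [feature[labels == c] for c in classes]
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = len(classes) - 1, len(feature) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def select_top_features(X, labels, n_keep):
    """Rank the columns of X by ANOVA F score and return the n_keep best indices."""
    scores = [anova_f(X[:, j], labels) for j in range(X.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1][:n_keep]]
```

The selected columns would then be passed to the SVM; computing the F scores on training folds only keeps the selection from leaking test information.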
Belekar, Vilas; Lingineni, Karthik; Garg, Prabha
2015-01-01
The breast cancer resistant protein (BCRP) is an important transporter, and its inhibitors play an important role in cancer treatment by improving the oral bioavailability as well as the blood-brain barrier (BBB) permeability of anticancer drugs. In this work, a computational model was developed to classify compounds as BCRP inhibitors or non-inhibitors. Various machine learning approaches, such as support vector machine (SVM), k-nearest neighbor (k-NN) and artificial neural network (ANN), were used to develop the models. The Matthews correlation coefficients (MCC) of the developed models using ANN, k-NN and SVM are 0.67, 0.71 and 0.77, and the prediction accuracies are 85.2%, 88.3% and 90.8%, respectively. The developed models were tested with a test set of 99 compounds and further validated with an external set of 98 compounds. Distribution plot analysis was carried out, and various machine learning models were also developed based on druglikeness descriptors. The applicability domain was used to check the prediction reliability for new molecules.
Lou, Yun-xiao; Fu, Xian-shu; Yu, Xiao-ping; Zhang, Ya-fen
2017-01-01
This paper focused on an effective method to discriminate the geographical origin of Wuyi-Rock tea by the stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea (n = 99) collected from nine producing areas and non-Wuyi-Rock tea (n = 33) from eleven nonproducing areas were analysed for SIR and MEP by established methods. The SVM model based on coupled data produced the best prediction accuracy (0.9773). This result shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, each feature variable in the stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ2H, δ18O, Cs, Cu, Ca, and Rb contents are significant indicators for provenance discrimination and that not all of the metallic elements improve the prediction accuracy of the SVM model. PMID:28473941
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D.; Hood, Darryl B.; Skelton, Tyler
2014-01-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with childbearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was the respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminative power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire. PMID:23395953
An IPSO-SVM algorithm for security state prediction of mine production logistics system
NASA Astrophysics Data System (ADS)
Zhang, Yanliang; Lei, Junhui; Ma, Qiuli; Chen, Xin; Bi, Runfang
2017-06-01
A theoretical basis for the regulation of corporate security warnings and resources was provided in order to reveal the laws behind the security state in mine production logistics. Considering the complexity of the mine production logistics system and the difficulty of acquiring its variables, a superior security status prediction model of the mine production logistics system based on improved particle swarm optimization and support vector machine (IPSO-SVM) is proposed in this paper. First, through linear adjustments of the inertia weight and learning weights, the convergence speed and search accuracy are enhanced to deal with the changeable complexity and the difficulty of data acquisition. The improved particle swarm optimization (IPSO) is then introduced to resolve the problem of parameter setting in the traditional support vector machine (SVM). At the same time, a security status index system is built to determine the classification standards of safety status. Finally, the feasibility and effectiveness of this method are verified by the experimental results.
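The "linear adjustments of inertia weight and learning weights" can be sketched as a particle swarm optimizer whose inertia weight w and learning factors c1, c2 change linearly over the iterations. In IPSO-SVM the objective would be something like the cross-validated error of an SVM as a function of its parameters (for example C and the RBF width); here a generic function is minimized instead. The schedules (w from 0.9 to 0.4, c1 from 2.5 to 0.5, c2 from 0.5 to 2.5) are common choices in the PSO literature and are assumptions, not values taken from the abstract.

```python
import numpy as np

def ipso_minimize(f, bounds, n_particles=20, iters=100, seed=0,
                  w_range=(0.9, 0.4), c1_range=(2.5, 0.5), c2_range=(0.5, 2.5)):
    """Particle swarm optimization with linearly adjusted inertia weight and
    learning factors; bounds is a list of (low, high) pairs per dimension."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    pos = rng.uniform(low, high, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for t in range(iters):
        frac = t / max(iters - 1, 1)          # 0 at the start, 1 at the end
        w = w_range[0] + (w_range[1] - w_range[0]) * frac
        c1 = c1_range[0] + (c1_range[1] - c1_range[0]) * frac
        c2 = c2_range[0] + (c2_range[1] - c2_range[0]) * frac
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, low, high)   # keep particles inside the bounds
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

A large early inertia weight and cognitive factor c1 favour exploration, while the later small inertia and large social factor c2 pull the swarm toward the global best, which is the usual rationale for such schedules.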
Age group classification and gender detection based on forced expiratory spirometry.
Cosgun, Sema; Ozbek, I Yucel
2015-08-01
This paper investigates the utility of the forced expiratory spirometry (FES) test with efficient machine learning algorithms for gender detection and age group classification. The proposed method has three main stages: feature extraction, model training and detection. In the first stage, features are extracted from the volume-time curve and the expiratory flow-volume loop obtained from the FES test. In the second stage, the models for each gender and age group are constructed by training Gaussian mixture models (GMMs) and a support vector machine (SVM) algorithm. In the final stage, the gender (or age group) of a test subject is estimated using the trained GMM (or SVM) model. Experiments were conducted on a large database of 4571 subjects. The experimental results show that the average correct classification rates of both the GMM and SVM methods based on the FES test exceed 99.3% and 96.8% for gender and age group classification, respectively.
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
2013-02-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with childbearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was the respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminative power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
NASA Astrophysics Data System (ADS)
Endah, S. N.; Nugraheni, D. M. K.; Adhy, S.; Sutikno
2017-04-01
Law No. 32 of 2002 and the Indonesian Broadcasting Commission Regulations No. 02/P/KPI/12/2009 and No. 03/P/KPI/12/2009 state that broadcast programs must not use harsh words, harass, insult or demean minorities and marginalized groups. However, there are no suitable tools to censor such words automatically. Therefore, research to develop an intelligent software system that censors these words automatically is needed. To censor them, the system must be able to recognize the words in question. This research proposes classifying speech into two classes using a Support Vector Machine (SVM): the first class is the set of rude words and the second class is the set of proper words. Speech pitch values are used as the input to the SVM in developing the system for Indonesian rude swear words. The experimental results show that SVM performs well for this system.
Activity Recognition in Egocentric video using SVM, kNN and Combined SVMkNN Classifiers
NASA Astrophysics Data System (ADS)
Sanal Kumar, K. P.; Bhavani, R., Dr.
2017-08-01
Egocentric vision is a unique, human-centric perspective in computer vision. The recognition of egocentric actions is a challenging task that can help in assisting elderly people, disabled patients and others. In this work, life-logging activity videos are taken as input. The activities fall into two category levels: top level and second level. The recognition uses features such as Histogram of Oriented Gradients (HOG), Motion Boundary Histogram (MBH) and Trajectory. These features are fused into a single feature, which is then reduced using Principal Component Analysis (PCA). The reduced features are provided as input to classifiers such as Support Vector Machine (SVM), k nearest neighbor (kNN), and combined Support Vector Machine and k Nearest Neighbor (combined SVMkNN). These classifiers are evaluated, and the combined SVMkNN provided better results than the other classifiers in the literature.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.
Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin
2010-04-16
Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into the RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
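RankSVM learns from multi-level feedback by reducing ranking to binary classification on pairs: for two documents with different relevance levels, the feature difference x_i - x_j is labelled +1 if document i is more relevant and -1 otherwise, and a linear SVM trained on these differences yields a weight vector w whose scores w·x order the documents. The NumPy sketch below shows only this standard pairwise transform; it does not reflect how RefMed implements RankSVM inside the DBMS, and the function name is hypothetical.

```python
import numpy as np

def pairwise_transform(X, relevance):
    """Turn multi-level relevance labels into RankSVM training pairs.

    For every pair of documents with different relevance levels, emit the
    feature difference x_i - x_j labelled +1 if doc i is more relevant and
    -1 otherwise.
    """
    diffs, signs = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if relevance[i] == relevance[j]:
                continue                       # equal levels carry no preference
            diffs.append(X[i] - X[j])
            signs.append(1 if relevance[i] > relevance[j] else -1)
    return np.array(diffs), np.array(signs)

# Usage: after fitting a linear SVM (no bias) to (diffs, signs) to obtain w,
# rank unseen articles by descending score: order = np.argsort(-(X_new @ w))
```

Because every pair of differently rated documents becomes one training example, a few multi-level judgements produce many constraints, which is one reason multi-level feedback can reach good accuracy with less feedback than binary labels.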
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
2010-01-01
Background Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into the RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504
Support vector machine-based facial-expression recognition method combining shape and appearance
NASA Astrophysics Data System (ADS)
Han, Eun Jung; Kang, Byung Jun; Park, Kang Ryoung; Lee, Sangyoun
2010-11-01
Facial expression recognition can be widely used in various applications, such as emotion-based human-machine interaction, intelligent robot interfaces, and face recognition robust to expression variation. Previous studies can be classified as either shape- or appearance-based recognition. The shape-based method has the disadvantage that facial feature points vary between individuals even for similar expressions, which can reduce recognition accuracy. The appearance-based method is limited in that the textural information of the face is very sensitive to variations in illumination. To overcome these problems, a new facial-expression recognition method is proposed that combines both shape and appearance information, based on the support vector machine (SVM). This research is novel in the following three ways compared to previous works. First, the facial feature points are automatically detected by using an active appearance model. From these, shape-based recognition is performed by using the ratios between the facial feature points based on the facial-action coding system. Second, an SVM trained to recognize same and different expression classes is proposed to combine the two matching scores obtained from the shape- and appearance-based recognitions. Finally, a single SVM is trained to discriminate four different expressions: neutral, smile, anger, and scream. By determining the expression of the input facial image whose SVM output is at a minimum, the accuracy of expression recognition is much enhanced. The experimental results showed that the recognition accuracy of the proposed method was better than that of previous studies and other fusion methods.
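The score-level fusion step treats each comparison as a 2-D point (shape matching score, appearance matching score) that an SVM labels "same expression" or "different expression". The sketch below illustrates this with assumptions. It uses a linear SVM trained in the primal by Pegasos-style subgradient descent, with the bias learned as an extra weight. It also assumes that lower scores mean closer matches. The paper's actual kernel, score definitions and training data are not given here, so the data are hypothetical.

```python
import random

def train_linear_svm(samples, labels, lam=0.01, iters=20000, seed=0):
    """Pegasos-style primal linear SVM. A constant 1.0 is appended to each
    sample so the bias is learned (and regularised) with the weights."""
    data = [list(x) + [1.0] for x in samples]
    rng = random.Random(seed)
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = labels[i] * sum(a * b for a, b in zip(w, data[i]))
        w = [(1 - eta * lam) * a for a in w]          # regulariser shrink
        if margin < 1:                                # hinge loss active
            w = [a + eta * labels[i] * b for a, b in zip(w, data[i])]
    return w

def decide(w, scores):
    """+1 = 'same expression', -1 = 'different expression'."""
    value = sum(a * b for a, b in zip(w, list(scores) + [1.0]))
    return 1 if value >= 0 else -1

# Hypothetical (shape_distance, appearance_distance) pairs.
same = [(0.10, 0.20), (0.20, 0.15), (0.25, 0.30), (0.05, 0.10)]
diff = [(0.80, 0.70), (0.75, 0.90), (0.90, 0.85), (0.70, 0.80)]
w = train_linear_svm(same + diff, [1] * 4 + [-1] * 4)
```

Fusing at the score level lets the SVM learn how much to trust each modality, instead of fixing a weighted sum by hand.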
Application of the Teager-Kaiser energy operator in bearing fault diagnosis.
Henríquez Rodríguez, Patricia; Alonso, Jesús B; Ferrer, Miguel A; Travieso, Carlos M
2013-03-01
Condition monitoring of rotating machines is important in the prevention of failures. As most machine malfunctions are related to bearing failures, several bearing diagnosis techniques have been developed. Some of them characterize the bearing vibration signal with statistical measures, and others extract the bearing fault characteristic frequency from the AM component of the vibration signal. In this paper, we propose to transform the vibration signal to the Teager-Kaiser domain and characterize it with statistical and energy-based measures. A bearing database with normal and faulty bearings is used. The diagnosis is performed with two classifiers: a neural network classifier and an LS-SVM classifier. Experiments show that the Teager-domain features outperform those based on the temporal or AM signal. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
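The discrete Teager-Kaiser energy operator is psi[n] = x[n]^2 - x[n-1]·x[n+1]. For a pure tone A·cos(Ωn) it evaluates to the constant A²·sin²(Ω), so it tracks amplitude and frequency jointly. The sketch below computes the operator and a few summary statistics of the transformed signal. The particular feature set (mean, standard deviation, maximum, energy) is a hypothetical choice, not necessarily the one used in the paper.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator:
    psi[n] = x[n]**2 - x[n-1]*x[n+1], defined for 1 <= n <= len(x)-2."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def tk_features(x):
    """Hypothetical statistical/energy features of the TK-domain signal."""
    psi = teager_kaiser(x)
    mean = sum(psi) / len(psi)
    var = sum((p - mean) ** 2 for p in psi) / len(psi)
    return {"mean": mean, "std": math.sqrt(var), "max": max(psi),
            "energy": sum(p * p for p in psi)}

# A pure tone gives a flat TK output equal to (A * sin(omega))**2.
A, omega = 2.0, 0.3
x = [A * math.cos(omega * n) for n in range(50)]
psi = teager_kaiser(x)
```

In a faulty bearing, periodic impacts would show up as bursts in `psi` rather than the flat line produced by the pure tone above.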
A gas-sensing array produced from screen-printed, zeolite-modified chromium titanate
NASA Astrophysics Data System (ADS)
Pugh, David C.; Hailes, Stephen M. V.; Parkin, Ivan P.
2015-08-01
Metal oxide semiconducting (MOS) gas sensors represent a cheap, robust and sensitive technology for detecting volatile organic compounds. However, MOS sensors have consistently been shown to lack selectivity across a broad range of analytes, leading to false positive errors. In this study, an array of five chromium titanate (CTO) thick-film sensors was produced. These were modified by incorporating a range of zeolites, namely β, Y, mordenite and ZSM5, into the bulk sensor material. Sensors were exposed to three common reducing gases, namely acetone, ethanol and toluene, and a machine learning technique was applied to differentiate between the gases. All sensors produced strong resistive responses (increases in resistance), and a support vector machine (SVM) was able to classify the data with a high degree of selectivity.
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System
Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio
2009-01-01
We describe an Electronic Nose (ENose) system that is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors: five gas sensors (supplied with different heater voltage values), a temperature sensor and a humidity sensor. To identify a new analyte sample and then estimate its concentration, we use machine learning techniques together with the least squares regression principle. Specifically, we apply two different training models. The first is based on the Support Vector Machine (SVM) approach and teaches the system to discriminate among different gases. The second uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980
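Once the SVM stage has identified the analyte, the second stage maps sensor readings to a concentration. The sketch below shows ordinary least squares with an intercept, solved from the normal equations by Gauss-Jordan elimination. It assumes a linear relationship between readings and concentration and one such model per analyte. The toy data are hypothetical and noise-free, chosen so that the fit recovers the known coefficients.

```python
def least_squares(X, y):
    """OLS with an intercept: solve (A^T A) beta = A^T y, where A is X with
    a leading column of ones, using Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    n = len(A)
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict_concentration(beta, readings):
    """Apply the fitted linear model to one vector of sensor readings."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], readings))

# Hypothetical two-sensor readings for one analyte: c = 0.5 + 2*s1 - s2.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [0.5 + 2 * a - b for a, b in X]
beta = least_squares(X, y)
```

Fitting a separate regression per analyte keeps each model simple, because the classifier has already removed the question of which gas is present.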
Frick, Andreas; Gingnell, Malin; Marquand, Andre F.; Howner, Katarina; Fischer, Håkan; Kristiansson, Marianne; Williams, Steven C.R.; Fredrikson, Mats; Furmark, Tomas
2014-01-01
Functional neuroimaging of social anxiety disorder (SAD) supports altered neural activation to threat-provoking stimuli focally in the fear network, while structural differences are distributed over the temporal and frontal cortices as well as limbic structures. Previous neuroimaging studies have investigated the brain at the voxel level using mass-univariate methods, which do not enable detection of more complex patterns of activity and structural alterations that may separate SAD from healthy individuals. Support vector machine (SVM) is a supervised machine learning method that capitalizes on brain activation and structural patterns to classify individuals. The aim of this study was to investigate whether it is possible to discriminate SAD patients (n = 14) from healthy controls (n = 12) using SVM based on (1) functional magnetic resonance imaging during fearful face processing and (2) regional gray matter volume. Whole-brain and region-of-interest (fear network) SVM analyses were performed for both modalities. For functional scans, significant classifications were obtained both at the whole-brain level and when restricting the analysis to the fear network. In contrast, gray matter SVM analyses correctly classified participants only when using the whole-brain search volume. These results support the view that SAD is characterized by aberrant neural activation to affective stimuli in the fear network, while disorder-related alterations in regional gray matter volume are more diffusely distributed over the whole brain. SVM may thus be useful for identifying imaging biomarkers of SAD. PMID:24239689
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad
2018-03-01
Support Vector Machine (SVM) has been shown to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is still in its infancy. The present study classified and predicted high- and low-potential archers from a collection of psychological coping skills variables using different SVMs. Fifty youth archers with a mean age and standard deviation of 17.0 ± .056, gathered from various archery programmes, completed a one-end shooting score test. Before their shooting tests, the archers filled out a psychological coping skills inventory, which evaluates their level of related coping skills. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. SVM models with linear and fine radial basis function (RBF) kernel functions were trained on the psychological variables. The k-means analysis clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA). The linear SVM exhibited good accuracy and precision throughout the exercise. It reached an accuracy of 92% and had a considerably lower error rate for the prediction of HPPA and LPPA than the fine RBF SVM. The findings of this investigation can help coaches and sports managers recognise high-potential athletes from the psychological coping skills variables examined, which would save time and energy during talent identification and development programmes.
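The k-means step assigns the class labels (HPPA/LPPA) that the SVMs are later trained to predict. The sketch below shows plain k-means (Lloyd's algorithm) on multivariate scores. It assumes a deterministic farthest-first initialisation, which is a simplification chosen for reproducibility; the study's initialisation and software are not stated. The coping-skill scores are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    # Farthest-first initialisation (an assumed, deterministic choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each archer goes to the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:            # converged: assignments are stable
            break
        centroids = new
    return labels, centroids

# Hypothetical three-variable coping-skill scores for six archers.
high = [[4.5, 4.2, 4.8], [4.1, 4.6, 4.4], [4.7, 4.3, 4.5]]
low = [[2.1, 1.8, 2.5], [2.4, 2.2, 1.9], [1.7, 2.0, 2.3]]
labels, centroids = kmeans(high + low, 2)
```

Which numeric cluster ID ends up as "high" is arbitrary. It has to be read off the centroids, for example the cluster with the larger mean scores.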
Automatic classification of seismic events within a regional seismograph network
NASA Astrophysics Data System (ADS)
Tiira, Timo; Kortström, Jari; Uski, Marja
2015-04-01
A fully automatic method for seismic event classification within a sparse regional seismograph network is presented. The tool is based on a supervised pattern recognition technique, the Support Vector Machine (SVM), trained here to distinguish weak local earthquakes from a bulk of human-made or spurious seismic events. The classification rules rely on differences in signal energy distribution between natural and artificial seismic sources. Seismic records are divided into four windows: P, P coda, S, and S coda. For each signal window, the STA is computed in 20 narrow frequency bands between 1 and 41 Hz. The resulting 80 discrimination parameters are used as training data for the SVM. SVM models are calculated for 19 online seismic stations in Finland. The event data are compiled mainly from fully automatic event solutions that are manually classified after the automatic location process. The station-specific SVM training sets include 11-302 positive (earthquake) and 227-1048 negative (non-earthquake) examples. The best voting rules for combining results from different stations are determined during an independent testing period. Finally, the network processing rules are applied to an independent evaluation period comprising 4681 fully automatic event determinations. Of these, 98 % have been manually identified as explosions or noise and 2 % as earthquakes. The SVM method correctly identifies 94 % of the non-earthquakes and all of the earthquakes. The results imply that the SVM tool can identify and filter out blasts and spurious events from fully automatic event solutions with a high level of confidence. The tool helps to reduce the workload in manual seismic analysis by leaving only ~5 % of the automatic event determinations, i.e. the probable earthquakes, for more detailed seismological analysis. The approach presented is easy to adapt to the requirements of a denser or wider high-frequency network, once enough training examples for building a station-specific data set are available.
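The 80 discrimination parameters follow from 4 signal windows × 20 frequency bands, with one STA value per window and band. The sketch below lays out that feature indexing and a network-level vote. It makes two assumptions. First, the 20 bands are taken to be equal-width (2 Hz each over 1-41 Hz), which the abstract does not state. Second, the voting rule is a hypothetical "at least k stations" rule, because the study's actual best voting rules are not given.

```python
def band_edges(f_lo=1.0, f_hi=41.0, n_bands=20):
    """Equal-width bands (assumed): 2 Hz each for the default 1-41 Hz range."""
    width = (f_hi - f_lo) / n_bands
    return [(f_lo + i * width, f_lo + (i + 1) * width) for i in range(n_bands)]

WINDOWS = ("P", "P coda", "S", "S coda")

def feature_names(n_bands=20):
    """One STA value per (window, band): 4 x 20 = 80 SVM inputs."""
    return [f"{w}:band{b}" for w in WINDOWS for b in range(n_bands)]

def network_vote(station_votes, min_earthquake_votes):
    """Hypothetical k-of-n rule: call the event an earthquake if at least
    `min_earthquake_votes` station SVMs classify it as one."""
    return sum(1 for v in station_votes if v) >= min_earthquake_votes
```

A low vote threshold favours keeping every true earthquake, which matches the reported goal of missing none. The cost is passing more blasts on to manual review.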
Ji, Xiaoliang; Shang, Xu; Dahlgren, Randy A; Zhang, Minghua
2017-07-01
Accurate quantification of dissolved oxygen (DO) is critically important for managing water resources and controlling pollution. Artificial intelligence (AI) models have been successfully applied to modeling DO content in aquatic ecosystems with limited data. However, the efficacy of these AI models in predicting DO levels in hypoxic river systems with multiple pollution sources and complicated pollutant behaviors is unclear. Given this dilemma, we developed a model based on a promising AI technique, the support vector machine (SVM), to predict the DO concentration in a hypoxic river in southeastern China. Four different calibration models were established: multiple linear regression, back propagation neural network, general regression neural network, and SVM. Their prediction accuracy was systematically investigated and compared. A total of 11 hydro-chemical variables were used as model inputs. These variables were measured bimonthly at eight sampling sites along the rural-suburban-urban portion of the Wen-Rui Tang River from 2004 to 2008. The performance of the established models was assessed through the mean square error (MSE), the coefficient of determination (R²), and the Nash-Sutcliffe (NS) model efficiency. The results indicated that the SVM model was superior to the other models in predicting DO concentration in the Wen-Rui Tang River. For SVM, the MSE, R², and NS values for the testing subset were 0.9416 mg/L, 0.8646, and 0.8763, respectively. Sensitivity analysis showed that ammonium-nitrogen was the most significant input variable of the proposed SVM model. Overall, these results demonstrated that the proposed SVM model can efficiently predict water quality, especially for highly impaired and hypoxic river systems.
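The three scores answer different questions, which is why they are reported together. The sketch below gives standard definitions. It assumes R² is the squared Pearson correlation, a common reading in hydrology; the paper may define it differently. The NS efficiency is 1 - SSE / Σ(obs - mean(obs))². The toy series is hypothetical.

```python
def mse(obs, pred):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Squared Pearson correlation (assumed definition of R^2)."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vo * vp)

def nash_sutcliffe(obs, pred):
    """NSE = 1 - SSE / sum((obs - mean(obs))^2); 1 is a perfect fit."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1 - sse / sum((o - mo) ** 2 for o in obs)

# A prediction that is perfectly correlated but scaled by 2.
obs, pred = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
```

For this toy series R² is 1 while NSE is -6. Correlation ignores bias and scale errors; NSE penalises them.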
Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.
Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel
2017-01-01
Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of cerebral glucose hypometabolism involving the occipito-temporal cortex and cerebellum has been reported in patients with MMF. However, the full pattern is not systematically present in routine interpretation of scans, and its severity varies with the cognitive profile of patients. The aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients as having healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole population was divided into two groups: a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM), and an SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving the occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with the following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original, individual approach including an SVM to classify patients as having healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.
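All five reported metrics derive from one confusion matrix. The abstract does not give the counts. However, with 19 MMF and 20 healthy test subjects, the counts TP = 17, FN = 2, TN = 17, FP = 3 reproduce every reported percentage. These counts are an inference for illustration only, not figures from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "Se": tp / (tp + fn),                      # sensitivity (recall)
        "Sp": tn / (tn + fp),                      # specificity
        "PPV": tp / (tp + fp),                     # positive predictive value
        "NPV": tn / (tn + fn),                     # negative predictive value
        "Acc": (tp + tn) / (tp + fn + tn + fp),    # accuracy
    }

# Inferred counts consistent with the 19 MMF / 20 healthy testing set.
m = diagnostic_metrics(tp=17, fn=2, tn=17, fp=3)
print({k: round(100 * v) for k, v in m.items()})
# prints {'Se': 89, 'Sp': 85, 'PPV': 85, 'NPV': 89, 'Acc': 87}
```

Se and NPV share the value 17/19 here, and Sp and PPV share 17/20. That coincidence arises because TP equals TN in these counts.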
Zhang, Ming-Huan; Ma, Jun-Shan; Shen, Ying; Chen, Ying
2016-09-01
This study aimed to identify the optimal support vector machine (SVM)-based classifier of Duchenne muscular dystrophy (DMD) magnetic resonance imaging (MRI) images. T1-weighted (T1W) and T2-weighted (T2W) images of 15 boys with DMD and 15 normal controls were obtained. Textural features of the images were extracted and wavelet decomposed, and then principal features were selected. Scale transform was then performed for the MRI images. Afterward, SVM-based classifiers of MRI images were analyzed based on the radial basis function and decomposition levels. The cost (C) parameter and kernel parameter [Formula: see text] were used for classification. Then, the optimal SVM-based classifier, expressed as [Formula: see text]), was identified by performance evaluation (sensitivity, specificity and accuracy). Eight of 12 textural features were selected as principal features (eigenvalues [Formula: see text]). Sixteen SVM-based classifiers were obtained using combinations of (C, [Formula: see text]). Those with lower C and [Formula: see text] values showed higher performance, especially the classifier of [Formula: see text]). The SVM-based classifiers of T1W images showed higher performance than those of T2W images at the same decomposition level. The T1W images in the classifier of [Formula: see text]) at level 2 decomposition showed the highest performance of all, with overall sensitivity, specificity, and accuracy of 96.9, 97.3, and 97.1 %, respectively. The T1W images in SVM-based classifier [Formula: see text] at level 2 decomposition showed the highest performance of all, demonstrating that it was the optimal classification for the diagnosis of DMD.
Identifying saltcedar with hyperspectral data and support vector machines
USDA-ARS?s Scientific Manuscript database
Saltcedar (Tamarix spp.) are a group of dense phreatophytic shrubs and trees that are invasive to riparian areas throughout the United States. This study determined the feasibility of using hyperspectral data and a support vector machine (SVM) classifier to discriminate saltcedar from other cover t...
Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander
2006-11-30
Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data were partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space. As a result, random sampling of 20% of the whole data set produced an external test set that is representative of the training set with respect to both structure and protein binding values. The four modeling techniques were multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute errors ranged from r²=0.90 and MAE=7.6 for ANN to r²=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors that ranged from r²=0.70 and MAE=14.1 for ANN to a low of r²=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
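The partitioning idea can be expressed as sampling roughly 20% of the compounds from every structural cluster. This way, the external test set covers the same regions of descriptor space as the training set. The sketch below assumes the cluster labels have already been computed; the abstract does not name the clustering method. The function name and toy labels are hypothetical, and per-cluster rounding means the test set is only approximately 20% in general.

```python
import random
from collections import defaultdict

def cluster_split(cluster_of, test_fraction=0.2, seed=0):
    """Draw about `test_fraction` of the compounds from every cluster so the
    external test set mirrors the training set's structural diversity."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, c in enumerate(cluster_of):
        by_cluster[c].append(idx)
    test = []
    for members in by_cluster.values():
        n_test = round(len(members) * test_fraction)   # per-cluster quota
        test.extend(rng.sample(members, n_test))
    test_set = set(test)
    train = [i for i in range(len(cluster_of)) if i not in test_set]
    return train, sorted(test)

# Hypothetical cluster labels for 35 compounds in three clusters.
clusters = [0] * 10 + [1] * 5 + [2] * 20
train, test = cluster_split(clusters)
```

Purely random sampling could, by chance, leave a small cluster entirely in training. The external test would then overstate how well the model generalises to that chemotype.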
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches for predicting the binding affinity of protein-protein complexes rely on structural and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. It proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (pKd) of 200 heterodimeric protein complexes. In a jackknife test, prediction performance reached a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyzed three informative physicochemical properties according to their contribution to prediction performance. The results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces are higher in high binding affinity complexes than in low binding affinity complexes.
PMID:26681483
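A jackknife (leave-one-out) test predicts each complex from a model fitted on all the others, then summarises the predictions with a correlation coefficient and a mean absolute error. The sketch below shows that protocol. The regressor is a hypothetical 1-nearest-neighbour stand-in, not the paper's support vector regression, and the one-feature data are a toy example.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

def jackknife(features, targets, fit_predict):
    """Leave-one-out: predict each sample from a model fitted on all others."""
    preds = []
    for i in range(len(targets)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        preds.append(fit_predict(train_x, train_y, features[i]))
    mae = sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)
    return pearson_r(preds, targets), mae

def nearest_neighbour(train_x, train_y, x):
    """Hypothetical stand-in regressor (1-NN); SVR would be used instead."""
    best = min(range(len(train_x)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], x)))
    return train_y[best]

features = [[0.0], [1.0], [3.0], [6.0], [10.0]]
targets = [0.0, 1.0, 3.0, 6.0, 10.0]
r, mae = jackknife(features, targets, nearest_neighbour)
```

Passing the regressor as `fit_predict` keeps the evaluation protocol separate from the model, so an SVR could be swapped in without changing `jackknife`.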
Kuo, Pao-Jen; Wu, Shao-Chun; Chien, Peng-Chen; Rau, Cheng-Shyuan; Chen, Yi-Chun; Hsieh, Hsiao-Yun; Hsieh, Ching-Hua
2018-01-01
Objectives This study aimed to build and test machine learning (ML) models to predict the mortality of hospitalised motorcycle riders. Setting The study was conducted in a level-1 trauma centre in southern Taiwan. Participants Motorcycle riders who were hospitalised between January 2009 and December 2015 were divided into a training set (n=6306) and a test set (n=946). Using the demographic information, injury characteristics and laboratory data of patients, logistic regression (LR), support vector machine (SVM) and decision tree (DT) analyses were performed to determine the mortality of individual motorcycle riders under different conditions: using all samples or reduced samples, and all variables or selected features, in the algorithm. Primary and secondary outcome measures The predictive performance of the models was evaluated based on accuracy, sensitivity, specificity and geometric mean, and the areas under the receiver operating characteristic curves of the two different models were analysed. Results In the training set, both LR and SVM had a significantly higher area under the receiver operating characteristic curve (AUC) than DT. No significant difference was observed in the AUC of LR and SVM, regardless of whether all samples or reduced samples and whether all variables or selected features were used. In the test set, the SVM model for all samples with selected features performed better than all other models in mortality prediction. It achieved an accuracy of 98.73%, sensitivity of 86.96%, specificity of 99.02%, geometric mean of 92.79% and AUC of 0.9517. Conclusion ML can provide a feasible level of accuracy in predicting the mortality of motorcycle riders. Integration of the ML model, particularly the SVM algorithm, into the trauma system may help identify high-risk patients and, therefore, guide appropriate interventions by the clinical staff. PMID:29306885
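Mortality is a rare outcome, so accuracy alone is dominated by survivors. The geometric mean of sensitivity and specificity is reported because it stays low unless both classes are recognised. As a quick consistency check, the reported sensitivity and specificity reproduce the reported geometric mean.

```python
import math

def geometric_mean(sensitivity, specificity):
    """G-mean = sqrt(sensitivity * specificity); penalises a classifier that
    ignores the minority class (deaths) even when accuracy is high."""
    return math.sqrt(sensitivity * specificity)

g = geometric_mean(0.8696, 0.9902)   # SVM test-set values from the abstract
print(f"{100 * g:.2f}%")             # prints 92.79%
```

A classifier that predicted "survives" for everyone would still exceed 97% accuracy on a test set like this one, but its sensitivity, and therefore its G-mean, would be 0.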
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade
2015-12-01
The main aim of this research work is to build a new practical hybrid regression model to predict milling tool wear in a regular cut as well as in the entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable). The input variables were the duration of the experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. Data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on milling tool flank wear, with a view to proposing improvements to the milling machine. Firstly, this hybrid PSO-SVM-based regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence between flank wear (output variable) and the input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed, and a coefficient of determination of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are presented.
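In a PSO-SVM hybrid, each particle is a candidate hyperparameter vector (for example log10 C and log10 γ of an RBF kernel), and its fitness is the regression error of the SVM trained with those values. The sketch below is a plain global-best PSO under stated assumptions. Training a real SVR per evaluation is replaced by a hypothetical smooth surrogate error with a known minimum. The inertia and acceleration constants are common textbook defaults, not the paper's settings.

```python
import random

def pso_minimise(objective, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate_error(p):
    """Hypothetical stand-in for SVR validation error over (log10 C, log10 gamma)."""
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best, best_val = pso_minimise(surrogate_error, [(-2.0, 3.0), (-5.0, 1.0)])
```

Searching in log space is a common design choice, because C and γ typically matter on a multiplicative scale.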
2012-01-01
Background Src plays various roles in tumour progression, invasion, metastasis, angiogenesis and survival. It is one of the multiple targets of multi-target kinase inhibitors in clinical use and trials for the treatment of leukemia and other cancers. These successes, and the appearance of drug resistance in some patients, have raised significant interest and efforts in discovering new Src inhibitors. Various in-silico methods have been used in some of these efforts. It is desirable to explore additional in-silico methods, particularly those capable of searching large compound libraries at high yields and reduced false-hit rates. Results We evaluated support vector machines (SVM) as virtual screening tools for searching Src inhibitors from large compound libraries. SVM trained and tested by 1,703 inhibitors and 63,318 putative non-inhibitors correctly identified 93.53% to 95.01% of inhibitors and 99.81% to 99.90% of non-inhibitors in 5-fold cross validation studies. SVM was also trained by the 1,703 inhibitors reported before 2011 and the 63,318 putative non-inhibitors. It correctly identified 70.45% of the 44 inhibitors reported since 2011. It predicted as inhibitors 44,843 (0.33%) of 13.56M PubChem compounds, 1,496 (0.89%) of 168K MDDR compounds, and 719 (7.73%) of 9,305 MDDR compounds similar to the known inhibitors. Conclusions SVM showed comparable yield and reduced false-hit rates in searching large compound libraries. This comparison was against the similarity-based and other machine-learning virtual screening (VS) methods developed from the same set of training compounds and molecular descriptors. We tested three virtual hits of the same novel scaffold from in-house chemical libraries not reported as Src inhibitors, one of which showed moderate activity. SVM may potentially be explored for searching Src inhibitors from large compound libraries at low false-hit rates. PMID:23173901
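With only 1,703 inhibitors against 63,318 non-inhibitors, how the 5-fold split is made matters: a fold with very few inhibitors gives a noisy inhibitor hit rate. The sketch below shows one common choice, stratified k-fold splitting, which keeps the class ratio roughly equal in every fold. Whether the study stratified its folds is not stated, so this is an assumption. The toy labels are hypothetical.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx). Each class is shuffled and dealt
    round-robin across folds, so every fold keeps roughly the overall
    inhibitor : non-inhibitor ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test

# Hypothetical imbalanced labels: 1 = inhibitor, 0 = non-inhibitor.
labels = [1] * 10 + [0] * 40
folds = list(stratified_kfold(labels, k=5))
```

Reporting the range across folds, as the abstract does (93.53% to 95.01%), shows how stable the classifier is to the particular split.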
A Comparison of Artificial Intelligence Methods on Determining Coronary Artery Disease
NASA Astrophysics Data System (ADS)
Babaoğlu, Ismail; Baykan, Ömer Kaan; Aygül, Nazif; Özdemir, Kurtuluş; Bayrak, Mehmet
The aim of this study is to compare a multi-layered perceptron neural network (MLPNN) and a support vector machine (SVM) for determining the existence of coronary artery disease from exercise stress testing (EST) data. EST and coronary angiography were performed on 480 patients, acquiring 23 verifying features from each. The robustness of the proposed methods is examined using classification accuracy, the k-fold cross-validation method and Cohen's kappa coefficient. The obtained classification accuracies are approximately 78% and 79% for MLPNN and SVM, respectively. Judging by Cohen's kappa coefficients, both MLPNN and SVM are more satisfactory than the human-based method. In addition, SVM is slightly better than MLPNN in terms of diagnostic accuracy, the average of sensitivity and specificity, and Cohen's kappa coefficient.
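Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is computed from the two label distributions. The sketch below computes it for a hypothetical set of angiography labels and classifier outputs.

```python
def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected for the
    agreement expected by chance from the two marginal label distributions."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: 1 = disease confirmed by angiography, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(y_true, y_pred)   # p_o = 0.7, p_e = 0.5, so kappa = 0.4
```

A 70% raw agreement shrinks to κ = 0.4 here because half of the agreement would be expected from the label frequencies alone.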
Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li
2011-01-01
Background: Support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM over linear SVM. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike traditional studies, which focused solely on either the evaluation of different types of SVM or on voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification, together with voxel selection schemes, in terms of classification accuracy and computation time.
Methodology/Principal Findings: Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels for classifying objects from four categories. The overall performances of the voxel selection and classification methods were then compared. Results showed that: (1) voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low-dimensional feature space, RBF SVM significantly outperformed linear SVM, whereas in a relatively high-dimensional space, linear SVM performed better than its counterpart; (2) considering classification accuracy and computation time together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) could achieve better accuracy in shorter time.
Conclusions/Significance: The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if computation time matters more, RBF SVM with a relatively small set of voxels, keeping part of the principal components as features, is the better choice. PMID:21359184
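The comparison in this abstract turns on the kernel. A linear SVM scores samples through inner products k(x, z) = x · z, so its decision boundary is a hyperplane in voxel space, while an RBF SVM uses k(x, z) = exp(-gamma * ||x - z||^2), a similarity that decays with squared distance. The sketch below only computes the two kernels and their Gram matrices on toy vectors; the value gamma = 0.5 and the two-voxel samples are illustrative assumptions, and the study's voxel selection, PCA and training steps are not reproduced.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - z||^2); gamma=0.5 is an arbitrary illustrative value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(samples, kernel):
    """Pairwise kernel values; a kernel SVM's dual problem sees the data only through this matrix."""
    return [[kernel(x, z) for z in samples] for x in samples]

# Hypothetical two-voxel feature vectors for three trials
samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K_lin = gram_matrix(samples, linear_kernel)
K_rbf = gram_matrix(samples, rbf_kernel)
print(K_lin)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

The RBF Gram matrix always has ones on its diagonal, and its off-diagonal entries shrink as samples move apart, which is what lets an RBF SVM draw curved boundaries in a small feature space.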
NASA Astrophysics Data System (ADS)
He, Zhibin; Wen, Xiaohu; Liu, Hu; Du, Jun
2014-02-01
Data-driven models are very useful for river flow forecasting when the underlying physical relationships are not fully understood, but it is not clear whether they still perform well in small river basins of semiarid mountain regions with complicated topography. In this study, three data-driven methods, artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS) and support vector machine (SVM), were used to forecast river flow in a semiarid mountain region of northwestern China. Different combinations of antecedent river flow values were analyzed as model inputs, and the appropriate input vector was selected based on an analysis of residuals. The performance of the ANN, ANFIS and SVM models in the training and validation sets was compared with the observed data. The model using three antecedent flow values was selected as the best-fit model for river flow forecasting. For a more accurate evaluation of the ANN, ANFIS and SVM models, four standard quantitative performance measures were employed: the coefficient of correlation (R), root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NS) and mean absolute relative error (MARE). The results indicate that the performances obtained by ANN, ANFIS and SVM under the different evaluation criteria during the training and validation periods do not vary substantially, and that all three models forecast river flow satisfactorily. A detailed comparison of the overall performance indicated that the SVM model performed better than ANN and ANFIS in river flow forecasting for the validation data sets. The results also suggest that ANN, ANFIS and SVM can be successfully applied to build river flow forecasting models for semiarid mountain regions with complicated topography.
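The abstract specifies the input design (three antecedent flow values) and four evaluation measures, but not their exact formulas. The sketch below assumes the common textbook definitions: R is the Pearson correlation, RMSE = sqrt(mean((o - s)^2)), NS = 1 - sum((o - s)^2) / sum((o - mean(o))^2), and MARE = mean(|o - s| / o), expressed here as a fraction rather than a percentage. The flow series and simulated values are invented; in practice the ANN, ANFIS or SVM model would supply `sim`.

```python
import math

def lagged_inputs(flow, n_lags=3):
    """Pair each flow value with the n_lags values that precede it."""
    X, y = [], []
    for t in range(n_lags, len(flow)):
        X.append(flow[t - n_lags:t])
        y.append(flow[t])
    return X, y

def flow_metrics(obs, sim):
    """R, RMSE, NS and MARE for observed vs simulated flows (obs must be > 0 for MARE)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_s = sum((s - mean_s) ** 2 for s in sim)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return {
        "R": cov / math.sqrt(ss_o * ss_s),
        "RMSE": math.sqrt(sse / n),
        "NS": 1 - sse / ss_o,
        "MARE": sum(abs(o - s) / o for o, s in zip(obs, sim)) / n,
    }

# Invented flow series: three antecedent values predict the next one
X, y = lagged_inputs([10.0, 12.0, 11.0, 13.0, 14.0], n_lags=3)
print(X, y)  # [[10.0, 12.0, 11.0], [12.0, 11.0, 13.0]] [13.0, 14.0]

scores = flow_metrics(obs=[10.0, 20.0, 30.0], sim=[12.0, 18.0, 30.0])
print(round(scores["NS"], 3), round(scores["MARE"], 3))  # 0.96 0.1
```

NS equals 1 for a perfect fit and 0 when the model is no better than predicting the observed mean, which makes it a useful complement to R (insensitive to bias) and RMSE (in flow units).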
Ma, X H; Wang, R; Tan, C Y; Jiang, Y Y; Lu, T; Rao, H B; Li, X Y; Go, M L; Low, B C; Chen, Y Z
2010-10-04
Multitarget agents have been increasingly explored for enhancing efficacy and reducing countertarget activities and toxicities. Efficient virtual screening (VS) tools for searching selective multitarget agents are needed. Combinatorial support vector machines (C-SVM) were tested as VS tools for searching dual-inhibitors of 11 combinations of 9 anticancer kinase targets (EGFR, VEGFR, PDGFR, Src, FGFR, Lck, CDK1, CDK2, GSK3). C-SVM trained on 233-1,316 non-dual-inhibitors correctly identified 26.8%-57.3% (majority >36%) of the 56-230 intra-kinase-group dual-inhibitors (equivalent to the 50-70% yields of two independent individual target VS tools), and 12.2% of the 41 inter-kinase-group dual-inhibitors. C-SVM were fairly selective, misidentifying as dual-inhibitors 3.7%-48.1% (majority <20%) of the 233-1,316 non-dual-inhibitors of the same kinase pairs and 0.98%-4.77% of the 3,971-5,180 inhibitors of other kinases. C-SVM produced low false-hit rates, misidentifying as dual-inhibitors 1,746-4,817 (0.013%-0.036%) of the 13.56 M PubChem compounds, 12-175 (0.007%-0.104%) of the 168 K MDDR compounds, and 0-84 (0.0%-2.9%) of the 19,495-38,483 MDDR compounds similar to the known dual-inhibitors. C-SVM was compared with other VS methods (Surflex-Dock, DOCK Blaster, kNN and PNN) against the same sets of kinase inhibitors and the full set or a subset of the 1.02 M Zinc clean-leads data set. C-SVM produced comparable dual-inhibitor yields, slightly better false-hit rates for kinase inhibitors, and significantly lower false-hit rates for the Zinc clean-leads data set. Combinatorial SVM showed promising potential for searching selective multitarget agents against intra-kinase-group kinases without explicit knowledge of multitarget agents.
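The abstract does not spell out how the per-target models are combined. One reading, assumed here and not confirmed by the text above, is that a compound counts as a dual-inhibitor only when the individual SVMs for both kinases classify it as an inhibitor. The sketch shows that logical-AND rule and how a yield or false-hit rate is then a simple fraction; the boolean predictions are hypothetical stand-ins for trained SVM outputs.

```python
def combine_dual(pred_kinase_a, pred_kinase_b):
    """Assumed C-SVM rule: flag a compound as a dual-inhibitor only when the
    classifiers for both kinases flag it as an inhibitor (logical AND)."""
    return [a and b for a, b in zip(pred_kinase_a, pred_kinase_b)]

def flagged_rate(flags):
    """Fraction of compounds flagged. On known dual-inhibitors this is the yield;
    on compounds presumed not to be dual-inhibitors it is the false-hit rate."""
    return sum(flags) / len(flags)

# Hypothetical outputs of two per-target SVMs on four compounds
pred_kinase_a = [True, True, False, True]
pred_kinase_b = [True, False, False, True]
dual = combine_dual(pred_kinase_a, pred_kinase_b)
print(dual, flagged_rate(dual))  # [True, False, False, True] 0.5

# Arithmetic check on a figure from the abstract: 1,746 of 13.56 M PubChem compounds
print(f"{100 * 1746 / 13.56e6:.3f}%")  # 0.013%
```

Requiring agreement from both models trades some yield for selectivity, which is consistent with the very low false-hit rates the abstract reports on large libraries.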
Kumar, Pankaj; Ma, Xiaohua; Liu, Xianghui; Jia, Jia; Bucong, Han; Xue, Ying; Li, Ze Rong; Yang, Sheng Yong; Wei, Yu Quan; Chen, Yu Zong
2011-05-01
Various in vitro and in-silico methods have been used for drug genotoxicity tests, but they show limited genotoxicity (GT+) and non-genotoxicity (GT-) identification rates. New methods and combinatorial approaches have been explored for enhanced collective identification capability. The rates of in-silico methods may be further improved by training data significantly diversified with the large number of recently reported GT+ and GT- compounds, but a major concern is the increased noise level arising from the high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of the support vector machine (SVM) method, which is known to tolerate high noise levels in training data. Two SVMs with different diversity/noise levels were developed and tested. H-SVM, trained on higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test), outperforms L-SVM, trained on lower-noise, lower-diversity data (GT+ in in vivo or Ames tests only). H-SVM trained on 4,763 GT+ compounds reported before 2008 and 8,232 GT- compounds (excluding clinical trial drugs) correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT-, and predicted 23.96% of 168 K MDDR and 27.23% of 17.86 M PubChem compounds as GT+. These rates are comparable to the 43.1-51.9% GT+ and 75-93% GT- rates of existing in-silico methods, the 58.8% GT+ and 79% GT- rates of the Ames method, and the estimated percentages of 23% in vivo and 31-33% in vitro GT+ compounds in the "universe of chemicals". There is a substantial level of agreement between the GT+ and GT- MDDR compounds predicted by H-SVM and L-SVM and the predictions from TOPKAT. SVM showed good potential in identifying GT+ compounds from large compound libraries based on higher-diversity, higher-noise training data.
Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En
2015-06-01
Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formula construction without pruning outliers in advance. Such readability formulae tend to produce low text classification accuracy, whereas a support vector machine (SVM), a machine learning method, can improve the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.
Chen, Qiu-Feng; Chen, Hua-Jun; Liu, Jun; Sun, Tao; Shen, Qun-Tai
2016-01-01
Machine learning-based approaches play an important role in examining functional magnetic resonance imaging (fMRI) data in a multivariate manner and extracting features predictive of group membership. This study was performed to assess the potential of measuring brain intrinsic activity to identify minimal hepatic encephalopathy (MHE) in cirrhotic patients, using the support vector machine (SVM) method. Resting-state fMRI data were acquired in 16 cirrhotic patients with MHE and 19 cirrhotic patients without MHE. The regional homogeneity (ReHo) method was used to investigate the local synchrony of intrinsic brain activity. The Psychometric Hepatic Encephalopathy Score (PHES) was used to define the MHE condition. An SVM classifier was then applied with leave-one-out cross-validation to determine the discriminative ReHo map for MHE. The discrimination map highlights a set of regions, including the prefrontal cortex, anterior cingulate cortex, anterior insular cortex, inferior parietal lobule, precentral and postcentral gyri, superior and medial temporal cortices, and middle and inferior occipital gyri. The optimized discriminative model showed a total accuracy of 82.9% and a sensitivity of 81.3%. Our results suggest that a combination of the SVM approach and brain intrinsic activity measurement could be helpful for detecting MHE in cirrhotic patients.
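Leave-one-out cross-validation, as used in this study, holds out each patient once, trains on the remaining patients and predicts the held-out one, so every prediction comes from a model that never saw that patient. The sketch below shows only the loop. To stay dependency-free it plugs in a hypothetical nearest-centroid classifier as a stand-in for the SVM; in practice one might use scikit-learn's `SVC` with `LeaveOneOut` and `cross_val_predict`. The two-dimensional features are invented and are not ReHo values.

```python
import math

def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): predict the class whose mean
    feature vector is closest to x in Euclidean distance."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

def leave_one_out(X, y, fit_predict):
    """Hold out each subject once, fit on the remaining subjects and
    predict the held-out one; returns one prediction per subject."""
    preds = []
    for i in range(len(X)):
        preds.append(fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds

# Hypothetical 2-D features per patient (1 = MHE, 0 = no MHE)
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [1, 1, 1, 0, 0, 0]
preds = leave_one_out(X, y, nearest_centroid)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
sensitivity = sum(p == 1 for p, t in zip(preds, y) if t == 1) / y.count(1)
print(preds, accuracy, sensitivity)  # [1, 1, 1, 0, 0, 0] 1.0 1.0
```

With the study's 16 + 19 = 35 patients, the reported 82.9% accuracy and 81.3% sensitivity correspond to 29 of 35 and 13 of 16 correct classifications (29/35 ≈ 0.829, 13/16 ≈ 0.813).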
NASA Astrophysics Data System (ADS)
Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong
2013-01-01
Classification of different tree species in semiarid areas can be challenging because leaf structure and orientation change under soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for classifying five tree species in mopane woodland of Botswana using machine learning algorithms with limited training samples. We performed classification using random forest (RF) and support vector machines (SVM) in the EnMap box. The overall accuracies for classifying the five tree species were 88.75% and 85% for SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where training samples are limited, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful than the kappa coefficient for the vast majority of remote sensing classification processes. Overall, high species classification accuracy can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
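The abstract recommends quantity and allocation disagreement over kappa. As commonly defined following Pontius and Millones (2011), quantity disagreement captures mismatched class proportions between map and reference, allocation disagreement captures mismatched placement given those proportions, and the two sum to 1 minus overall accuracy. A minimal sketch from a confusion matrix follows; the matrix is made up and not taken from the RapidEye study.

```python
def disagreement(confusion):
    """Quantity and allocation disagreement from a square confusion matrix.

    confusion[i][j] = number of samples classified as class i whose reference class is j.
    Returns (quantity, allocation, overall_accuracy) as proportions.
    """
    total = sum(sum(row) for row in confusion)
    p = [[c / total for c in row] for row in confusion]
    k = len(p)
    mapped = [sum(p[g]) for g in range(k)]                          # row totals
    reference = [sum(p[i][g] for i in range(k)) for g in range(k)]  # column totals
    quantity = sum(abs(mapped[g] - reference[g]) for g in range(k)) / 2
    allocation = sum(
        2 * min(mapped[g] - p[g][g], reference[g] - p[g][g]) for g in range(k)
    ) / 2
    accuracy = sum(p[g][g] for g in range(k))
    return quantity, allocation, accuracy

# Made-up 3-class confusion matrix (rows: classified, columns: reference)
q, a, acc = disagreement([[10, 2, 0], [1, 8, 3], [0, 2, 14]])
print(round(q, 3), round(a, 3), round(acc, 3))  # 0.025 0.175 0.8
```

Here most of the 20% error is allocation disagreement: the map gets the class proportions nearly right but places some samples in the wrong class, a distinction a single kappa value does not reveal.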