Sample records for vector machine methods

  1. Research on bearing fault diagnosis of large machinery based on mathematical morphology

    NASA Astrophysics Data System (ADS)

    Wang, Yu

    2018-04-01

    To study the automatic diagnosis of large machinery fault based on support vector machine, combining the four common faults of the large machinery, the support vector machine is used to classify and identify the fault. The extracted feature vectors are entered. The feature vector is trained and identified by multi - classification method. The optimal parameters of the support vector machine are searched by trial and error method and cross validation method. Then, the support vector machine is compared with BP neural network. The results show that the support vector machines are short in time and high in classification accuracy. It is more suitable for the research of fault diagnosis in large machinery. Therefore, it can be concluded that the training speed of support vector machines (SVM) is fast and the performance is good.

  2. A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment

    NASA Astrophysics Data System (ADS)

    Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong

    Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM had excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principle component analysis (KPCA) and linear programming form of least square support vector machine. With this method sparseness and robustness is obtained while solving large dimensional and large scale database. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be a satisfactory one.

  3. A new method for the prediction of chatter stability lobes based on dynamic cutting force simulation model and support vector machine

    NASA Astrophysics Data System (ADS)

    Peng, Chong; Wang, Lun; Liao, T. Warren

    2015-10-01

    Currently, chatter has become the critical factor in hindering machining quality and productivity in machining processes. To avoid cutting chatter, a new method based on dynamic cutting force simulation model and support vector machine (SVM) is presented for the prediction of chatter stability lobes. The cutting force is selected as the monitoring signal, and the wavelet energy entropy theory is used to extract the feature vectors. A support vector machine is constructed using the MATLAB LIBSVM toolbox for pattern classification based on the feature vectors derived from the experimental cutting data. Then combining with the dynamic cutting force simulation model, the stability lobes diagram (SLD) can be estimated. Finally, the predicted results are compared with existing methods such as zero-order analytical (ZOA) and semi-discretization (SD) method as well as actual cutting experimental results to confirm the validity of this new method.

  4. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

  5. Currency crisis indication by using ensembles of support vector machine classifiers

    NASA Astrophysics Data System (ADS)

    Ramli, Nor Azuana; Ismail, Mohd Tahir; Wooi, Hooy Chee

    2014-07-01

    There are many methods that had been experimented in the analysis of currency crisis. However, not all methods could provide accurate indications. This paper introduces an ensemble of classifiers by using Support Vector Machine that's never been applied in analyses involving currency crisis before with the aim of increasing the indication accuracy. The proposed ensemble classifiers' performances are measured using percentage of accuracy, root mean squared error (RMSE), area under the Receiver Operating Characteristics (ROC) curve and Type II error. The performances of an ensemble of Support Vector Machine classifiers are compared with the single Support Vector Machine classifier and both of classifiers are tested on the data set from 27 countries with 12 macroeconomic indicators for each country. From our analyses, the results show that the ensemble of Support Vector Machine classifiers outperforms single Support Vector Machine classifier on the problem involving indicating a currency crisis in terms of a range of standard measures for comparing the performance of classifiers.

  6. Methods, systems and apparatus for controlling third harmonic voltage when operating a multi-space machine in an overmodulation region

    DOEpatents

    Perisic, Milun; Kinoshita, Michael H; Ranson, Ray M; Gallegos-Lopez, Gabriel

    2014-06-03

    Methods, system and apparatus are provided for controlling third harmonic voltages when operating a multi-phase machine in an overmodulation region. The multi-phase machine can be, for example, a five-phase machine in a vector controlled motor drive system that includes a five-phase PWM controlled inverter module that drives the five-phase machine. Techniques for overmodulating a reference voltage vector are provided. For example, when the reference voltage vector is determined to be within the overmodulation region, an angle of the reference voltage vector can be modified to generate a reference voltage overmodulation control angle, and a magnitude of the reference voltage vector can be modified, based on the reference voltage overmodulation control angle, to generate a modified magnitude of the reference voltage vector. By modifying the reference voltage vector, voltage command signals that control a five-phase inverter module can be optimized to increase output voltages generated by the five-phase inverter module.

  7. A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM

    NASA Astrophysics Data System (ADS)

    Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan

    2018-03-01

    In order to make up for the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on the three DGA ratios and particle swarm optimization (PSO) optimize support vector machine (SVM) is proposed. Using transforming support vector machine to the nonlinear and multi-classification SVM, establishing the particle swarm optimization to optimize the SVM multi classification model, and conducting transformer fault diagnosis combined with the cross validation principle. The fault diagnosis results show that the average accuracy of test method is better than the standard support vector machine and genetic algorithm support vector machine, and the proposed method can effectively improve the accuracy of transformer fault diagnosis is proved.

  8. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.

  9. Methods, systems and apparatus for optimization of third harmonic current injection in a multi-phase machine

    DOEpatents

    Gallegos-Lopez, Gabriel

    2012-10-02

    Methods, system and apparatus are provided for increasing voltage utilization in a five-phase vector controlled machine drive system that employs third harmonic current injection to increase torque and power output by a five-phase machine. To do so, a fundamental current angle of a fundamental current vector is optimized for each particular torque-speed of operating point of the five-phase machine.

  10. Lysine acetylation sites prediction using an ensemble of support vector machine classifiers.

    PubMed

    Xu, Yan; Wang, Xiao-Bo; Ding, Jun; Wu, Ling-Yun; Deng, Nai-Yang

    2010-05-07

    Lysine acetylation is an essentially reversible and high regulated post-translational modification which regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive. Hence, there is significant interest in the development of computational methods for reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine classifiers to perform this work. The experimentally determined acetylation lysine sites are extracted from Swiss-Prot database and scientific literatures. Experiment results show that an ensemble of support vector machine classifiers outperforms single support vector machine classifier and other computational methods such as PAIL and LysAcet on the problem of predicting acetylation lysine sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation sites prediction available at http://www.aporc.org/EnsemblePail/. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  11. Dual linear structured support vector machine tracking method via scale correlation filter

    NASA Astrophysics Data System (ADS)

    Li, Weisheng; Chen, Yanquan; Xiao, Bin; Feng, Chen

    2018-01-01

    Adaptive tracking-by-detection methods based on structured support vector machine (SVM) performed well on recent visual tracking benchmarks. However, these methods did not adopt an effective strategy of object scale estimation, which limits the overall tracking performance. We present a tracking method based on a dual linear structured support vector machine (DLSSVM) with a discriminative scale correlation filter. The collaborative tracker comprised of a DLSSVM model and a scale correlation filter obtains good results in tracking target position and scale estimation. The fast Fourier transform is applied for detection. Extensive experiments show that our tracking approach outperforms many popular top-ranking trackers. On a benchmark including 100 challenging video sequences, the average precision of the proposed method is 82.8%.

  12. Research on intrusion detection based on Kohonen network and support vector machine

    NASA Astrophysics Data System (ADS)

    Shuai, Chunyan; Yang, Hengcheng; Gong, Zeweiyi

    2018-05-01

    In view of the problem of low detection accuracy and the long detection time of support vector machine, which directly applied to the network intrusion detection system. Optimization of SVM parameters can greatly improve the detection accuracy, but it can not be applied to high-speed network because of the long detection time. a method based on Kohonen neural network feature selection is proposed to reduce the optimization time of support vector machine parameters. Firstly, this paper is to calculate the weights of the KDD99 network intrusion data by Kohonen network and select feature by weight. Then, after the feature selection is completed, genetic algorithm (GA) and grid search method are used for parameter optimization to find the appropriate parameters and classify them by support vector machines. By comparing experiments, it is concluded that feature selection can reduce the time of parameter optimization, which has little influence on the accuracy of classification. The experiments suggest that the support vector machine can be used in the network intrusion detection system and reduce the missing rate.

  13. Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences.

    PubMed

    An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming

    2016-01-01

    We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.

  14. Harmonic reduction of Direct Torque Control of six-phase induction motor.

    PubMed

    Taheri, A

    2016-07-01

    In this paper, a new switching method in Direct Torque Control (DTC) of a six-phase induction machine for reduction of current harmonics is introduced. Selecting a suitable vector in each sampling period is an ordinal method in the ST-DTC drive of a six-phase induction machine. The six-phase induction machine has 64 voltage vectors and divided further into four groups. In the proposed DTC method, the suitable voltage vectors are selected from two vector groups. By a suitable selection of two vectors in each sampling period, the harmonic amplitude is decreased more, in and various comparison to that of the ST-DTC drive. The harmonics loss is greater reduced, while the electromechanical energy is decreased with switching loss showing a little increase. Spectrum analysis of the phase current in the standard and new switching table DTC of the six-phase induction machine and determination for the amplitude of each harmonics is proposed in this paper. The proposed method has a less sampling time in comparison to the ordinary method. The Harmonic analyses of the current in the low and high speed shows the performance of the presented method. The simplicity of the proposed method and its implementation without any extra hardware is other advantages of the proposed method. The simulation and experimental results show the preference of the proposed method. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  15. Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    DTIC Science & Technology

    2013-05-28

    those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms . When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new

  16. Power line identification of millimeter wave radar based on PCA-GS-SVM

    NASA Astrophysics Data System (ADS)

    Fang, Fang; Zhang, Guifeng; Cheng, Yansheng

    2017-12-01

    Aiming at the problem that the existing detection method can not effectively solve the security of UAV's ultra low altitude flight caused by power line, a power line recognition method based on grid search (GS) and the principal component analysis and support vector machine (PCA-SVM) is proposed. Firstly, the candidate line of Hough transform is reduced by PCA, and the main feature of candidate line is extracted. Then, upport vector machine (SVM is) optimized by grid search method (GS). Finally, using support vector machine classifier optimized parameters to classify the candidate line. MATLAB simulation results show that this method can effectively identify the power line and noise, and has high recognition accuracy and algorithm efficiency.

  17. Three-dimensional tool radius compensation for multi-axis peripheral milling

    NASA Astrophysics Data System (ADS)

    Chen, Youdong; Wang, Tianmiao

    2013-05-01

    Few function about 3D tool radius compensation is applied to generating executable motion control commands in the existing computer numerical control (CNC) systems. Once the tool radius is changed, especially in the case of tool size changing with tool wear in machining, a new NC program has to be recreated. A generic 3D tool radius compensation method for multi-axis peripheral milling in CNC systems is presented. The offset path is calculated by offsetting the tool path along the direction of the offset vector with a given distance. The offset vector is perpendicular to both the tangent vector of the tool path and the orientation vector of the tool axis relative to the workpiece. The orientation vector equations of the tool axis relative to the workpiece are obtained through homogeneous coordinate transformation matrix and forward kinematics of generalized kinematics model of multi-axis machine tools. To avoid cutting into the corner formed by the two adjacent tool paths, the coordinates of offset path at the intersection point have been calculated according to the transition type that is determined by the angle between the two tool path tangent vectors at the corner. Through the verification by the solid cutting simulation software VERICUT® with different tool radiuses on a table-tilting type five-axis machine tool, and by the real machining experiment of machining a soup spoon on a five-axis machine tool with the developed CNC system, the effectiveness of the proposed 3D tool radius compensation method is confirmed. The proposed compensation method can be suitable for all kinds of three- to five-axis machine tools as a general form.

  18. Testing of the Support Vector Machine for Binary-Class Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew

    2011-01-01

    The Support Vector Machine is a powerful algorithm, useful in classifying data in to species. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SMV as a method for classification. From trial to trial, SVM produces consistent results

  19. Support vector machine applied to predict the zoonotic potential of E. coli O157 cattle isolates

    USDA-ARS?s Scientific Manuscript database

    Methods based on sequence data analysis facilitate the tracking of disease outbreaks, allow relationships between strains to be reconstructed and virulence factors to be identified. However, these methods are used postfactum after an outbreak has happened. Here, we show that support vector machine a...

  20. Exact analytical modeling of magnetic vector potential in surface inset permanent magnet DC machines considering magnet segmentation

    NASA Astrophysics Data System (ADS)

    Jabbari, Ali

    2018-01-01

    Surface inset permanent magnet DC machine can be used as an alternative in automation systems due to their high efficiency and robustness. Magnet segmentation is a common technique in order to mitigate pulsating torque components in permanent magnet machines. An accurate computation of air-gap magnetic field distribution is necessary in order to calculate machine performance. An exact analytical method for magnetic vector potential calculation in surface inset permanent magnet machines considering magnet segmentation has been proposed in this paper. The analytical method is based on the resolution of Laplace and Poisson equations as well as Maxwell equation in polar coordinate by using sub-domain method. One of the main contributions of the paper is to derive an expression for the magnetic vector potential in the segmented PM region by using hyperbolic functions. The developed method is applied on the performance computation of two prototype surface inset magnet segmented motors with open circuit and on load conditions. The results of these models are validated through FEM method.

  1. The maximum vector-angular margin classifier and its fast training on large datasets using a core vector machine.

    PubMed

    Hu, Wenjun; Chung, Fu-Lai; Wang, Shitong

    2012-03-01

    Although pattern classification has been extensively studied in the past decades, how to effectively solve the corresponding training on large datasets is a problem that still requires particular attention. Many kernelized classification methods, such as SVM and SVDD, can be formulated as the corresponding quadratic programming (QP) problems, but computing the associated kernel matrices requires O(n2)(or even up to O(n3)) computational complexity, where n is the size of the training patterns, which heavily limits the applicability of these methods for large datasets. In this paper, a new classification method called the maximum vector-angular margin classifier (MAMC) is first proposed based on the vector-angular margin to find an optimal vector c in the pattern feature space, and all the testing patterns can be classified in terms of the maximum vector-angular margin ρ, between the vector c and all the training data points. Accordingly, it is proved that the kernelized MAMC can be equivalently formulated as the kernelized Minimum Enclosing Ball (MEB), which leads to a distinctive merit of MAMC, i.e., it has the flexibility of controlling the sum of support vectors like v-SVC and may be extended to a maximum vector-angular margin core vector machine (MAMCVM) by connecting the core vector machine (CVM) method with MAMC such that the corresponding fast training on large datasets can be effectively achieved. Experimental results on artificial and real datasets are provided to validate the power of the proposed methods. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. Logic Learning Machine and standard supervised methods for Hodgkin's lymphoma prognosis using gene expression data and clinical variables.

    PubMed

    Parodi, Stefano; Manneschi, Chiara; Verda, Damiano; Ferrari, Enrico; Muselli, Marco

    2018-03-01

    This study evaluates the performance of a set of machine learning techniques in predicting the prognosis of Hodgkin's lymphoma using clinical factors and gene expression data. Analysed samples from 130 Hodgkin's lymphoma patients included a small set of clinical variables and more than 54,000 gene features. Machine learning classifiers included three black-box algorithms ( k-nearest neighbour, Artificial Neural Network, and Support Vector Machine) and two methods based on intelligible rules (Decision Tree and the innovative Logic Learning Machine method). Support Vector Machine clearly outperformed any of the other methods. Among the two rule-based algorithms, Logic Learning Machine performed better and identified a set of simple intelligible rules based on a combination of clinical variables and gene expressions. Decision Tree identified a non-coding gene ( XIST) involved in the early phases of X chromosome inactivation that was overexpressed in females and in non-relapsed patients. XIST expression might be responsible for the better prognosis of female Hodgkin's lymphoma patients.

  3. Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation.

    PubMed

    Mikhchi, Abbas; Honarvar, Mahmood; Kashan, Nasser Emam Jomeh; Aminafshar, Mehdi

    2016-06-21

    Genotype imputation is an important tool for prediction of unknown genotypes for both unrelated individuals and parent-offspring trios. Several imputation methods are available and can either employ universal machine learning methods, or deploy algorithms dedicated to infer missing genotypes. In this research the performance of eight machine learning methods: Support Vector Machine, K-Nearest Neighbors, Extreme Learning Machine, Radial Basis Function, Random Forest, AdaBoost, LogitBoost, and TotalBoost compared in terms of the imputation accuracy, computation time and the factors affecting imputation accuracy. The methods employed using real and simulated datasets to impute the un-typed SNPs in parent-offspring trios. The tested methods show that imputation of parent-offspring trios can be accurate. The Random Forest and Support Vector Machine were more accurate than the other machine learning methods. The TotalBoost performed slightly worse than the other methods.The running times were different between methods. The ELM was always most fast algorithm. In case of increasing the sample size, the RBF requires long imputation time.The tested methods in this research can be an alternative for imputation of un-typed SNPs in low missing rate of data. However, it is recommended that other machine learning methods to be used for imputation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Fabric wrinkle characterization and classification using modified wavelet coefficients and optimized support-vector-machine classifier

    USDA-ARS?s Scientific Manuscript database

    This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...

  5. Scattering transform and LSPTSVM based fault diagnosis of rotating machinery

    NASA Astrophysics Data System (ADS)

    Ma, Shangjun; Cheng, Bo; Shang, Zhaowei; Liu, Geng

    2018-05-01

    This paper proposes an algorithm for fault diagnosis of rotating machinery to overcome the shortcomings of classical techniques which are noise sensitive in feature extraction and time consuming for training. Based on the scattering transform and the least squares recursive projection twin support vector machine (LSPTSVM), the method has the advantages of high efficiency and insensitivity for noise signal. Using the energy of the scattering coefficients in each sub-band, the features of the vibration signals are obtained. Then, an LSPTSVM classifier is used for fault diagnosis. The new method is compared with other common methods including the proximal support vector machine, the standard support vector machine and multi-scale theory by using fault data for two systems, a motor bearing and a gear box. The results show that the new method proposed in this study is more effective for fault diagnosis of rotating machinery.

  6. Object recognition of ladar with support vector machine

    NASA Astrophysics Data System (ADS)

    Sun, Jian-Feng; Li, Qi; Wang, Qi

    2005-01-01

    Intensity, range and Doppler images can be obtained by using laser radar. Laser radar can detect much more object information than other detecting sensor, such as passive infrared imaging and synthetic aperture radar (SAR), so it is well suited as the sensor of object recognition. Traditional method of laser radar object recognition is extracting target features, which can be influenced by noise. In this paper, a laser radar recognition method-Support Vector Machine is introduced. Support Vector Machine (SVM) is a new hotspot of recognition research after neural network. It has well performance on digital written and face recognition. Two series experiments about SVM designed for preprocessing and non-preprocessing samples are performed by real laser radar images, and the experiments results are compared.

  7. A comparative study of surface EMG classification by fuzzy relevance vector machine and fuzzy support vector machine.

    PubMed

    Xie, Hong-Bo; Huang, Hu; Wu, Jianhua; Liu, Lei

    2015-02-01

    We present a multiclass fuzzy relevance vector machine (FRVM) learning mechanism and evaluate its performance to classify multiple hand motions using surface electromyographic (sEMG) signals. The relevance vector machine (RVM) is a sparse Bayesian kernel method which avoids some limitations of the support vector machine (SVM). However, RVM still suffers the difficulty of possible unclassifiable regions in multiclass problems. We propose two fuzzy membership function-based FRVM algorithms to solve such problems, based on experiments conducted on seven healthy subjects and two amputees with six hand motions. Two feature sets, namely, AR model coefficients and room mean square value (AR-RMS), and wavelet transform (WT) features, are extracted from the recorded sEMG signals. Fuzzy support vector machine (FSVM) analysis was also conducted for wide comparison in terms of accuracy, sparsity, training and testing time, as well as the effect of training sample sizes. FRVM yielded comparable classification accuracy with dramatically fewer support vectors in comparison with FSVM. Furthermore, the processing delay of FRVM was much less than that of FSVM, whilst training time of FSVM much faster than FRVM. The results indicate that FRVM classifier trained using sufficient samples can achieve comparable generalization capability as FSVM with significant sparsity in multi-channel sEMG classification, which is more suitable for sEMG-based real-time control applications.

  8. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder.

    PubMed

    Mwangi, Benson; Ebmeier, Klaus P; Matthews, Keith; Steele, J Douglas

    2012-05-01

    Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T(1)-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T(1)-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity. These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.

  9. Automatic event detection in low SNR microseismic signals based on multi-scale permutation entropy and a support vector machine

    NASA Astrophysics Data System (ADS)

    Jia, Rui-Sheng; Sun, Hong-Mei; Peng, Yan-Jun; Liang, Yong-Quan; Lu, Xin-Ming

    2017-07-01

    Microseismic monitoring is an effective means for providing early warning of rock or coal dynamical disasters, and its first step is microseismic event detection, although low SNR microseismic signals often cannot effectively be detected by routine methods. To solve this problem, this paper presents permutation entropy and a support vector machine to detect low SNR microseismic events. First, an extraction method of signal features based on multi-scale permutation entropy is proposed by studying the influence of the scale factor on the signal permutation entropy. Second, the detection model of low SNR microseismic events based on the least squares support vector machine is built by performing a multi-scale permutation entropy calculation for the collected vibration signals, constructing a feature vector set of signals. Finally, a comparative analysis of the microseismic events and noise signals in the experiment proves that the different characteristics of the two can be fully expressed by using multi-scale permutation entropy. The detection model of microseismic events combined with the support vector machine, which has the features of high classification accuracy and fast real-time algorithms, can meet the requirements of online, real-time extractions of microseismic events.

  10. The optional selection of micro-motion feature based on Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Li, Bo; Ren, Hongmei; Xiao, Zhi-he; Sheng, Jing

    2017-11-01

    Micro-motion form of target is multiple, different micro-motion forms are apt to be modulated, which makes it difficult for feature extraction and recognition. Aiming at feature extraction of cone-shaped objects with different micro-motion forms, this paper proposes the best selection method of micro-motion feature based on support vector machine. After the time-frequency distribution of radar echoes, comparing the time-frequency spectrum of objects with different micro-motion forms, features are extracted based on the differences between the instantaneous frequency variations of different micro-motions. According to the methods based on SVM (Support Vector Machine) features are extracted, then the best features are acquired. Finally, the result shows the method proposed in this paper is feasible under the test condition of certain signal-to-noise ratio(SNR).

  11. The assisted prediction modelling frame with hybridisation and ensemble for business risk forecasting and an implementation

    NASA Astrophysics Data System (ADS)

    Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie

    2015-08-01

    The business failure of numerous companies results in financial crises. The high social costs associated with such crises have made people to search for effective tools for business risk prediction, among which, support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested in forecasting business risk with support vector machine. However, existing literature seldom focuses on the general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed researches on forecasting business risk with support vector machine, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally, investigated the use of principal components analysis, support vector machine, random sampling, and group decision, under the general frame in forecasting business risk. Under the APMF-WHAE frame with support vector machine as the base predictive model, four specific predictive models were produced, namely, pure support vector machine, a hybrid support vector machine involved with principal components analysis, a support vector machine ensemble involved with random sampling and group decision, and an ensemble of hybrid support vector machine using group decision to integrate various hybrid support vector machines on variables produced from principle components analysis and samples from random sampling. The experimental results indicate that hybrid support vector machine and ensemble of hybrid support vector machines were able to produce dominating performance than pure support vector machine and support vector machine ensemble.

  12. [Support vector machine?assisted diagnosis of human malignant gastric tissues based on dielectric properties].

    PubMed

    Zhang, Sa; Li, Zhou; Xin, Xue-Gang

    2017-12-20

    To achieve differential diagnosis of normal and malignant gastric tissues based on discrepancies in their dielectric properties using support vector machine. The dielectric properties of normal and malignant gastric tissues at the frequency ranging from 42.58 to 500 MHz were measured by coaxial probe method, and the Cole?Cole model was used to fit the measured data. Receiver?operating characteristic (ROC) curve analysis was used to evaluate the discrimination capability with respect to permittivity, conductivity, and Cole?Cole fitting parameters. Support vector machine was used for discriminating normal and malignant gastric tissues, and the discrimination accuracy was calculated using k?fold cross? The area under the ROC curve was above 0.8 for permittivity at the 5 frequencies at the lower end of the measured frequency range. The combination of the support vector machine with the permittivity at all these 5 frequencies combined achieved the highest discrimination accuracy of 84.38% with a MATLAB runtime of 3.40 s. The support vector machine?assisted diagnosis is feasible for human malignant gastric tissues based on the dielectric properties.

  13. Multiclass Reduced-Set Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Tang, Benyang; Mazzoni, Dominic

    2006-01-01

    There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.

  14. IMPROVEMENT OF SMVGEAR II ON VECTOR AND SCALAR MACHINES THROUGH ABSOLUTE ERROR TOLERANCE CONTROL (R823186)

    EPA Science Inventory

    The computer speed of SMVGEAR II was improved markedly on scalar and vector machines with relatively little loss in accuracy. The improvement was due to a method of frequently recalculating the absolute error tolerance instead of keeping it constant for a given set of chemistry. ...

  15. Daily sea level prediction at Chiayi coast, Taiwan using extreme learning machine and relevance vector machine

    NASA Astrophysics Data System (ADS)

    Imani, Moslem; Kao, Huan-Chin; Lan, Wen-Hau; Kuo, Chung-Yen

    2018-02-01

    The analysis and the prediction of sea level fluctuations are core requirements of marine meteorology and operational oceanography. Estimates of sea level with hours-to-days warning times are especially important for low-lying regions and coastal zone management. The primary purpose of this study is to examine the applicability and capability of extreme learning machine (ELM) and relevance vector machine (RVM) models for predicting sea level variations and compare their performances with powerful machine learning methods, namely, support vector machine (SVM) and radial basis function (RBF) models. The input dataset from the period of January 2004 to May 2011 used in the study was obtained from the Dongshi tide gauge station in Chiayi, Taiwan. Results showed that the ELM and RVM models outperformed the other methods. The performance of the RVM approach was superior in predicting the daily sea level time series given the minimum root mean square error of 34.73 mm and the maximum determination coefficient of 0.93 (R2) during the testing periods. Furthermore, the obtained results were in close agreement with the original tide-gauge data, which indicates that RVM approach is a promising alternative method for time series prediction and could be successfully used for daily sea level forecasts.

  16. Support vector machine based classification of fast Fourier transform spectroscopy of proteins

    NASA Astrophysics Data System (ADS)

    Lazarevic, Aleksandar; Pokrajac, Dragoljub; Marcano, Aristides; Melikechi, Noureddine

    2009-02-01

    Fast Fourier transform spectroscopy has proved to be a powerful method for study of the secondary structure of proteins since peak positions and their relative amplitude are affected by the number of hydrogen bridges that sustain this secondary structure. However, to our best knowledge, the method has not been used yet for identification of proteins within a complex matrix like a blood sample. The principal reason is the apparent similarity of protein infrared spectra with actual differences usually masked by the solvent contribution and other interactions. In this paper, we propose a novel machine learning based method that uses protein spectra for classification and identification of such proteins within a given sample. The proposed method uses principal component analysis (PCA) to identify most important linear combinations of original spectral components and then employs support vector machine (SVM) classification model applied on such identified combinations to categorize proteins into one of given groups. Our experiments have been performed on the set of four different proteins, namely: Bovine Serum Albumin, Leptin, Insulin-like Growth Factor 2 and Osteopontin. Our proposed method of applying principal component analysis along with support vector machines exhibits excellent classification accuracy when identifying proteins using their infrared spectra.

  17. An M-step preconditioned conjugate gradient method for parallel computation

    NASA Technical Reports Server (NTRS)

    Adams, L.

    1983-01-01

    This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.

  18. Interpreting linear support vector machine models with heat map molecule coloring

    PubMed Central

    2011-01-01

    Background Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides a strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines showed to have a convincing performance on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity. Results We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach assists to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor. Conclusions In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. Particularly substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered as complementary to structure based modeling approaches. As such, it helps to get a better understanding of the binding mode of an inhibitor. PMID:21439031

  19. Distributed collaborative probabilistic design for turbine blade-tip radial running clearance using support vector machine of regression

    NASA Astrophysics Data System (ADS)

    Fei, Cheng-Wei; Bai, Guang-Chen

    2014-12-01

    To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assembly like the blade-tip radial running clearance (BTRRC) of gas turbine, a distribution collaborative probabilistic design method-based support vector machine of regression (SR)(called as DCSRM) is proposed by integrating distribution collaborative response surface method and support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of aeroengine high-pressure turbine (HPT) BTRRC is accomplished to verify the proposed DCSRM. The analysis results reveal that the optimal static blade-tip clearance of HPT is gained for designing BTRRC, and improving the performance and reliability of aeroengine. The comparison of methods shows that the DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assembly and enriches mechanical reliability theory and method.

  20. Analysis of spectrally resolved autofluorescence images by support vector machines

    NASA Astrophysics Data System (ADS)

    Mateasik, A.; Chorvat, D.; Chorvatova, A.

    2013-02-01

    Spectral analysis of the autofluorescence images of isolated cardiac cells was performed to evaluate and to classify the metabolic state of the cells in respect to the responses to metabolic modulators. The classification was done using machine learning approach based on support vector machine with the set of the automatically calculated features from recorded spectral profile of spectral autofluorescence images. This classification method was compared with the classical approach where the individual spectral components contributing to cell autofluorescence were estimated by spectral analysis, namely by blind source separation using non-negative matrix factorization. Comparison of both methods showed that machine learning can effectively classify the spectrally resolved autofluorescence images without the need of detailed knowledge about the sources of autofluorescence and their spectral properties.

  1. Three dimensional magnetic fields in extra high speed modified Lundell alternators computed by a combined vector-scalar magnetic potential finite element method

    NASA Technical Reports Server (NTRS)

    Demerdash, N. A.; Wang, R.; Secunde, R.

    1992-01-01

    A 3D finite element (FE) approach was developed and implemented for computation of global magnetic fields in a 14.3 kVA modified Lundell alternator. The essence of the new method is the combined use of magnetic vector and scalar potential formulations in 3D FEs. This approach makes it practical, using state of the art supercomputer resources, to globally analyze magnetic fields and operating performances of rotating machines which have truly 3D magnetic flux patterns. The 3D FE-computed fields and machine inductances as well as various machine performance simulations of the 14.3 kVA machine are presented in this paper and its two companion papers.

  2. Precise on-machine extraction of the surface normal vector using an eddy current sensor array

    NASA Astrophysics Data System (ADS)

    Wang, Yongqing; Lian, Meng; Liu, Haibo; Ying, Yangwei; Sheng, Xianjun

    2016-11-01

    To satisfy the requirements of on-machine measurement of the surface normal during complex surface manufacturing, a highly robust normal vector extraction method using an Eddy current (EC) displacement sensor array is developed, the output of which is almost unaffected by surface brightness, machining coolant and environmental noise. A precise normal vector extraction model based on a triangular-distributed EC sensor array is first established. Calibration of the effects of object surface inclination and coupling interference on measurement results, and the relative position of EC sensors, is involved. A novel apparatus employing three EC sensors and a force transducer was designed, which can be easily integrated into the computer numerical control (CNC) machine tool spindle and/or robot terminal execution. Finally, to test the validity and practicability of the proposed method, typical experiments were conducted with specified testing pieces using the developed approach and system, such as an inclined plane and cylindrical and spherical surfaces.

  3. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    PubMed

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only cure to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depend on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the Chronic Kidney Disease, two essential types of feature selection methods namely, wrapper and filter approaches were chosen to reduce the dimension of Chronic Kidney Disease dataset. In wrapper approach, classifier subset evaluator with greedy stepwise search engine and wrapper subset evaluator with the Best First search engine were used. In filter approach, correlation feature selection subset evaluator with greedy stepwise search engine and filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier by using filtered subset evaluator with the Best First search engine feature selection method has higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to other selected methods.

  4. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480

  5. Vision based nutrient deficiency classification in maize plants using multi class support vector machines

    NASA Astrophysics Data System (ADS)

    Leena, N.; Saju, K. K.

    2018-04-01

    Nutritional deficiencies in plants are a major concern for farmers as it affects productivity and thus profit. The work aims to classify nutritional deficiencies in maize plant in a non-destructive mannerusing image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies like nitrogen, phosphorous and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.

  6. Seminal quality prediction using data mining methods.

    PubMed

    Sahoo, Anoop J; Kumar, Yugal

    2014-01-01

    Now-a-days, some new classes of diseases have come into existences which are known as lifestyle diseases. The main reasons behind these diseases are changes in the lifestyle of people such as alcohol drinking, smoking, food habits etc. After going through the various lifestyle diseases, it has been found that the fertility rates (sperm quantity) in men has considerably been decreasing in last two decades. Lifestyle factors as well as environmental factors are mainly responsible for the change in the semen quality. The objective of this paper is to identify the lifestyle and environmental features that affects the seminal quality and also fertility rate in man using data mining methods. The five artificial intelligence techniques such as Multilayer perceptron (MLP), Decision Tree (DT), Navie Bayes (Kernel), Support vector machine+Particle swarm optimization (SVM+PSO) and Support vector machine (SVM) have been applied on fertility dataset to evaluate the seminal quality and also to predict the person is either normal or having altered fertility rate. While the eight feature selection techniques such as support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), support vector machine plus particle swarm optimization (SVM+PSO), principle component analysis (PCA), chi-square test, correlation and T-test methods have been used to identify more relevant features which affect the seminal quality. These techniques are applied on fertility dataset which contains 100 instances with nine attribute with two classes. The experimental result shows that SVM+PSO provides higher accuracy and area under curve (AUC) rate (94% & 0.932) among multi-layer perceptron (MLP) (92% & 0.728), Support Vector Machines (91% & 0.758), Navie Bayes (Kernel) (89% & 0.850) and Decision Tree (89% & 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process i.e. how to select the features which are more important for prediction of fertility rate. In this paper, eight feature selection methods are applied on fertility dataset to find out a set of good features. The investigational results shows that childish diseases (0.079) and high fever features (0.057) has less impact on fertility rate while age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent on setting (0.4366) and accident (0.5973) features have more impact. It is also observed that feature selection methods increase the accuracy of above mentioned techniques (multilayer perceptron 92%, support vector machine 91%, SVM+PSO 94%, Navie Bayes (Kernel) 89% and decision tree 89%) as compared to without feature selection methods (multilayer perceptron 86%, support vector machine 86%, SVM+PSO 85%, Navie Bayes (Kernel) 83% and decision tree 84%) which shows the applicability of feature selection methods in prediction. This paper lightens the application of artificial techniques in medical domain. From this paper, it can be concluded that data mining methods can be used to predict a person with or without disease based on environmental and lifestyle parameters/features rather than undergoing various medical test. In this paper, five data mining techniques are used to predict the fertility rate and among which SVM+PSO provide more accurate results than support vector machine and decision tree.

  7. An implementation of support vector machine on sentiment classification of movie reviews

    NASA Astrophysics Data System (ADS)

    Yulietha, I. M.; Faraby, S. A.; Adiwijaya; Widyaningtyas, W. C.

    2018-03-01

    With technological advances, all information about movie is available on the internet. If the information is processed properly, it will get the quality of the information. This research proposes to the classify sentiments on movie review documents. This research uses Support Vector Machine (SVM) method because it can classify high dimensional data in accordance with the data used in this research in the form of text. Support Vector Machine is a popular machine learning technique for text classification because it can classify by learning from a collection of documents that have been classified previously and can provide good result. Based on number of datasets, the 90-10 composition has the best result that is 85.6%. Based on SVM kernel, kernel linear with constant 1 has the best result that is 84.9%

  8. Implementation of an ADI method on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  9. Implementation of an ADI method on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16-Kbit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally conclusions are presented.

  10. Fuzzy support vector machines for adaptive Morse code recognition.

    PubMed

    Yang, Cheng-Hong; Jin, Li-Cheng; Chuang, Li-Yeh

    2006-11-01

    Morse code is now being harnessed for use in rehabilitation applications of augmentative-alternative communication and assistive technology, facilitating mobility, environmental control and adapted worksite access. In this paper, Morse code is selected as a communication adaptive device for persons who suffer from muscle atrophy, cerebral palsy or other severe handicaps. A stable typing rate is strictly required for Morse code to be effective as a communication tool. Therefore, an adaptive automatic recognition method with a high recognition rate is needed. The proposed system uses both fuzzy support vector machines and the variable-degree variable-step-size least-mean-square algorithm to achieve these objectives. We apply fuzzy memberships to each point, and provide different contributions to the decision learning function for support vector machines. Statistical analyses demonstrated that the proposed method elicited a higher recognition rate than other algorithms in the literature.

  11. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  12. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    PubMed Central

    Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aim at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933

  13. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    PubMed

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advancement in sensor data collection in clinical sciences lead to a complex, heterogeneous data processing, and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, development of Knowledge-based systems to support clinicians in decision-making is important. However, it is necessary to perform experimental work to compare performances of different machine learning methods to help to select appropriate method for a specific characteristic of data sets. This paper compares classification performance of three popular machine learning methods i.e., case-based reasoning, neutral networks and support vector machine to diagnose stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms other two methods in terms of classification accuracy. Case-based reasoning has achieved 80% and 86% accuracy to classify stress using finger temperature and heart rate variability. On contrary, both neural network and support vector machine have achieved less than 80% accuracy by using both physiological signals.

  14. Research on Classification of Chinese Text Data Based on SVM

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  15. Optimal quantum cloning based on the maximin principle by using a priori information

    NASA Astrophysics Data System (ADS)

    Kang, Peng; Dai, Hong-Yi; Wei, Jia-Hua; Zhang, Ming

    2016-10-01

    We propose an optimal 1 →2 quantum cloning method based on the maximin principle by making full use of a priori information of amplitude and phase about the general cloned qubit input set, which is a simply connected region enclosed by a "longitude-latitude grid" on the Bloch sphere. Theoretically, the fidelity of the optimal quantum cloning machine derived from this method is the largest in terms of the maximin principle compared with that of any other machine. The problem solving is an optimization process that involves six unknown complex variables, six vectors in an uncertain-dimensional complex vector space, and four equality constraints. Moreover, by restricting the structure of the quantum cloning machine, the optimization problem is simplified as a three-real-parameter suboptimization problem with only one equality constraint. We obtain the explicit formula for a suboptimal quantum cloning machine. Additionally, the fidelity of our suboptimal quantum cloning machine is higher than or at least equal to that of universal quantum cloning machines and phase-covariant quantum cloning machines. It is also underlined that the suboptimal cloning machine outperforms the "belt quantum cloning machine" for some cases.

  16. Novel method of finding extreme edges in a convex set of N-dimension vectors

    NASA Astrophysics Data System (ADS)

    Hu, Chia-Lun J.

    2001-11-01

    As we published in the last few years, for a binary neural network pattern recognition system to learn a given mapping {Um mapped to Vm, m=1 to M} where um is an N- dimension analog (pattern) vector, Vm is a P-bit binary (classification) vector, the if-and-only-if (IFF) condition that this network can learn this mapping is that each i-set in {Ymi, m=1 to M} (where Ymithere existsVmiUm and Vmi=+1 or -1, is the i-th bit of VR-m).)(i=1 to P and there are P sets included here.) Is POSITIVELY, LINEARLY, INDEPENDENT or PLI. We have shown that this PLI condition is MORE GENERAL than the convexity condition applied to a set of N-vectors. In the design of old learning machines, we know that if a set of N-dimension analog vectors form a convex set, and if the machine can learn the boundary vectors (or extreme edges) of this set, then it can definitely learn the inside vectors contained in this POLYHEDRON CONE. This paper reports a new method and new algorithm to find the boundary vectors of a convex set of ND analog vectors.

  17. Application of Classification Models to Pharyngeal High-Resolution Manometry

    ERIC Educational Resources Information Center

    Mielens, Jason D.; Hoffman, Matthew R.; Ciucci, Michelle R.; McCulloch, Timothy M.; Jiang, Jack J.

    2012-01-01

    Purpose: The authors present 3 methods of performing pattern recognition on spatiotemporal plots produced by pharyngeal high-resolution manometry (HRM). Method: Classification models, including the artificial neural networks (ANNs) multilayer perceptron (MLP) and learning vector quantization (LVQ), as well as support vector machines (SVM), were…

  18. Predicting complications of percutaneous coronary intervention using a novel support vector method

    PubMed Central

    Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan

    2013-01-01

    Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229

  19. Entanglement-Based Machine Learning on a Quantum Computer

    NASA Astrophysics Data System (ADS)

    Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, Li; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W.

    2015-03-01

    Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, which is ubiquitous in various fields such as computer sciences, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv.1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer, which are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

  20. Computer-Aided Diagnosis for Breast Ultrasound Using Computerized BI-RADS Features and Machine Learning Methods.

    PubMed

    Shan, Juan; Alam, S Kaisar; Garra, Brian; Zhang, Yingtao; Ahmed, Tahira

    2016-04-01

    This work identifies effective computable features from the Breast Imaging Reporting and Data System (BI-RADS), to develop a computer-aided diagnosis (CAD) system for breast ultrasound. Computerized features corresponding to ultrasound BI-RADs categories were designed and tested using a database of 283 pathology-proven benign and malignant lesions. Features were selected based on classification performance using a "bottom-up" approach for different machine learning methods, including decision tree, artificial neural network, random forest and support vector machine. Using 10-fold cross-validation on the database of 283 cases, the highest area under the receiver operating characteristic (ROC) curve (AUC) was 0.84 from a support vector machine with 77.7% overall accuracy; the highest overall accuracy, 78.5%, was from a random forest with the AUC 0.83. Lesion margin and orientation were optimum features common to all of the different machine learning methods. These features can be used in CAD systems to help distinguish benign from worrisome lesions. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. All rights reserved.

  1. Multivariate analysis of fMRI time series: classification and regression of brain responses using machine learning.

    PubMed

    Formisano, Elia; De Martino, Federico; Valente, Giancarlo

    2008-09-01

    Machine learning and pattern recognition techniques are being increasingly employed in functional magnetic resonance imaging (fMRI) data analysis. By taking into account the full spatial pattern of brain activity measured simultaneously at many locations, these methods allow detecting subtle, non-strictly localized effects that may remain invisible to the conventional analysis with univariate statistical methods. In typical fMRI applications, pattern recognition algorithms "learn" a functional relationship between brain response patterns and a perceptual, cognitive or behavioral state of a subject expressed in terms of a label, which may assume discrete (classification) or continuous (regression) values. This learned functional relationship is then used to predict the unseen labels from a new data set ("brain reading"). In this article, we describe the mathematical foundations of machine learning applications in fMRI. We focus on two methods, support vector machines and relevance vector machines, which are respectively suited for the classification and regression of fMRI patterns. Furthermore, by means of several examples and applications, we illustrate and discuss the methodological challenges of using machine learning algorithms in the context of fMRI data analysis.

  2. repRNA: a web server for generating various feature vectors of RNA sequences.

    PubMed

    Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

    2016-02-01

    With the rapid growth of RNA sequences generated in the postgenomic age, it is highly desired to develop a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all the existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors but not sequences. To meet the increasing demands and speed up the genome analyses, we have developed a new web server, called "representations of RNA sequences" (repRNA). Compared with the existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose according to their investigation purposes; (2) it allows users to select the features from 22 built-in physicochemical properties and even those defined by users' own; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/ .

  3. Boosted Regression Trees Outperforms Support Vector Machines in Predicting (Regional) Yields of Winter Wheat from Single and Cumulated Dekadal Spot-VGT Derived Normalized Difference Vegetation Indices

    NASA Astrophysics Data System (ADS)

    Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos

    2016-08-01

    This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.

  4. Methods, systems and apparatus for synchronous current regulation of a five-phase machine

    DOEpatents

    Gallegos-Lopez, Gabriel; Perisic, Milun

    2012-10-09

    Methods, systems and apparatus are provided for controlling operation of and regulating current provided to a five-phase machine when one or more phases has experienced a fault or has failed. In one implementation, the disclosed embodiments can be used to synchronously regulate current in a vector controlled motor drive system that includes a five-phase AC machine, a five-phase inverter module coupled to the five-phase AC machine, and a synchronous current regulator.

  5. Automatic EEG artifact removal: a weighted support vector machine approach with error correction.

    PubMed

    Shao, Shi-Yun; Shen, Kai-Quan; Ong, Chong Jin; Wilder-Smith, Einar P V; Li, Xiao-Ping

    2009-02-01

    An automatic electroencephalogram (EEG) artifact removal method is presented in this paper. Compared to past methods, it has two unique features: 1) a weighted version of support vector machine formulation that handles the inherent unbalanced nature of component classification and 2) the ability to accommodate structural information typically found in component classification. The advantages of the proposed method are demonstrated on real-life EEG recordings with comparisons made to several benchmark methods. Results show that the proposed method is preferable to the other methods in the context of artifact removal by achieving a better tradeoff between removing artifacts and preserving inherent brain activities. Qualitative evaluation of the reconstructed EEG epochs also demonstrates that after artifact removal inherent brain activities are largely preserved.

  6. Product Quality Modelling Based on Incremental Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Wang, J.; Zhang, W.; Qin, B.; Shi, W.

    2012-05-01

    Incremental Support vector machine (ISVM) is a new learning method developed in recent years based on the foundations of statistical learning theory. It is suitable for the problem of sequentially arriving field data and has been widely used for product quality prediction and production process optimization. However, the traditional ISVM learning does not consider the quality of the incremental data which may contain noise and redundant data; it will affect the learning speed and accuracy to a great extent. In order to improve SVM training speed and accuracy, a modified incremental support vector machine (MISVM) is proposed in this paper. Firstly, the margin vectors are extracted according to the Karush-Kuhn-Tucker (KKT) condition; then the distance from the margin vectors to the final decision hyperplane is calculated to evaluate the importance of margin vectors, where the margin vectors are removed while their distance exceed the specified value; finally, the original SVs and remaining margin vectors are used to update the SVM. The proposed MISVM can not only eliminate the unimportant samples such as noise samples, but also can preserve the important samples. The MISVM has been experimented on two public data and one field data of zinc coating weight in strip hot-dip galvanizing, and the results shows that the proposed method can improve the prediction accuracy and the training speed effectively. Furthermore, it can provide the necessary decision supports and analysis tools for auto control of product quality, and also can extend to other process industries, such as chemical process and manufacturing process.

  7. Improving protein–protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model

    PubMed Central

    An, Ji‐Yong; Meng, Fan‐Rong; Chen, Xing; Yan, Gui‐Ying; Hu, Ji‐Pu

    2016-01-01

    Abstract Predicting protein–protein interactions (PPIs) is a challenging task and essential to construct the protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high‐throughput technologies have been proposed to predict PPIs, there are unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM‐BiGP that combines the relevance vector machine (RVM) model and Bi‐gram Probabilities (BiGP) for PPIs detection from protein sequences. The major improvement includes (1) Protein sequences are represented using the Bi‐gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) For reducing the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of BiGP vector; (3) The powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five‐fold cross‐validation experiments executed on yeast and Helicobacter pylori datasets, which achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state‐of‐the‐art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM‐BiGP method is significantly better than the SVM‐based method. In addition, we achieved 97.15% accuracy on imbalance yeast dataset, which is higher than that of balance yeast dataset. The promising experimental results show the efficiency and robust of the proposed method, which can be an automatic decision support tool for future proteomics research. For facilitating extensive studies for future proteomics research, we developed a freely available web server called RVM‐BiGP‐PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/BiGP/. PMID:27452983

  8. Support Vector Hazards Machine: A Counting Process Framework for Learning Risk Scores for Censored Outcomes.

    PubMed

    Wang, Yuanjia; Chen, Tianle; Zeng, Donglin

    2016-01-01

    Learning risk scores to predict dichotomous or continuous outcomes using machine learning approaches has been studied extensively. However, how to learn risk scores for time-to-event outcomes subject to right censoring has received little attention until recently. Existing approaches rely on inverse probability weighting or rank-based regression, which may be inefficient. In this paper, we develop a new support vector hazards machine (SVHM) approach to predict censored outcomes. Our method is based on predicting the counting process associated with the time-to-event outcomes among subjects at risk via a series of support vector machines. Introducing counting processes to represent time-to-event data leads to a connection between support vector machines in supervised learning and hazards regression in standard survival analysis. To account for different at risk populations at observed event times, a time-varying offset is used in estimating risk scores. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel trick. We demonstrate an interesting link from the profiled empirical risk function of SVHM to the Cox partial likelihood. We then formally show that SVHM is optimal in discriminating covariate-specific hazard function from population average hazard function, and establish the consistency and learning rate of the predicted risk using the estimated risk scores. Simulation studies show improved prediction accuracy of the event times using SVHM compared to existing machine learning methods and standard conventional approaches. Finally, we analyze two real world biomedical study data where we use clinical markers and neuroimaging biomarkers to predict age-at-onset of a disease, and demonstrate superiority of SVHM in distinguishing high risk versus low risk subjects.

  9. Support vector machine in machine condition monitoring and fault diagnosis

    NASA Astrophysics Data System (ADS)

    Widodo, Achmad; Yang, Bo-Suk

    2007-08-01

    Recently, the issue of machine condition monitoring and fault diagnosis as a part of maintenance system became global due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. This paper presents a survey of machine condition monitoring and fault diagnosis using support vector machine (SVM). It attempts to summarize and review the recent research and developments of SVM in machine condition monitoring and diagnosis. Numerous methods have been developed based on intelligent systems such as artificial neural network, fuzzy expert system, condition-based reasoning, random forest, etc. However, the use of SVM for machine condition monitoring and fault diagnosis is still rare. SVM has excellent performance in generalization so it can produce high accuracy in classification for machine condition monitoring and diagnosis. Until 2006, the use of SVM in machine condition monitoring and fault diagnosis is tending to develop towards expertise orientation and problem-oriented domain. Finally, the ability to continually change and obtain a novel idea for machine condition monitoring and fault diagnosis using SVM will be future works.

  10. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamic monitoring the behavior of a program is widely used to discriminate between benign program and malware. It is usually based on the dynamic characteristics of a program, such as API call sequence or API call frequency to judge. The key innovation of this paper is to consider the full Native API sequence and use the support vector machine to detect the shellcode. We also use the Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.

  11. Machine learning to predict the occurrence of bisphosphonate-related osteonecrosis of the jaw associated with dental extraction: A preliminary report.

    PubMed

    Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho

    2018-04-23

    The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of BRONJ associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. Total 125 patients consisting of 41 cases and 84 controls were selected for the study. Five machine learning prediction algorithms including multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.

  12. Method and apparatus for improving the quality and efficiency of ultrashort-pulse laser machining

    DOEpatents

    Stuart, Brent C.; Nguyen, Hoang T.; Perry, Michael D.

    2001-01-01

    A method and apparatus for improving the quality and efficiency of machining of materials with laser pulse durations shorter than 100 picoseconds by orienting and maintaining the polarization of the laser light such that the electric field vector is perpendicular relative to the edges of the material being processed. Its use is any machining operation requiring remote delivery and/or high precision with minimal collateral dames.

  13. Combined empirical mode decomposition and texture features for skin lesion classification using quadratic support vector machine.

    PubMed

    Wahba, Maram A; Ashour, Amira S; Napoleon, Sameh A; Abd Elnaby, Mustafa M; Guo, Yanhui

    2017-12-01

    Basal cell carcinoma is one of the most common malignant skin lesions. Automated lesion identification and classification using image processing techniques is highly required to reduce the diagnosis errors. In this study, a novel technique is applied to classify skin lesion images into two classes, namely the malignant Basal cell carcinoma and the benign nevus. A hybrid combination of bi-dimensional empirical mode decomposition and gray-level difference method features is proposed after hair removal. The combined features are further classified using quadratic support vector machine (Q-SVM). The proposed system has achieved outstanding performance of 100% accuracy, sensitivity and specificity compared to other support vector machine procedures as well as with different extracted features. Basal Cell Carcinoma is effectively classified using Q-SVM with the proposed combined features.

  14. Applications of Support Vector Machines In Chemo And Bioinformatics

    NASA Astrophysics Data System (ADS)

    Jayaraman, V. K.; Sundararajan, V.

    2010-10-01

    Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.

  15. Deep learning of support vector machines with class probability output networks.

    PubMed

    Kim, Sangwook; Yu, Zhibin; Kil, Rhee Man; Lee, Minho

    2015-04-01

    Deep learning methods endeavor to learn features automatically at multiple levels and allow systems to learn complex functions mapping from the input space to the output space for the given data. The ability to learn powerful features automatically is increasingly important as the volume of data and range of applications of machine learning methods continues to grow. This paper proposes a new deep architecture that uses support vector machines (SVMs) with class probability output networks (CPONs) to provide better generalization power for pattern classification problems. As a result, deep features are extracted without additional feature engineering steps, using multiple layers of the SVM classifiers with CPONs. The proposed structure closely approaches the ideal Bayes classifier as the number of layers increases. Using a simulation of classification problems, the effectiveness of the proposed method is demonstrated. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Product demand forecasts using wavelet kernel support vector machine and particle swarm optimization in manufacture system

    NASA Astrophysics Data System (ADS)

    Wu, Qi

    2010-03-01

    Demand forecasts play a crucial role in supply chain management. The future demand for a certain product is the basis for the respective replenishment systems. Aiming at demand series with small samples, seasonal character, nonlinearity, randomicity and fuzziness, the existing support vector kernel does not approach the random curve of the sales time series in the space (quadratic continuous integral space). In this paper, we present a hybrid intelligent system combining the wavelet kernel support vector machine and particle swarm optimization for demand forecasting. The results of application in car sale series forecasting show that the forecasting approach based on the hybrid PSOWv-SVM model is effective and feasible, the comparison between the method proposed in this paper and other ones is also given, which proves that this method is, for the discussed example, better than hybrid PSOv-SVM and other traditional methods.

  17. Emotion detection from text

    NASA Astrophysics Data System (ADS)

    Ramalingam, V. V.; Pandian, A.; Jaiswal, Abhijeet; Bhatia, Nikhar

    2018-04-01

    This paper presents a novel method based on concept of Machine Learning for Emotion Detection using various algorithms of Support Vector Machine and major emotions described are linked to the Word-Net for enhanced accuracy. The approach proposed plays a promising role to augment the Artificial Intelligence in the near future and could be vital in optimization of Human-Machine Interface.

  18. Interpreting support vector machine models for multivariate group wise analysis in neuroimaging

    PubMed Central

    Gaonkar, Bilwaj; Shinohara, Russell T; Davatzikos, Christos

    2015-01-01

    Machine learning based classification algorithms like support vector machines (SVMs) have shown great promise for turning a high dimensional neuroimaging data into clinically useful decision criteria. However, tracing imaging based patterns that contribute significantly to classifier decisions remains an open problem. This is an issue of critical importance in imaging studies seeking to determine which anatomical or physiological imaging features contribute to the classifier’s decision, thereby allowing users to critically evaluate the findings of such machine learning methods and to understand disease mechanisms. The majority of published work addresses the question of statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is critical in SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is a lot less conservative as compared to weight based permutation tests and yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging based classification. PMID:26210913

  19. Prediction task guided representation learning of medical codes in EHR.

    PubMed

    Cui, Liwen; Xie, Xiaolei; Shen, Zuojun

    2018-06-18

    There have been rapidly growing applications using machine learning models for predictive analytics in Electronic Health Records (EHR) to improve the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is to convert medical codes in EHR to feature vectors. These medical codes are used to represent diagnoses or procedures. Their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have utilized representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e. the generation of medical code vectors is independent from prediction tasks. Thus, the obtained feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require a lot of samples to obtain reliable results, but most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks, to construct training corpus for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in predictive capability of generated medical code vectors, especially for limited training samples. Copyright © 2018. Published by Elsevier Inc.

  20. Human action recognition with group lasso regularized-support vector machine

    NASA Astrophysics Data System (ADS)

    Luo, Huiwu; Lu, Huanzhang; Wu, Yabei; Zhao, Fei

    2016-05-01

    The bag-of-visual-words (BOVW) and Fisher kernel are two popular models in human action recognition, and support vector machine (SVM) is the most commonly used classifier for the two models. We show two kinds of group structures in the feature representation constructed by BOVW and Fisher kernel, respectively, since the structural information of feature representation can be seen as a prior for the classifier and can improve the performance of the classifier, which has been verified in several areas. However, the standard SVM employs L2-norm regularization in its learning procedure, which penalizes each variable individually and cannot express the structural information of feature representation. We replace the L2-norm regularization with group lasso regularization in standard SVM, and a group lasso regularized-support vector machine (GLRSVM) is proposed. Then, we embed the group structural information of feature representation into GLRSVM. Finally, we introduce an algorithm to solve the optimization problem of GLRSVM by alternating directions method of multipliers. The experiments evaluated on KTH, YouTube, and Hollywood2 datasets show that our method achieves promising results and improves the state-of-the-art methods on KTH and YouTube datasets.

  1. T-wave end detection using neural networks and Support Vector Machines.

    PubMed

    Suárez-León, Alexander Alexeis; Varon, Carolina; Willems, Rik; Van Huffel, Sabine; Vázquez-Seisdedos, Carlos Román

    2018-05-01

    In this paper we propose a new approach for detecting the end of the T-wave in the electrocardiogram (ECG) using Neural Networks and Support Vector Machines. Both, Multilayer Perceptron (MLP) neural networks and Fixed-Size Least-Squares Support Vector Machines (FS-LSSVM) were used as regression algorithms to determine the end of the T-wave. Different strategies for selecting the training set such as random selection, k-means, robust clustering and maximum quadratic (Rényi) entropy were evaluated. Individual parameters were tuned for each method during training and the results are given for the evaluation set. A comparison between MLP and FS-LSSVM approaches was performed. Finally, a fair comparison of the FS-LSSVM method with other state-of-the-art algorithms for detecting the end of the T-wave was included. The experimental results show that FS-LSSVM approaches are more suitable as regression algorithms than MLP neural networks. Despite the small training sets used, the FS-LSSVM methods outperformed the state-of-the-art techniques. FS-LSSVM can be successfully used as a T-wave end detection algorithm in ECG even with small training set sizes. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. A Temperature Compensation Method for Piezo-Resistive Pressure Sensor Utilizing Chaotic Ions Motion Algorithm Optimized Hybrid Kernel LSSVM.

    PubMed

    Li, Ji; Hu, Guoqing; Zhou, Yonghong; Zou, Chong; Peng, Wei; Alam Sm, Jahangir

    2016-10-14

    A piezo-resistive pressure sensor is made of silicon, the nature of which is considerably influenced by ambient temperature. The effect of temperature should be eliminated during the working period in expectation of linear output. To deal with this issue, an approach consists of a hybrid kernel Least Squares Support Vector Machine (LSSVM) optimized by a chaotic ions motion algorithm presented. To achieve the learning and generalization for excellent performance, a hybrid kernel function, constructed by a local kernel as Radial Basis Function (RBF) kernel, and a global kernel as polynomial kernel is incorporated into the Least Squares Support Vector Machine. The chaotic ions motion algorithm is introduced to find the best hyper-parameters of the Least Squares Support Vector Machine. The temperature data from a calibration experiment is conducted to validate the proposed method. With attention on algorithm robustness and engineering applications, the compensation result shows the proposed scheme outperforms other compared methods on several performance measures as maximum absolute relative error, minimum absolute relative error mean and variance of the averaged value on fifty runs. Furthermore, the proposed temperature compensation approach lays a foundation for more extensive research.

  3. Detection of License Plate using Sliding Window, Histogram of Oriented Gradient, and Support Vector Machines Method

    NASA Astrophysics Data System (ADS)

    Astawa, INGA; Gusti Ngurah Bagus Caturbawa, I.; Made Sajayasa, I.; Dwi Suta Atmaja, I. Made Ari

    2018-01-01

    The license plate recognition usually used as part of system such as parking system. License plate detection considered as the most important step in the license plate recognition system. We propose methods that can be used to detect the vehicle plate on mobile phone. In this paper, we used Sliding Window, Histogram of Oriented Gradient (HOG), and Support Vector Machines (SVM) method to license plate detection so it will increase the detection level even though the image is not in a good quality. The image proceed by Sliding Window method in order to find plate position. Feature extraction in every window movement had been done by HOG and SVM method. Good result had shown in this research, which is 96% of accuracy.

  4. Color image segmentation with support vector machines: applications to road signs detection.

    PubMed

    Cyganek, Bogusław

    2008-08-01

    In this paper we propose efficient color segmentation method which is based on the Support Vector Machine classifier operating in a one-class mode. The method has been developed especially for the road signs recognition system, although it can be used in other applications. The main advantage of the proposed method comes from the fact that the segmentation of characteristic colors is performed not in the original but in the higher dimensional feature space. By this a better data encapsulation with a linear hypersphere can be usually achieved. Moreover, the classifier does not try to capture the whole distribution of the input data which is often difficult to achieve. Instead, the characteristic data samples, called support vectors, are selected which allow construction of the tightest hypersphere that encloses majority of the input data. Then classification of a test data simply consists in a measurement of its distance to a centre of the found hypersphere. The experimental results show high accuracy and speed of the proposed method.

  5. Prediction of hourly PM2.5 using a space-time support vector regression model

    NASA Astrophysics Data System (ADS)

    Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang

    2018-05-01

    Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.

  6. Study on vibration characteristics and fault diagnosis method of oil-immersed flat wave reactor in Arctic area converter station

    NASA Astrophysics Data System (ADS)

    Lai, Wenqing; Wang, Yuandong; Li, Wenpeng; Sun, Guang; Qu, Guomin; Cui, Shigang; Li, Mengke; Wang, Yongqiang

    2017-10-01

    Based on long term vibration monitoring of the No.2 oil-immersed fat wave reactor in the ±500kV converter station in East Mongolia, the vibration signals in normal state and in core loose fault state were saved. Through the time-frequency analysis of the signals, the vibration characteristics of the core loose fault were obtained, and a fault diagnosis method based on the dual tree complex wavelet (DT-CWT) and support vector machine (SVM) was proposed. The vibration signals were analyzed by DT-CWT, and the energy entropy of the vibration signals were taken as the feature vector; the support vector machine was used to train and test the feature vector, and the accurate identification of the core loose fault of the flat wave reactor was realized. Through the identification of many groups of normal and core loose fault state vibration signals, the diagnostic accuracy of the result reached 97.36%. The effectiveness and accuracy of the method in the fault diagnosis of the flat wave reactor core is verified.

  7. LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.

    PubMed

    Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu

    2005-01-01

    Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.

  8. Classification of Stellar Spectra with Fuzzy Minimum Within-Class Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Zhong-bao, Liu; Wen-ai, Song; Jing, Zhang; Wen-juan, Zhao

    2017-06-01

    Classification is one of the important tasks in astronomy, especially in spectra analysis. Support Vector Machine (SVM) is a typical classification method, which is widely used in spectra classification. Although it performs well in practice, its classification accuracies can not be greatly improved because of two limitations. One is it does not take the distribution of the classes into consideration. The other is it is sensitive to noise. In order to solve the above problems, inspired by the maximization of the Fisher's Discriminant Analysis (FDA) and the SVM separability constraints, fuzzy minimum within-class support vector machine (FMWSVM) is proposed in this paper. In FMWSVM, the distribution of the classes is reflected by the within-class scatter in FDA and the fuzzy membership function is introduced to decrease the influence of the noise. The comparative experiments with SVM on the SDSS datasets verify the effectiveness of the proposed classifier FMWSVM.

  9. Geographical traceability of Marsdenia tenacissima by Fourier transform infrared spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Li, Chao; Yang, Sheng-Chao; Guo, Qiao-Sheng; Zheng, Kai-Yan; Wang, Ping-Li; Meng, Zhen-Gui

    2016-01-01

    A combination of Fourier transform infrared spectroscopy with chemometrics tools provided an approach for studying Marsdenia tenacissima according to its geographical origin. A total of 128 M. tenacissima samples from four provinces in China were analyzed with FTIR spectroscopy. Six pattern recognition methods were used to construct the discrimination models: support vector machine-genetic algorithms, support vector machine-particle swarm optimization, K-nearest neighbors, radial basis function neural network, random forest and support vector machine-grid search. Experimental results showed that K-nearest neighbors was superior to other mathematical algorithms after data were preprocessed with wavelet de-noising, with a discrimination rate of 100% in both the training and prediction sets. This study demonstrated that FTIR spectroscopy coupled with K-nearest neighbors could be successfully applied to determine the geographical origins of M. tenacissima samples, thereby providing reliable authentication in a rapid, cheap and noninvasive way.

  10. Recognition and Classification of Road Condition on the Basis of Friction Force by Using a Mobile Robot

    NASA Astrophysics Data System (ADS)

    Watanabe, Tatsuhito; Katsura, Seiichiro

    A person operating a mobile robot in a remote environment receives realistic visual feedback about the condition of the road on which the robot is moving. The categorization of the road condition is necessary to evaluate the conditions for safe and comfortable driving. For this purpose, the mobile robot should be capable of recognizing and classifying the condition of the road surfaces. This paper proposes a method for recognizing the type of road surfaces on the basis of the friction between the mobile robot and the road surfaces. This friction is estimated by a disturbance observer, and a support vector machine is used to classify the surfaces. The support vector machine identifies the type of the road surface using feature vector, which is determined using the arithmetic average and variance derived from the torque values. Further, these feature vectors are mapped onto a higher dimensional space by using a kernel function. The validity of the proposed method is confirmed by experimental results.

  11. A machine learning model with human cognitive biases capable of learning from small and biased datasets.

    PubMed

    Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro

    2018-05-09

    Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.

  12. Using an object-based grid system to evaluate a newly developed EP approach to formulate SVMs as applied to the classification of organophosphate nerve agents

    NASA Astrophysics Data System (ADS)

    Land, Walker H., Jr.; Lewis, Michael; Sadik, Omowunmi; Wong, Lut; Wanekaya, Adam; Gonzalez, Richard J.; Balan, Arun

    2004-04-01

    This paper extends the classification approaches described in reference [1] in the following way: (1.) developing and evaluating a new method for evolving organophosphate nerve agent Support Vector Machine (SVM) classifiers using Evolutionary Programming, (2.) conducting research experiments using a larger database of organophosphate nerve agents, and (3.) upgrading the architecture to an object-based grid system for evaluating the classification of EP derived SVMs. Due to the increased threats of chemical and biological weapons of mass destruction (WMD) by international terrorist organizations, a significant effort is underway to develop tools that can be used to detect and effectively combat biochemical warfare. This paper reports the integration of multi-array sensors with Support Vector Machines (SVMs) for the detection of organophosphates nerve agents using a grid computing system called Legion. Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. Finally, preliminary results using EP derived support vector machines designed to operate on distributed systems have provided accurate classification results. In addition, distributed training time architectures are 50 times faster when compared to standard iterative training time methods.

  13. Prediction of Human Intestinal Absorption of Compounds Using Artificial Intelligence Techniques.

    PubMed

    Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar

    2017-01-01

    Information about Pharmacokinetics of compounds is an essential component of drug design and development. Modeling the pharmacokinetic properties require identification of the factors effecting absorption, distribution, metabolism and excretion of compounds. There have been continuous attempts in the prediction of intestinal absorption of compounds using various Artificial intelligence methods in the effort to reduce the attrition rate of drug candidates entering to preclinical and clinical trials. Currently, there are large numbers of individual predictive models available for absorption using machine learning approaches. Six Artificial intelligence methods namely, Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis were used for prediction of absorption of compounds. Prediction accuracy of Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis for prediction of intestinal absorption of compounds was found to be 91.54%, 88.33%, 84.30%, 86.51%, 79.07% and 80.08% respectively. Comparative analysis of all the six prediction models suggested that Support vector machine with Radial basis function based kernel is comparatively better for binary classification of compounds using human intestinal absorption and may be useful at preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Experiences in using the CYBER 203 for three-dimensional transonic flow calculations

    NASA Technical Reports Server (NTRS)

    Melson, N. D.; Keller, J. D.

    1982-01-01

    In this paper, the authors report on some of their experiences modifying two three-dimensional transonic flow programs (FLO22 and FLO27) for use on the NASA Langley Research Center CYBER 203. Both of the programs discussed were originally written for use on serial machines. Several methods were attempted to optimize the execution of the two programs on the vector machine, including: (1) leaving the program in a scalar form (i.e., serial computation) with compiler software used to optimize and vectorize the program, (2) vectorizing parts of the existing algorithm in the program, and (3) incorporating a new vectorizable algorithm (ZEBRA I or ZEBRA II) in the program.

  15. Real-data comparison of data mining methods in prediction of diabetes in iran.

    PubMed

    Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal

    2013-09-01

    Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-mean, and random forests) to classify persons with and without diabetes. The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance obtained through a cross-sectional survey. The obtained sample was based on cluster sampling of the Iran population which was conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve criteria. Support vector machines showed the highest total accuracy (0.986) as well as area under the ROC (0.979). Also, this method showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but for all methods, the sensitivity values were very low (less than 0.350). The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.

  16. Evolutionary-driven support vector machines for determining the degree of liver fibrosis in chronic hepatitis C.

    PubMed

    Stoean, Ruxandra; Stoean, Catalin; Lupsor, Monica; Stefanescu, Horia; Badea, Radu

    2011-01-01

    Hepatic fibrosis, the principal pointer to the development of a liver disease within chronic hepatitis C, can be measured through several stages. The correct evaluation of its degree, based on recent different non-invasive procedures, is of current major concern. The latest methodology for assessing it is the Fibroscan and the effect of its employment is impressive. However, the complex interaction between its stiffness indicator and the other biochemical and clinical examinations towards a respective degree of liver fibrosis is hard to be manually discovered. In this respect, the novel, well-performing evolutionary-powered support vector machines are proposed towards an automated learning of the relationship between medical attributes and fibrosis levels. The traditional support vector machines have been an often choice for addressing hepatic fibrosis, while the evolutionary option has been validated on many real-world tasks and proven flexibility and good performance. The evolutionary approach is simple and direct, resulting from the hybridization of the learning component within support vector machines and the optimization engine of evolutionary algorithms. It discovers the optimal coefficients of surfaces that separate instances of distinct classes. Apart from a detached manner of establishing the fibrosis degree for new cases, a resulting formula also offers insight upon the correspondence between the medical factors and the respective outcome. What is more, a feature selection genetic algorithm can be further embedded into the method structure, in order to dynamically concentrate search only on the most relevant attributes. The data set refers 722 patients with chronic hepatitis C infection and 24 indicators. The five possible degrees of fibrosis range from F0 (no fibrosis) to F4 (cirrhosis). Since the standard support vector machines are among the most frequently used methods in recent artificial intelligence studies for hepatic fibrosis staging, the evolutionary method is viewed in comparison to the traditional one. The multifaceted discrimination into all five degrees of fibrosis and the slightly less difficult common separation into solely three related stages are both investigated. The resulting performance proves the superiority over the standard support vector classification and the attained formula is helpful in providing an immediate calculation of the liver stage for new cases, while establishing the presence/absence and comprehending the weight of each medical factor with respect to a certain fibrosis level. The use of the evolutionary technique for fibrosis degree prediction triggers simplicity and offers a direct expression of the influence of dynamically selected indicators on the corresponding stage. Perhaps most importantly, it significantly surpasses the classical support vector machines, which are both widely used and technically sound. All these therefore confirm the promise of the new methodology towards a dependable support within the medical decision-making. Copyright © 2010 Elsevier B.V. All rights reserved.

  17. Analysis of programming properties and the row-column generation method for 1-norm support vector machines.

    PubMed

    Zhang, Li; Zhou, WeiDa

    2013-12-01

    This paper deals with fast methods for training a 1-norm support vector machine (SVM). First, we define a specific class of linear programming with many sparse constraints, i.e., row-column sparse constraint linear programming (RCSC-LP). In nature, the 1-norm SVM is a sort of RCSC-LP. In order to construct subproblems for RCSC-LP and solve them, a family of row-column generation (RCG) methods is introduced. RCG methods belong to a category of decomposition techniques, and perform row and column generations in a parallel fashion. Specially, for the 1-norm SVM, the maximum size of subproblems of RCG is identical with the number of Support Vectors (SVs). We also introduce a semi-deleting rule for RCG methods and prove the convergence of RCG methods when using the semi-deleting rule. Experimental results on toy data and real-world datasets illustrate that it is efficient to use RCG to train the 1-norm SVM, especially in the case of small SVs. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Sparse kernel methods for high-dimensional survival data.

    PubMed

    Evers, Ludger; Messow, Claudia-Martina

    2008-07-15

    Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where-akin to support vector classification-the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.

  19. Online Artifact Removal for Brain-Computer Interfaces Using Support Vector Machines and Blind Source Separation

    PubMed Central

    Halder, Sebastian; Bensch, Michael; Mellinger, Jürgen; Bogdan, Martin; Kübler, Andrea; Birbaumer, Niels; Rosenstiel, Wolfgang

    2007-01-01

    We propose a combination of blind source separation (BSS) and independent component analysis (ICA) (signal decomposition into artifacts and nonartifacts) with support vector machines (SVMs) (automatic classification) that are designed for online usage. In order to select a suitable BSS/ICA method, three ICA algorithms (JADE, Infomax, and FastICA) and one BSS algorithm (AMUSE) are evaluated to determine their ability to isolate electromyographic (EMG) and electrooculographic (EOG) artifacts into individual components. An implementation of the selected BSS/ICA method with SVMs trained to classify EMG and EOG artifacts, which enables the usage of the method as a filter in measurements with online feedback, is described. This filter is evaluated on three BCI datasets as a proof-of-concept of the method. PMID:18288259

  20. Online artifact removal for brain-computer interfaces using support vector machines and blind source separation.

    PubMed

    Halder, Sebastian; Bensch, Michael; Mellinger, Jürgen; Bogdan, Martin; Kübler, Andrea; Birbaumer, Niels; Rosenstiel, Wolfgang

    2007-01-01

    We propose a combination of blind source separation (BSS) and independent component analysis (ICA) (signal decomposition into artifacts and nonartifacts) with support vector machines (SVMs) (automatic classification) that are designed for online usage. In order to select a suitable BSS/ICA method, three ICA algorithms (JADE, Infomax, and FastICA) and one BSS algorithm (AMUSE) are evaluated to determine their ability to isolate electromyographic (EMG) and electrooculographic (EOG) artifacts into individual components. An implementation of the selected BSS/ICA method with SVMs trained to classify EMG and EOG artifacts, which enables the usage of the method as a filter in measurements with online feedback, is described. This filter is evaluated on three BCI datasets as a proof-of-concept of the method.

  1. Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model.

    PubMed

    An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu

    2016-10-01

    Predicting protein-protein interactions (PPIs) is a challenging task and essential to construct the protein interaction networks, which is important for facilitating our understanding of the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, there are unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPIs detection from protein sequences. The major improvement includes (1) Protein sequences are represented using the Bi-gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), in which the protein evolutionary information is contained; (2) For reducing the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of BiGP vector; (3) The powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments executed on yeast and Helicobacter pylori datasets, which achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on imbalance yeast dataset, which is higher than that of balance yeast dataset. The promising experimental results show the efficiency and robust of the proposed method, which can be an automatic decision support tool for future proteomics research. For facilitating extensive studies for future proteomics research, we developed a freely available web server called RVM-BiGP-PPIs in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/BiGP/. © 2016 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  2. Progressive Vector Quantization on a massively parallel SIMD machine with application to multispectral image data

    NASA Technical Reports Server (NTRS)

    Manohar, Mareboyana; Tilton, James C.

    1994-01-01

    A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.

  3. A Method for Extracting Important Segments from Documents Using Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Suzuki, Daisuke; Utsumi, Akira

    In this paper we propose an extraction-based method for automatic summarization. The proposed method consists of two processes: important segment extraction and sentence compaction. The process of important segment extraction classifies each segment in a document as important or not by Support Vector Machines (SVMs). The process of sentence compaction then determines grammatically appropriate portions of a sentence for a summary according to its dependency structure and the classification result by SVMs. To test the performance of our method, we conducted an evaluation experiment using the Text Summarization Challenge (TSC-1) corpus of human-prepared summaries. The result was that our method achieved better performance than a segment-extraction-only method and the Lead method, especially for sentences only a part of which was included in human summaries. Further analysis of the experimental results suggests that a hybrid method that integrates sentence extraction with segment extraction may generate better summaries.

  4. Stochastic subset selection for learning with kernel machines.

    PubMed

    Rhinelander, Jason; Liu, Xiaoping P

    2012-06-01

    Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.

  5. Weighted K-means support vector machine for cancer prediction.

    PubMed

    Kim, SungHwan

    2016-01-01

    To date, the support vector machine (SVM) has been widely applied to diverse bio-medical fields to address disease subtype identification and pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), for which I allow the SVM to impose weights to the loss term. Besides, I demonstrate the numerical relations between the objective function of the SVM and weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly adopt the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrate that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM) including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).

  6. Detection of Alzheimer's Disease by Three-Dimensional Displacement Field Estimation in Structural Magnetic Resonance Imaging.

    PubMed

    Wang, Shuihua; Zhang, Yudong; Liu, Ge; Phillips, Preetha; Yuan, Ti-Fei

    2016-01-01

    Within the past decade, computer scientists have developed many methods using computer vision and machine learning techniques to detect Alzheimer's disease (AD) in its early stages. However, some of these methods are unable to achieve excellent detection accuracy, and several other methods are unable to locate AD-related regions. Hence, our goal was to develop a novel AD brain detection method. In this study, our method was based on the three-dimensional (3D) displacement-field (DF) estimation between subjects in the healthy elder control group and AD group. The 3D-DF was treated with AD-related features. The three feature selection measures were used in the Bhattacharyya distance, Student's t-test, and Welch's t-test (WTT). Two non-parallel support vector machines, i.e., generalized eigenvalue proximal support vector machine and twin support vector machine (TSVM), were then used for classification. A 50 × 10-fold cross validation was implemented for statistical analysis. The results showed that "3D-DF+WTT+TSVM" achieved the best performance, with an accuracy of 93.05 ± 2.18, a sensitivity of 92.57 ± 3.80, a specificity of 93.18 ± 3.35, and a precision of 79.51 ± 2.86. This method also exceled in 13 state-of-the-art approaches. Additionally, we were able to detect 17 regions related to AD by using the pure computer-vision technique. These regions include sub-gyral, inferior parietal lobule, precuneus, angular gyrus, lingual gyrus, supramarginal gyrus, postcentral gyrus, third ventricle, superior parietal lobule, thalamus, middle temporal gyrus, precentral gyrus, superior temporal gyrus, superior occipital gyrus, cingulate gyrus, culmen, and insula. These regions were reported in recent publications. The 3D-DF is effective in AD subject and related region detection.

  7. Highly accurate prediction of protein self-interactions by incorporating the average block and PSSM information into the general PseAAC.

    PubMed

    Zhai, Jing-Xuan; Cao, Tian-Jie; An, Ji-Yong; Bian, Yong-Tao

    2017-11-07

    It is a challenging task for fundamental research whether proteins can interact with their partners. Protein self-interaction (SIP) is a special case of PPIs, which plays a key role in the regulation of cellular functions. Due to the limitations of experimental self-interaction identification, it is very important to develop an effective biological tool for predicting SIPs based on protein sequences. In the study, we developed a novel computational method called RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) for detecting SIPs from protein sequences. Firstly, Average Blocks (AB) feature extraction method is employed to represent protein sequences on a Position Specific Scoring Matrix (PSSM). Secondly, Principal Component Analysis (PCA) method is used to reduce the dimension of AB vector for reducing the influence of noise. Then, by employing the Relevance Vector Machine (RVM) algorithm, the performance of RVM-AB is assessed and compared with the state-of-the-art support vector machine (SVM) classifier and other exiting methods on yeast and human datasets respectively. Using the fivefold test experiment, RVM-AB model achieved very high accuracies of 93.01% and 97.72% on yeast and human datasets respectively, which are significantly better than the method based on SVM classifier and other previous methods. The experimental results proved that the RVM-AB prediction model is efficient and robust. It can be an automatic decision support tool for detecting SIPs. For facilitating extensive studies for future proteomics research, the RVMAB server is freely available for academic use at http://219.219.62.123:8888/SIP_AB. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

    PubMed

    Gu, Xiaoqing; Ni, Tongguang; Wang, Hongyuan

    2014-01-01

    In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most of the real-world medical datasets usually contain some outliers/noise and data often have class imbalance problems. In this paper, a fuzzy support machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM by extending manifold regularization and assigning two misclassification costs for two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the outperformed or comparable effectiveness of FSVM-CIP.

  9. Support vector machines-based fault diagnosis for turbo-pump rotor

    NASA Astrophysics Data System (ADS)

    Yuan, Sheng-Fa; Chu, Fu-Lei

    2006-05-01

    Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.

  10. Scorebox extraction from mobile sports videos using Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Kim, Wonjun; Park, Jimin; Kim, Changick

    2008-08-01

    Scorebox plays an important role in understanding contents of sports videos. However, the tiny scorebox may give the small-display-viewers uncomfortable experience in grasping the game situation. In this paper, we propose a novel framework to extract the scorebox from sports video frames. We first extract candidates by using accumulated intensity and edge information after short learning period. Since there are various types of scoreboxes inserted in sports videos, multiple attributes need to be used for efficient extraction. Based on those attributes, the optimal information gain is computed and top three ranked attributes in terms of information gain are selected as a three-dimensional feature vector for Support Vector Machines (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement boards. The proposed method is tested on various videos of sports games and experimental results show the efficiency and robustness of our proposed method.

  11. HYBRID NEURAL NETWORK AND SUPPORT VECTOR MACHINE METHOD FOR OPTIMIZATION

    NASA Technical Reports Server (NTRS)

    Rai, Man Mohan (Inventor)

    2005-01-01

    System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.

  12. Hybrid Neural Network and Support Vector Machine Method for Optimization

    NASA Technical Reports Server (NTRS)

    Rai, Man Mohan (Inventor)

    2007-01-01

    System and method for optimization of a design associated with a response function, using a hybrid neural net and support vector machine (NN/SVM) analysis to minimize or maximize an objective function, optionally subject to one or more constraints. As a first example, the NN/SVM analysis is applied iteratively to design of an aerodynamic component, such as an airfoil shape, where the objective function measures deviation from a target pressure distribution on the perimeter of the aerodynamic component. As a second example, the NN/SVM analysis is applied to data classification of a sequence of data points in a multidimensional space. The NN/SVM analysis is also applied to data regression.

  13. An Auto-flag Method of Radio Visibility Data Based on Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Dai, Hui-mei; Mei, Ying; Wang, Wei; Deng, Hui; Wang, Feng

    2017-01-01

    The Mingantu Ultrawide Spectral Radioheliograph (MUSER) has entered a test observation stage. After the construction of the data acquisition and storage system, it is urgent to automatically flag and eliminate the abnormal visibility data so as to improve the imaging quality. In this paper, according to the observational records, we create a credible visibility set, and further obtain the corresponding flag model of visibility data by using the support vector machine (SVM) technique. The results show that the SVM is a robust approach to flag the MUSER visibility data, and can attain an accuracy of about 86%. Meanwhile, this method will not be affected by solar activities, such as flare eruptions.

  14. Prediction on sunspot activity based on fuzzy information granulation and support vector machine

    NASA Astrophysics Data System (ADS)

    Peng, Lingling; Yan, Haisheng; Yang, Zhigang

    2018-04-01

    In order to analyze the range of sunspots, a combined prediction method of forecasting the fluctuation range of sunspots based on fuzzy information granulation (FIG) and support vector machine (SVM) was put forward. Firstly, employing the FIG to granulate sample data and extract va)alid information of each window, namely the minimum value, the general average value and the maximum value of each window. Secondly, forecasting model is built respectively with SVM and then cross method is used to optimize these parameters. Finally, the fluctuation range of sunspots is forecasted with the optimized SVM model. Case study demonstrates that the model have high accuracy and can effectively predict the fluctuation of sunspots.

  15. Machine Learning Methods for Attack Detection in the Smart Grid.

    PubMed

    Ozay, Mete; Esnaola, Inaki; Yarman Vural, Fatos Tunay; Kulkarni, Sanjeev R; Poor, H Vincent

    2016-08-01

    Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

  16. New model for prediction binary mixture of antihistamine decongestant using artificial neural networks and least squares support vector machine by spectrophotometry method

    NASA Astrophysics Data System (ADS)

    Mofavvaz, Shirin; Sohrabi, Mahmoud Reza; Nezamzadeh-Ejhieh, Alireza

    2017-07-01

    In the present study, artificial neural networks (ANNs) and least squares support vector machines (LS-SVM) as intelligent methods based on absorption spectra in the range of 230-300 nm have been used for determination of antihistamine decongestant contents. In the first step, one type of network (feed-forward back-propagation) from the artificial neural network with two different training algorithms, Levenberg-Marquardt (LM) and gradient descent with momentum and adaptive learning rate back-propagation (GDX) algorithm, were employed and their performance was evaluated. The performance of the LM algorithm was better than the GDX algorithm. In the second one, the radial basis network was utilized and results compared with the previous network. In the last one, the other intelligent method named least squares support vector machine was proposed to construct the antihistamine decongestant prediction model and the results were compared with two of the aforementioned networks. The values of the statistical parameters mean square error (MSE), Regression coefficient (R2), correlation coefficient (r) and also mean recovery (%), relative standard deviation (RSD) used for selecting the best model between these methods. Moreover, the proposed methods were compared to the high- performance liquid chromatography (HPLC) as a reference method. One way analysis of variance (ANOVA) test at the 95% confidence level applied to the comparison results of suggested and reference methods that there were no significant differences between them.

  17. Comparison of different wind data interpolation methods for a region with complex terrain in Central Asia

    NASA Astrophysics Data System (ADS)

    Reinhardt, Katja; Samimi, Cyrus

    2018-01-01

    While climatological data of high spatial resolution are largely available in most developed countries, the network of climatological stations in many other regions of the world still constitutes large gaps. Especially for those regions, interpolation methods are important tools to fill these gaps and to improve the data base indispensible for climatological research. Over the last years, new hybrid methods of machine learning and geostatistics have been developed which provide innovative prospects in spatial predictive modelling. This study will focus on evaluating the performance of 12 different interpolation methods for the wind components \\overrightarrow{u} and \\overrightarrow{v} in a mountainous region of Central Asia. Thereby, a special focus will be on applying new hybrid methods on spatial interpolation of wind data. This study is the first evaluating and comparing the performance of several of these hybrid methods. The overall aim of this study is to determine whether an optimal interpolation method exists, which can equally be applied for all pressure levels, or whether different interpolation methods have to be used for the different pressure levels. Deterministic (inverse distance weighting) and geostatistical interpolation methods (ordinary kriging) were explored, which take into account only the initial values of \\overrightarrow{u} and \\overrightarrow{v} . In addition, more complex methods (generalized additive model, support vector machine and neural networks as single methods and as hybrid methods as well as regression-kriging) that consider additional variables were applied. The analysis of the error indices revealed that regression-kriging provided the most accurate interpolation results for both wind components and all pressure heights. At 200 and 500 hPa, regression-kriging is followed by the different kinds of neural networks and support vector machines and for 850 hPa it is followed by the different types of support vector machine and ordinary kriging. Overall, explanatory variables improve the interpolation results.

  18. Scaling Support Vector Machines On Modern HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Fu, Haohuan; Song, Shuaiwen

    2015-02-01

    We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.

  19. Source localization in an ocean waveguide using supervised machine learning.

    PubMed

    Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter

    2017-09-01

    Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.

  20. Signal detection using support vector machines in the presence of ultrasonic speckle

    NASA Astrophysics Data System (ADS)

    Kotropoulos, Constantine L.; Pitas, Ioannis

    2002-04-01

    Support Vector Machines are a general algorithm based on guaranteed risk bounds of statistical learning theory. They have found numerous applications, such as in classification of brain PET images, optical character recognition, object detection, face verification, text categorization and so on. In this paper we propose the use of support vector machines to segment lesions in ultrasound images and we assess thoroughly their lesion detection ability. We demonstrate that trained support vector machines with a Radial Basis Function kernel segment satisfactorily (unseen) ultrasound B-mode images as well as clinical ultrasonic images.

  1. Deep Restricted Kernel Machines Using Conjugate Feature Duality.

    PubMed

    Suykens, Johan A K

    2017-08-01

    The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections and deep learning extensions as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models. A key element is to first characterize these kernel machines in terms of so-called conjugate feature duality, yielding a representation with visible and hidden units. It is shown how this is related to the energy form in restricted Boltzmann machines, with continuous variables in a nonprobabilistic setting. In this new framework of so-called restricted kernel machine (RKM) representations, the dual variables correspond to hidden features. Deep RKM are obtained by coupling the RKMs. The method is illustrated for deep RKM, consisting of three levels with a least squares support vector machine regression level and two kernel PCA levels. In its primal form also deep feedforward neural networks can be trained within this framework.

  2. Use of CYBER 203 and CYBER 205 computers for three-dimensional transonic flow calculations

    NASA Technical Reports Server (NTRS)

    Melson, N. D.; Keller, J. D.

    1983-01-01

    Experiences are discussed for modifying two three-dimensional transonic flow computer programs (FLO 22 and FLO 27) for use on the CDC CYBER 203 computer system. Both programs were originally written for use on serial machines. Several methods were attempted to optimize the execution of the two programs on the vector machine: leaving the program in a scalar form (i.e., serial computation) with compiler software used to optimize and vectorize the program, vectorizing parts of the existing algorithm in the program, and incorporating a vectorizable algorithm (ZEBRA I or ZEBRA II) in the program. Comparison runs of the programs were made on CDC CYBER 175. CYBER 203, and two pipe CDC CYBER 205 computer systems.

  3. Predicting complications of percutaneous coronary intervention using a novel support vector method.

    PubMed

    Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan

    2013-01-01

    To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ(2) value (seven cases) and the mean cross-entropy error (eight cases). The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.

  4. A real-time neutron-gamma discriminator based on the support vector machine method for the time-of-flight neutron spectrometer

    NASA Astrophysics Data System (ADS)

    Wei, ZHANG; Tongyu, WU; Bowen, ZHENG; Shiping, LI; Yipo, ZHANG; Zejie, YIN

    2018-04-01

    A new neutron-gamma discriminator based on the support vector machine (SVM) method is proposed to improve the performance of the time-of-flight neutron spectrometer. The neutron detector is an EJ-299-33 plastic scintillator with pulse-shape discrimination (PSD) property. The SVM algorithm is implemented in field programmable gate array (FPGA) to carry out the real-time sifting of neutrons in neutron-gamma mixed radiation fields. This study compares the ability of the pulse gradient analysis method and the SVM method. The results show that this SVM discriminator can provide a better discrimination accuracy of 99.1%. The accuracy and performance of the SVM discriminator based on FPGA have been evaluated in the experiments. It can get a figure of merit of 1.30.

  5. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier.

    PubMed

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-11-10

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. The limitation of training samples and types of machine faults in HVCBs causes the existing mechanical fault diagnostic methods to recognize new types of machine faults easily without training samples as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, a MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVM are adopted to distinguish normal or fault conditions with known or unknown fault types, respectively. On this basis, SVM recognizes the specific fault type. Real diagnostic experiments are conducted with a real SF₆ HVCB with normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods.

  6. Mechanical Fault Diagnosis of High Voltage Circuit Breakers Based on Variational Mode Decomposition and Multi-Layer Classifier

    PubMed Central

    Huang, Nantian; Chen, Huaijin; Cai, Guowei; Fang, Lihua; Wang, Yuqiang

    2016-01-01

    Mechanical fault diagnosis of high-voltage circuit breakers (HVCBs) based on vibration signal analysis is one of the most significant issues in improving the reliability and reducing the outage cost for power systems. The limitation of training samples and types of machine faults in HVCBs causes the existing mechanical fault diagnostic methods to recognize new types of machine faults easily without training samples as either a normal condition or a wrong fault type. A new mechanical fault diagnosis method for HVCBs based on variational mode decomposition (VMD) and multi-layer classifier (MLC) is proposed to improve the accuracy of fault diagnosis. First, HVCB vibration signals during operation are measured using an acceleration sensor. Second, a VMD algorithm is used to decompose the vibration signals into several intrinsic mode functions (IMFs). The IMF matrix is divided into submatrices to compute the local singular values (LSV). The maximum singular values of each submatrix are selected as the feature vectors for fault diagnosis. Finally, a MLC composed of two one-class support vector machines (OCSVMs) and a support vector machine (SVM) is constructed to identify the fault type. Two layers of independent OCSVM are adopted to distinguish normal or fault conditions with known or unknown fault types, respectively. On this basis, SVM recognizes the specific fault type. Real diagnostic experiments are conducted with a real SF6 HVCB with normal and fault states. Three different faults (i.e., jam fault of the iron core, looseness of the base screw, and poor lubrication of the connecting lever) are simulated in a field experiment on a real HVCB to test the feasibility of the proposed method. Results show that the classification accuracy of the new method is superior to other traditional methods. PMID:27834902

  7. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043

  8. Support Vector Machines Model of Computed Tomography for Assessing Lymph Node Metastasis in Esophageal Cancer with Neoadjuvant Chemotherapy.

    PubMed

    Wang, Zhi-Long; Zhou, Zhi-Guo; Chen, Ying; Li, Xiao-Ting; Sun, Ying-Shi

    The aim of this study was to diagnose lymph node metastasis of esophageal cancer by support vector machines model based on computed tomography. A total of 131 esophageal cancer patients with preoperative chemotherapy and radical surgery were included. Various indicators (tumor thickness, tumor length, tumor CT value, total number of lymph nodes, and long axis and short axis sizes of largest lymph node) on CT images before and after neoadjuvant chemotherapy were recorded. A support vector machines model based on these CT indicators was built to predict lymph node metastasis. Support vector machines model diagnosed lymph node metastasis better than preoperative short axis size of largest lymph node on CT. The area under the receiver operating characteristic curves were 0.887 and 0.705, respectively. The support vector machine model of CT images can help diagnose lymph node metastasis in esophageal cancer with preoperative chemotherapy.

  9. A Real-Time Interference Monitoring Technique for GNSS Based on a Twin Support Vector Machine Method.

    PubMed

    Li, Wutao; Huang, Zhigang; Lang, Rongling; Qin, Honglei; Zhou, Kai; Cao, Yongbin

    2016-03-04

    Interferences can severely degrade the performance of Global Navigation Satellite System (GNSS) receivers. As the first step of GNSS any anti-interference measures, interference monitoring for GNSS is extremely essential and necessary. Since interference monitoring can be considered as a classification problem, a real-time interference monitoring technique based on Twin Support Vector Machine (TWSVM) is proposed in this paper. A TWSVM model is established, and TWSVM is solved by the Least Squares Twin Support Vector Machine (LSTWSVM) algorithm. The interference monitoring indicators are analyzed to extract features from the interfered GNSS signals. The experimental results show that the chosen observations can be used as the interference monitoring indicators. The interference monitoring performance of the proposed method is verified by using GPS L1 C/A code signal and being compared with that of standard SVM. The experimental results indicate that the TWSVM-based interference monitoring is much faster than the conventional SVM. Furthermore, the training time of TWSVM is on millisecond (ms) level and the monitoring time is on microsecond (μs) level, which make the proposed approach usable in practical interference monitoring applications.

  10. A Real-Time Interference Monitoring Technique for GNSS Based on a Twin Support Vector Machine Method

    PubMed Central

    Li, Wutao; Huang, Zhigang; Lang, Rongling; Qin, Honglei; Zhou, Kai; Cao, Yongbin

    2016-01-01

    Interferences can severely degrade the performance of Global Navigation Satellite System (GNSS) receivers. As the first step of GNSS any anti-interference measures, interference monitoring for GNSS is extremely essential and necessary. Since interference monitoring can be considered as a classification problem, a real-time interference monitoring technique based on Twin Support Vector Machine (TWSVM) is proposed in this paper. A TWSVM model is established, and TWSVM is solved by the Least Squares Twin Support Vector Machine (LSTWSVM) algorithm. The interference monitoring indicators are analyzed to extract features from the interfered GNSS signals. The experimental results show that the chosen observations can be used as the interference monitoring indicators. The interference monitoring performance of the proposed method is verified by using GPS L1 C/A code signal and being compared with that of standard SVM. The experimental results indicate that the TWSVM-based interference monitoring is much faster than the conventional SVM. Furthermore, the training time of TWSVM is on millisecond (ms) level and the monitoring time is on microsecond (μs) level, which make the proposed approach usable in practical interference monitoring applications. PMID:26959020

  11. Storage and computationally efficient permutations of factorized covariance and square-root information matrices

    NASA Technical Reports Server (NTRS)

    Muellerschoen, R. J.

    1988-01-01

    A unified method to permute vector-stored upper-triangular diagonal factorized covariance (UD) and vector stored upper-triangular square-root information filter (SRIF) arrays is presented. The method involves cyclical permutation of the rows and columns of the arrays and retriangularization with appropriate square-root-free fast Givens rotations or elementary slow Givens reflections. A minimal amount of computation is performed and only one scratch vector of size N is required, where N is the column dimension of the arrays. To make the method efficient for large SRIF arrays on a virtual memory machine, three additional scratch vectors each of size N are used to avoid expensive paging faults. The method discussed is compared with the methods and routines of Bierman's Estimation Subroutine Library (ESL).

  12. Measurement of aspheric mirror by nanoprofiler using normal vector tracing

    NASA Astrophysics Data System (ADS)

    Kitayama, Takao; Shiraji, Hiroki; Yamamura, Kazuya; Endo, Katsuyoshi

    2016-09-01

    Aspheric or free-form optics with high accuracy are necessary in many fields such as third-generation synchrotron radiation and extreme-ultraviolet lithography. Therefore the demand of measurement method for aspherical or free-form surface with nanometer accuracy increases. Purpose of our study is to develop a non-contact measurement technology for aspheric or free-form surfaces directly with high repeatability. To achieve this purpose we have developed threedimensional Nanoprofiler which detects normal vectors of sample surface. The measurement principle is based on the straightness of laser light and the accurate motion of rotational goniometers. This machine consists of four rotational stages, one translational stage and optical head which has the quadrant photodiode (QPD) and laser source. In this measurement method, we conform the incident light beam to reflect the beam by controlling five stages and determine the normal vectors and the coordinates of the surface from signal of goniometers, translational stage and QPD. We can obtain three-dimensional figure from the normal vectors and their coordinates by surface reconstruction algorithm. To evaluate performance of this machine we measure a concave aspheric mirror with diameter of 150 mm. As a result we achieve to measure large area of 150mm diameter. And we observe influence of systematic errors which the machine has. Then we simulated the influence and subtracted it from measurement result.

  13. T-ray relevant frequencies for osteosarcoma classification

    NASA Astrophysics Data System (ADS)

    Withayachumnankul, W.; Ferguson, B.; Rainsford, T.; Findlay, D.; Mickan, S. P.; Abbott, D.

    2006-01-01

    We investigate the classification of the T-ray response of normal human bone cells and human osteosarcoma cells, grown in culture. Given the magnitude and phase responses within a reliable spectral range as features for input vectors, a trained support vector machine can correctly classify the two cell types to some extent. Performance of the support vector machine is deteriorated by the curse of dimensionality, resulting from the comparatively large number of features in the input vectors. Feature subset selection methods are used to select only an optimal number of relevant features for inputs. As a result, an improvement in generalization performance is attainable, and the selected frequencies can be used for further describing different mechanisms of the cells, responding to T-rays. We demonstrate a consistent classification accuracy of 89.6%, while the only one fifth of the original features are retained in the data set.

  14. Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures.

    PubMed

    Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana

    2016-01-01

    With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.

  15. The potential of latent semantic analysis for machine grading of clinical case summaries.

    PubMed

    Kintsch, Walter

    2002-02-01

    This paper introduces latent semantic analysis (LSA), a machine learning method for representing the meaning of words, sentences, and texts. LSA induces a high-dimensional semantic space from reading a very large amount of texts. The meaning of words and texts can be represented as vectors in this space and hence can be compared automatically and objectively. A generative theory of the mental lexicon based on LSA is described. The word vectors LSA constructs are context free, and each word, irrespective of how many meanings or senses it has, is represented by a single vector. However, when a word is used in different contexts, context appropriate word senses emerge. Several applications of LSA to educational software are described, involving the ability of LSA to quickly compare the content of texts, such as an essay written by a student and a target essay. An LSA-based software tool is sketched for machine grading of clinical case summaries written by medical students.

  16. Support vector machine multiuser receiver for DS-CDMA signals in multipath channels.

    PubMed

    Chen, S; Samingan, A K; Hanzo, L

    2001-01-01

    The problem of constructing an adaptive multiuser detector (MUD) is considered for direct sequence code division multiple access (DS-CDMA) signals transmitted through multipath channels. The emerging learning technique, called support vector machines (SVM), is proposed as a method of obtaining a nonlinear MUD from a relatively small training data block. Computer simulation is used to study this SVM MUD, and the results show that it can closely match the performance of the optimal Bayesian one-shot detector. Comparisons with an adaptive radial basis function (RBF) MUD trained by an unsupervised clustering algorithm are discussed.

  17. Implementation of support vector machine for classification of speech marked hijaiyah letters based on Mel frequency cepstrum coefficient feature extraction

    NASA Astrophysics Data System (ADS)

    Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari

    2018-03-01

    Support Vector Machine or commonly called SVM is one method that can be used to process the classification of a data. SVM classifies data from 2 different classes with hyperplane. In this study, the system was built using SVM to develop Arabic Speech Recognition. In the development of the system, there are 2 kinds of speakers that have been tested that is dependent speakers and independent speakers. The results from this system is an accuracy of 85.32% for speaker dependent and 61.16% for independent speakers.

  18. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarray makes researchers screen thousands of genes simultaneously and it also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to choose genes. However, the parameter setting may not be compatible to the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable to attain good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  19. Syndrome diagnosis: human intuition or machine intelligence?

    PubMed

    Braaten, Oivind; Friestad, Johannes

    2008-01-01

    The aim of this study was to investigate whether artificial intelligence methods can represent objective methods that are essential in syndrome diagnosis. Most syndromes have no external criterion standard of diagnosis. The predictive value of a clinical sign used in diagnosis is dependent on the prior probability of the syndrome diagnosis. Clinicians often misjudge the probabilities involved. Syndromology needs objective methods to ensure diagnostic consistency, and take prior probabilities into account. We applied two basic artificial intelligence methods to a database of machine-generated patients - a 'vector method' and a set method. As reference methods we ran an ID3 algorithm, a cluster analysis and a naive Bayes' calculation on the same patient series. The overall diagnostic error rate for the the vector algorithm was 0.93%, and for the ID3 0.97%. For the clinical signs found by the set method, the predictive values varied between 0.71 and 1.0. The artificial intelligence methods that we used, proved simple, robust and powerful, and represent objective diagnostic methods.

  20. RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences.

    PubMed

    An, Ji-Yong; You, Zhu-Hong; Meng, Fan-Rong; Xu, Shu-Juan; Wang, Yin

    2016-05-18

    Protein-Protein Interactions (PPIs) play essential roles in most cellular processes. Knowledge of PPIs is becoming increasingly more important, which has prompted the development of technologies that are capable of discovering large-scale PPIs. Although many high-throughput biological technologies have been proposed to detect PPIs, there are unavoidable shortcomings, including cost, time intensity, and inherently high false positive and false negative rates. For the sake of these reasons, in silico methods are attracting much attention due to their good performances in predicting PPIs. In this paper, we propose a novel computational method known as RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the AB feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We performed five-fold cross-validation experiments on yeast and Helicobacter pylori datasets, and achieved very high accuracies of 92.98% and 95.58% respectively, which is significantly better than previous works. In addition, we also obtained good prediction accuracies of 88.31%, 89.46%, 91.08%, 91.55%, and 94.81% on other five independent datasets C. elegans, M. musculus, H. sapiens, H. pylori, and E. coli for cross-species prediction. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-AB method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool. To facilitate extensive studies for future proteomics research, we developed a freely available web server called RVMAB-PPI in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/ppi_ab/.

  1. Rapid authentication of adulteration of olive oil by near-infrared spectroscopy using support vector machines

    NASA Astrophysics Data System (ADS)

    Wu, Jingzhu; Dong, Jingjing; Dong, Wenfei; Chen, Yan; Liu, Cuiling

    2016-10-01

    A classification method of support vector machines with linear kernel was employed to authenticate genuine olive oil based on near-infrared spectroscopy. There were three types of adulteration of olive oil experimented in the study. The adulterated oil was respectively soybean oil, rapeseed oil and the mixture of soybean and rapeseed oil. The average recognition rate of second experiment was more than 90% and that of the third experiment was reach to 100%. The results showed the method had good performance in classifying genuine olive oil and the adulteration with small variation range of adulterated concentration and it was a promising and rapid technique for the detection of oil adulteration and fraud in the food industry.

  2. Text mining approach to predict hospital admissions using early medical records from the emergency department.

    PubMed

    Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz

    2017-04-01

    Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem, and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing of text records and to predict hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ 2 and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (Kernel linear) and Nu-Support Vector Machine (Kernel linear). Prediction performance is evaluated by F1-scores. Precision and Recall values are also informed for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  3. Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.

    PubMed

    Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi

    2017-09-22

    DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.

  4. Syndrome Diagnosis: Human Intuition or Machine Intelligence?

    PubMed Central

    Braaten, Øivind; Friestad, Johannes

    2008-01-01

    The aim of this study was to investigate whether artificial intelligence methods can represent objective methods that are essential in syndrome diagnosis. Most syndromes have no external criterion standard of diagnosis. The predictive value of a clinical sign used in diagnosis is dependent on the prior probability of the syndrome diagnosis. Clinicians often misjudge the probabilities involved. Syndromology needs objective methods to ensure diagnostic consistency, and take prior probabilities into account. We applied two basic artificial intelligence methods to a database of machine-generated patients - a ‘vector method’ and a set method. As reference methods we ran an ID3 algorithm, a cluster analysis and a naive Bayes’ calculation on the same patient series. The overall diagnostic error rate for the the vector algorithm was 0.93%, and for the ID3 0.97%. For the clinical signs found by the set method, the predictive values varied between 0.71 and 1.0. The artificial intelligence methods that we used, proved simple, robust and powerful, and represent objective diagnostic methods. PMID:19415142

  5. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

    PubMed

    You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

    2013-01-01

    Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time.

  6. Extended robust support vector machine based on financial risk minimization.

    PubMed

    Takeda, Akiko; Fujiwara, Shuhei; Kanamori, Takafumi

    2014-11-01

    Financial risk measures have been used recently in machine learning. For example, ν-support vector machine ν-SVM) minimizes the conditional value at risk (CVaR) of margin distribution. The measure is popular in finance because of the subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and value at risk (VaR) by expecting that the resulting model becomes less sensitive than ν-SVM to outliers. We can regard ER-SVM as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments imply the ER-SVM's possibility of achieving a better prediction performance with proper parameter setting.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dayman, Ken J; Ade, Brian J; Weber, Charles F

    High-dimensional, nonlinear function estimation using large datasets is a current area of interest in the machine learning community, and applications may be found throughout the analytical sciences, where ever-growing datasets are making more information available to the analyst. In this paper, we leverage the existing relevance vector machine, a sparse Bayesian version of the well-studied support vector machine, and expand the method to include integrated feature selection and automatic function shaping. These innovations produce an algorithm that is able to distinguish variables that are useful for making predictions of a response from variables that are unrelated or confusing. We testmore » the technology using synthetic data, conduct initial performance studies, and develop a model capable of making position-independent predictions of the coreaveraged burnup using a single specimen drawn randomly from a nuclear reactor core.« less

  8. Combining Relevance Vector Machines and exponential regression for bearing residual life estimation

    NASA Astrophysics Data System (ADS)

    Di Maio, Francesco; Tsui, Kwok Leung; Zio, Enrico

    2012-08-01

    In this paper we present a new procedure for estimating the bearing Residual Useful Life (RUL) by combining data-driven and model-based techniques. Respectively, we resort to (i) Relevance Vector Machines (RVMs) for selecting a low number of significant basis functions, called Relevant Vectors (RVs), and (ii) exponential regression to compute and continuously update residual life estimations. The combination of these techniques is developed with reference to partially degraded thrust ball bearings and tested on real world vibration-based degradation data. On the case study considered, the proposed procedure outperforms other model-based methods, with the added value of an adequate representation of the uncertainty associated to the estimates of the quantification of the credibility of the results by the Prognostic Horizon (PH) metric.

  9. Support Vector Machines to improve physiologic hot flash measures: application to the ambulatory setting.

    PubMed

    Thurston, Rebecca C; Hernandez, Javier; Del Rio, Jose M; De La Torre, Fernando

    2011-07-01

    Most midlife women have hot flashes. The conventional criterion (≥2 μmho rise/30 s) for classifying hot flashes physiologically has shown poor performance. We improved this performance in the laboratory with Support Vector Machines (SVMs), a pattern classification method. We aimed to compare conventional to SVM methods to classify hot flashes in the ambulatory setting. Thirty-one women with hot flashes underwent 24 h of ambulatory sternal skin conductance monitoring. Hot flashes were quantified with conventional (≥2 μmho/30 s) and SVM methods. Conventional methods had low sensitivity (sensitivity=.57, specificity=.98, positive predictive value (PPV)=.91, negative predictive value (NPV)=.90, F1=.60), with performance lower with higher body mass index (BMI). SVMs improved this performance (sensitivity=.87, specificity=.97, PPV=.90, NPV=.96, F1=.88) and reduced BMI variation. SVMs can improve ambulatory physiologic hot flash measures. Copyright © 2010 Society for Psychophysiological Research.

  10. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    NASA Astrophysics Data System (ADS)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning has been used to develop decision-support tools applied in medical imaging informatics. In order to take advantages of both conventional and deep learning approach, this study aims to investigate feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, initially computed 44 image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LLP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applying to this dataset, LLP-generated feature vector reduced the number of features from 44 to 4. Using a leave-onecase-out validation method, area under ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p < 0.05) and odds ratio was 4.60 with a 95% confidence interval of [3.16, 6.70]. Study demonstrated that this new LPP-based feature regeneration approach enabled to produce an optimal feature vector and yield improved performance in assisting to predict risk of women having breast cancer detected in the next subsequent mammography screening.

  11. On the use of feature selection to improve the detection of sea oil spills in SAR images

    NASA Astrophysics Data System (ADS)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to this category.

  12. Classification of ECG signal with Support Vector Machine Method for Arrhythmia Detection

    NASA Astrophysics Data System (ADS)

    Turnip, Arjon; Ilham Rizqywan, M.; Kusumandari, Dwi E.; Turnip, Mardi; Sihombing, Poltak

    2018-03-01

    An electrocardiogram is a potential bioelectric record that occurs as a result of cardiac activity. QRS Detection with zero crossing calculation is one method that can precisely determine peak R of QRS wave as part of arrhythmia detection. In this paper, two experimental scheme (2 minutes duration with different activities: relaxed and, typing) were conducted. From the two experiments it were obtained: accuracy, sensitivity, and positive predictivity about 100% each for the first experiment and about 79%, 93%, 83% for the second experiment, respectively. Furthermore, the feature set of MIT-BIH arrhythmia using the support vector machine (SVM) method on the WEKA software is evaluated. By combining the available attributes on the WEKA algorithm, the result is constant since all classes of SVM goes to the normal class with average 88.49% accuracy.

  13. Losses analysis of soft magnetic ring core under sinusoidal pulse width modulation (SPWM) and space vector pulse width modulation (SVPWM) excitations

    NASA Astrophysics Data System (ADS)

    Gao, Hezhe; Li, Yongjian; Wang, Shanming; Zhu, Jianguo; Yang, Qingxin; Zhang, Changgeng; Li, Jingsong

    2018-05-01

    Practical core losses in electrical machines differ significantly from those experimental results using the standardized measurement method, i.e. Epstein Frame method. In order to obtain a better approximation of the losses in an electrical machine, a simulation method considering sinusoidal pulse width modulation (SPWM) and space vector pulse width modulation (SVPWM) waveforms is proposed. The influence of the pulse width modulation (PWM) parameters on the harmonic components in SPWM and SVPWM is discussed by fast Fourier transform (FFT). Three-level SPWM and SVPWM are analyzed and compared both by simulation and experiment. The core losses of several ring samples magnetized by SPWM, SVPWM and sinusoidal alternating current (AC) are obtained. In addition, the temperature rise of the samples under SPWM, sinusoidal excitation are analyzed and compared.

  14. Brain Biology Machine Initiative: Developing Innovative Novel Methods to Improve Neuro-Rehabilitation for Amputees and Treatment for Patients at Remote Sites with Acute Brain Injury

    DTIC Science & Technology

    2010-10-01

    bode well for the future. The paper we submitted to the Journal of Neuroscience detailing the TVAG rabies tracer system was accepted with revisions...of brain electrical activity. Stas Kounitsky successfully completed the port of the new vector-additive implicit (VAI) method for the anisotropic ...Alternating Difference 14 Implicit (ADI) for isotropic head models, and the Vector Additive Implicit (VAI) for anisotropic head models. The ADI method

  15. Subpixel urban land cover estimation: comparing cubist, random forests, and support vector regression

    Treesearch

    Jeffrey T. Walton

    2008-01-01

    Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...

  16. Divide and Recombine for Large Complex Data

    DTIC Science & Technology

    2017-12-01

    Empirical Methods in Natural Language Processing , October 2014 Keywords Enter keywords for the publication. URL Enter the URL...low-latency data processing systems. Declarative Languages for Interactive Visualization: The Reactive Vega Stack Another thread of XDATA research...for array processing operations embedded in the R programming language . Vector virtual machines work well for long vectors. One of the most

  17. Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

    NASA Astrophysics Data System (ADS)

    Huttunen, Jani; Kokkola, Harri; Mielonen, Tero; Esa Juhani Mononen, Mika; Lipponen, Antti; Reunanen, Juha; Vilhelm Lindfors, Anders; Mikkonen, Santtu; Erkki Juhani Lehtinen, Kari; Kouremeti, Natalia; Bais, Alkiviadis; Niska, Harri; Arola, Antti

    2016-07-01

    In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge on past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure for aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in producing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even though SSA would have changed during the observation period.

  18. The construction of support vector machine classifier using the firefly algorithm.

    PubMed

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy.

  19. The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

    PubMed Central

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy. PMID:25802511

  20. Localization of thermal anomalies in electrical equipment using Infrared Thermography and support vector machine

    NASA Astrophysics Data System (ADS)

    Laib dit Leksir, Y.; Mansour, M.; Moussaoui, A.

    2018-03-01

    Analysis and processing of databases obtained from infrared thermal inspections made on electrical installations require the development of new tools to obtain more information to visual inspections. Consequently, methods based on the capture of thermal images show a great potential and are increasingly employed in this field. However, there is a need for the development of effective techniques to analyse these databases in order to extract significant information relating to the state of the infrastructures. This paper presents a technique explaining how this approach can be implemented and proposes a system that can help to detect faults in thermal images of electrical installations. The proposed method classifies and identifies the region of interest (ROI). The identification is conducted using support vector machine (SVM) algorithm. The aim here is to capture the faults that exist in electrical equipments during an inspection of some machines using A40 FLIR camera. After that, binarization techniques are employed to select the region of interest. Later the comparative analysis of the obtained misclassification errors using the proposed method with Fuzzy c means and Ostu, has also be addressed.

  1. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is by first transforming it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a few number of samples. The application of machine learning on microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on support vector machine recursive feature elimination (SVM-RFE) principle which is improved to solve multiclass classification, called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the result of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features on each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the method of the classifier to reduce computational complexity. While ordinary SVM finds single optimum hyperplane, the main objective Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71,4% of the overall test data correctly, using 100 and 1000 genes selected from multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of normal and MD class with 100% accuracy.

  2. Universal Parameter Measurement and Sensorless Vector Control of Induction and Permanent Magnet Synchronous Motors

    NASA Astrophysics Data System (ADS)

    Yamamoto, Shu; Ara, Takahiro

    Recently, induction motors (IMs) and permanent-magnet synchronous motors (PMSMs) have been used in various industrial drive systems. The features of the hardware device used for controlling the adjustable-speed drive in these motors are almost identical. Despite this, different techniques are generally used for parameter measurement and speed-sensorless control of these motors. If the same technique can be used for parameter measurement and sensorless control, a highly versatile adjustable-speed-drive system can be realized. In this paper, the authors describe a new universal sensorless control technique for both IMs and PMSMs (including salient pole and nonsalient pole machines). A mathematical model applicable for IMs and PMSMs is discussed. Using this model, the authors derive the proposed universal sensorless vector control algorithm on the basis of estimation of the stator flux linkage vector. All the electrical motor parameters are determined by a unified test procedure. The proposed method is implemented on three test machines. The actual driving test results demonstrate the validity of the proposed method.

  3. Methods, systems and apparatus for adjusting duty cycle of pulse width modulated (PWM) waveforms

    DOEpatents

    Gallegos-Lopez, Gabriel; Kinoshita, Michael H; Ransom, Ray M; Perisic, Milun

    2013-05-21

    Embodiments of the present invention relate to methods, systems and apparatus for controlling operation of a multi-phase machine in a vector controlled motor drive system when the multi-phase machine operates in an overmodulation region. The disclosed embodiments provide a mechanism for adjusting a duty cycle of PWM waveforms so that the correct phase voltage command signals are applied at the angle transitions. This can reduce variations/errors in the phase voltage command signals applied to the multi-phase machine so that phase current may be properly regulated thus reducing current/torque oscillation, which can in turn improve machine efficiency and performance, as well as utilization of the DC voltage source.

  4. Automated image segmentation using support vector machines

    NASA Astrophysics Data System (ADS)

    Powell, Stephanie; Magnotta, Vincent A.; Andreasen, Nancy C.

    2007-03-01

    Neurodegenerative and neurodevelopmental diseases demonstrate problems associated with brain maturation and aging. Automated methods to delineate brain structures of interest are required to analyze large amounts of imaging data like that being collected in several on going multi-center studies. We have previously reported on using artificial neural networks (ANN) to define subcortical brain structures including the thalamus (0.88), caudate (0.85) and the putamen (0.81). In this work, apriori probability information was generated using Thirion's demons registration algorithm. The input vector consisted of apriori probability, spherical coordinates, and an iris of surrounding signal intensity values. We have applied the support vector machine (SVM) machine learning algorithm to automatically segment subcortical and cerebellar regions using the same input vector information. SVM architecture was derived from the ANN framework. Training was completed using a radial-basis function kernel with gamma equal to 5.5. Training was performed using 15,000 vectors collected from 15 training images in approximately 10 minutes. The resulting support vectors were applied to delineate 10 images not part of the training set. Relative overlap calculated for the subcortical structures was 0.87 for the thalamus, 0.84 for the caudate, 0.84 for the putamen, and 0.72 for the hippocampus. Relative overlap for the cerebellar lobes ranged from 0.76 to 0.86. The reliability of the SVM based algorithm was similar to the inter-rater reliability between manual raters and can be achieved without rater intervention.

  5. TU-H-CAMPUS-JeP2-03: Machine-Learning-Based Delineation Framework of GTV Regions of Solid and Ground Glass Opacity Lung Tumors at Datasets of Planning CT and PET/CT Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ikushima, K; Arimura, H; Jin, Z

    Purpose: In radiation treatment planning, delineation of gross tumor volume (GTV) is very important, because the GTVs affect the accuracies of radiation therapy procedure. To assist radiation oncologists in the delineation of GTV regions while treatment planning for lung cancer, we have proposed a machine-learning-based delineation framework of GTV regions of solid and ground glass opacity (GGO) lung tumors following by optimum contour selection (OCS) method. Methods: Our basic idea was to feed voxel-based image features around GTV contours determined by radiation oncologists into a machine learning classifier in the training step, after which the classifier produced the degree ofmore » GTV for each voxel in the testing step. Ten data sets of planning CT and PET/CT images were selected for this study. The support vector machine (SVM), which learned voxel-based features which include voxel value and magnitudes of image gradient vector that obtained from each voxel in the planning CT and PET/CT images, extracted initial GTV regions. The final GTV regions were determined using the OCS method that was able to select a global optimum object contour based on multiple active delineations with a level set method around the GTV. To evaluate the results of proposed framework for ten cases (solid:6, GGO:4), we used the three-dimensional Dice similarity coefficient (DSC), which denoted the degree of region similarity between the GTVs delineated by radiation oncologists and the proposed framework. Results: The proposed method achieved an average three-dimensional DSC of 0.81 for ten lung cancer patients, while a standardized uptake value-based method segmented GTV regions with the DSC of 0.43. The average DSCs for solid and GGO were 0.84 and 0.76, respectively, obtained by the proposed framework. Conclusion: The proposed framework with the support vector machine may be useful for assisting radiation oncologists in delineating solid and GGO lung tumors.« less

  6. Development of a sugar-binding residue prediction system from protein sequences using support vector machine.

    PubMed

    Banno, Masaki; Komiyama, Yusuke; Cao, Wei; Oku, Yuya; Ueki, Kokoro; Sumikoshi, Kazuya; Nakamura, Shugo; Terada, Tohru; Shimizu, Kentaro

    2017-02-01

    Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they are not effective to learn various properties of binding site residues caused by various interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. By using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acid sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor which combines the results of the two predictors. We showed that when a sugar is known to be an acidic sugar, the acidic sugar binding predictor achieves the best performance, and showed that when a sugar is known to be a nonacidic sugar or is not known to be either of the two classes, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. Support vector machine was used as a machine learning algorithm and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have launched our system, as an open source freeware tool on the GitHub repository (https://doi.org/10.5281/zenodo.61513). Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  7. Channelized relevance vector machine as a numerical observer for cardiac perfusion defect detection task

    NASA Astrophysics Data System (ADS)

    Kalayeh, Mahdi M.; Marin, Thibault; Pretorius, P. Hendrik; Wernick, Miles N.; Yang, Yongyi; Brankov, Jovan G.

    2011-03-01

    In this paper, we present a numerical observer for image quality assessment, aiming to predict human observer accuracy in a cardiac perfusion defect detection task for single-photon emission computed tomography (SPECT). In medical imaging, image quality should be assessed by evaluating the human observer accuracy for a specific diagnostic task. This approach is known as task-based assessment. Such evaluations are important for optimizing and testing imaging devices and algorithms. Unfortunately, human observer studies with expert readers are costly and time-demanding. To address this problem, numerical observers have been developed as a surrogate for human readers to predict human diagnostic performance. The channelized Hotelling observer (CHO) with internal noise model has been found to predict human performance well in some situations, but does not always generalize well to unseen data. We have argued in the past that finding a model to predict human observers could be viewed as a machine learning problem. Following this approach, in this paper we propose a channelized relevance vector machine (CRVM) to predict human diagnostic scores in a detection task. We have previously used channelized support vector machines (CSVM) to predict human scores and have shown that this approach offers better and more robust predictions than the classical CHO method. The comparison of the proposed CRVM with our previously introduced CSVM method suggests that CRVM can achieve similar generalization accuracy, while dramatically reducing model complexity and computation time.

  8. Improvements on ν-Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Saigal, Pooja; Chandra, Suresh

    2016-07-01

    In this paper, we propose two novel binary classifiers termed as "Improvements on ν-Twin Support Vector Machine: Iν-TWSVM and Iν-TWSVM (Fast)" that are motivated by ν-Twin Support Vector Machine (ν-TWSVM). Similar to ν-TWSVM, Iν-TWSVM determines two nonparallel hyperplanes such that they are closer to their respective classes and are at least ρ distance away from the other class. The significant advantage of Iν-TWSVM over ν-TWSVM is that Iν-TWSVM solves one smaller-sized Quadratic Programming Problem (QPP) and one Unconstrained Minimization Problem (UMP); as compared to solving two related QPPs in ν-TWSVM. Further, Iν-TWSVM (Fast) avoids solving a smaller sized QPP and transforms it as a unimodal function, which can be solved using line search methods and similar to Iν-TWSVM, the other problem is solved as a UMP. Due to their novel formulation, the proposed classifiers are faster than ν-TWSVM and have comparable generalization ability. Iν-TWSVM also implements structural risk minimization (SRM) principle by introducing a regularization term, along with minimizing the empirical risk. The other properties of Iν-TWSVM, related to support vectors (SVs), are similar to that of ν-TWSVM. To test the efficacy of the proposed method, experiments have been conducted on a wide range of UCI and a skewed variation of NDC datasets. We have also given the application of Iν-TWSVM as a binary classifier for pixel classification of color images. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Use seismic colored inversion and power law committee machine based on imperial competitive algorithm for improving porosity prediction in a heterogeneous reservoir

    NASA Astrophysics Data System (ADS)

    Ansari, Hamid Reza

    2014-09-01

    In this paper we propose a new method for predicting rock porosity based on a combination of several artificial intelligence systems. The method focuses on one of the Iranian carbonate fields in the Persian Gulf. Because there is strong heterogeneity in carbonate formations, estimation of rock properties experiences more challenge than sandstone. For this purpose, seismic colored inversion (SCI) and a new approach of committee machine are used in order to improve porosity estimation. The study comprises three major steps. First, a series of sample-based attributes is calculated from 3D seismic volume. Acoustic impedance is an important attribute that is obtained by the SCI method in this study. Second, porosity log is predicted from seismic attributes using common intelligent computation systems including: probabilistic neural network (PNN), radial basis function network (RBFN), multi-layer feed forward network (MLFN), ε-support vector regression (ε-SVR) and adaptive neuro-fuzzy inference system (ANFIS). Finally, a power law committee machine (PLCM) is constructed based on imperial competitive algorithm (ICA) to combine the results of all previous predictions in a single solution. This technique is called PLCM-ICA in this paper. The results show that PLCM-ICA model improved the results of neural networks, support vector machine and neuro-fuzzy system.

  10. Identification of handwriting by using the genetic algorithm (GA) and support vector machine (SVM)

    NASA Astrophysics Data System (ADS)

    Zhang, Qigui; Deng, Kai

    2016-12-01

    As portable digital camera and a camera phone comes more and more popular, and equally pressing is meeting the requirements of people to shoot at any time, to identify and storage handwritten character. In this paper, genetic algorithm(GA) and support vector machine(SVM)are used for identification of handwriting. Compare with parameters-optimized method, this technique overcomes two defects: first, it's easy to trap in the local optimum; second, finding the best parameters in the larger range will affects the efficiency of classification and prediction. As the experimental results suggest, GA-SVM has a higher recognition rate.

  11. Optimization of Support Vector Machine (SVM) for Object Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew; Dhingra, Neil; Lu, Thomas T.; Chao, Tien-Hsin

    2012-01-01

    The Support Vector Machine (SVM) is a powerful algorithm, useful in classifying data into species. The SVMs implemented in this research were used as classifiers for the final stage in a Multistage Automatic Target Recognition (ATR) system. A single kernel SVM known as SVMlight, and a modified version known as a SVM with K-Means Clustering were used. These SVM algorithms were tested as classifiers under varying conditions. Image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. Results demonstrate the reliability of SVM as a method for classification. From trial to trial, SVM produces consistent results.

  12. Ischemic stroke lesion segmentation in multi-spectral MR images with support vector machine classifiers

    NASA Astrophysics Data System (ADS)

    Maier, Oskar; Wilms, Matthias; von der Gablentz, Janina; Krämer, Ulrike; Handels, Heinz

    2014-03-01

    Automatic segmentation of ischemic stroke lesions in magnetic resonance (MR) images is important in clinical practice and for neuroscientific trials. The key problem is to detect largely inhomogeneous regions of varying sizes, shapes and locations. We present a stroke lesion segmentation method based on local features extracted from multi-spectral MR data that are selected to model a human observer's discrimination criteria. A support vector machine classifier is trained on expert-segmented examples and then used to classify formerly unseen images. Leave-one-out cross validation on eight datasets with lesions of varying appearances is performed, showing our method to compare favourably with other published approaches in terms of accuracy and robustness. Furthermore, we compare a number of feature selectors and closely examine each feature's and MR sequence's contribution.

  13. Spectrophotometric determination of ternary mixtures of thiamin, riboflavin and pyridoxal in pharmaceutical and human plasma by least-squares support vector machines.

    PubMed

    Niazi, Ali; Zolgharnein, Javad; Afiuni-Zadeh, Somaie

    2007-11-01

    Ternary mixtures of thiamin, riboflavin and pyridoxal have been simultaneously determined in synthetic and real samples by applications of spectrophotometric and least-squares support vector machines. The calibration graphs were linear in the ranges of 1.0 - 20.0, 1.0 - 10.0 and 1.0 - 20.0 microg ml(-1) with detection limits of 0.6, 0.5 and 0.7 microg ml(-1) for thiamin, riboflavin and pyridoxal, respectively. The experimental calibration matrix was designed with 21 mixtures of these chemicals. The concentrations were varied between calibration graph concentrations of vitamins. The simultaneous determination of these vitamin mixtures by using spectrophotometric methods is a difficult problem, due to spectral interferences. The partial least squares (PLS) modeling and least-squares support vector machines were used for the multivariate calibration of the spectrophotometric data. An excellent model was built using LS-SVM, with low prediction errors and superior performance in relation to PLS. The root mean square errors of prediction (RMSEP) for thiamin, riboflavin and pyridoxal with PLS and LS-SVM were 0.6926, 0.3755, 0.4322 and 0.0421, 0.0318, 0.0457, respectively. The proposed method was satisfactorily applied to the rapid simultaneous determination of thiamin, riboflavin and pyridoxal in commercial pharmaceutical preparations and human plasma samples.

  14. Introducing the Filtered Park's and Filtered Extended Park's Vector Approach to detect broken rotor bars in induction motors independently from the rotor slots number

    NASA Astrophysics Data System (ADS)

    Gyftakis, Konstantinos N.; Marques Cardoso, Antonio J.; Antonino-Daviu, Jose A.

    2017-09-01

    The Park's Vector Approach (PVA), together with its variations, has been one of the most widespread diagnostic methods for electrical machines and drives. Regarding the broken rotor bars fault diagnosis in induction motors, the common practice is to rely on the width increase of the Park's Vector (PV) ring and then apply some more sophisticated signal processing methods. It is shown in this paper that this method can be unreliable and is strongly dependent on the magnetic poles and rotor slot numbers. To overcome this constraint, the novel Filtered Park's/Extended Park's Vector Approach (FPVA/FEPVA) is introduced. The investigation is carried out with FEM simulations and experimental testing. The results prove to satisfyingly coincide, whereas the proposed advanced FPVA method is desirably reliable.

  15. Biomarkers of Eating Disorders Using Support Vector Machine Analysis of Structural Neuroimaging Data: Preliminary Results

    PubMed Central

    Cerasa, Antonio; Castiglioni, Isabella; Salvatore, Christian; Funaro, Angela; Martino, Iolanda; Alfano, Stefania; Donzuso, Giulia; Perrotta, Paolo; Gioia, Maria Cecilia; Gilardi, Maria Carla; Quattrone, Aldo

    2015-01-01

    Presently, there are no valid biomarkers to identify individuals with eating disorders (ED). The aim of this work was to assess the feasibility of a machine learning method for extracting reliable neuroimaging features allowing individual categorization of patients with ED. Support Vector Machine (SVM) technique, combined with a pattern recognition method, was employed utilizing structural magnetic resonance images. Seventeen females with ED (six with diagnosis of anorexia nervosa and 11 with bulimia nervosa) were compared against 17 body mass index-matched healthy controls (HC). Machine learning allowed individual diagnosis of ED versus HC with an Accuracy ≥ 0.80. Voxel-based pattern recognition analysis demonstrated that voxels influencing the classification Accuracy involved the occipital cortex, the posterior cerebellar lobule, precuneus, sensorimotor/premotor cortices, and the medial prefrontal cortex, all critical regions known to be strongly involved in the pathophysiological mechanisms of ED. Although these findings should be considered preliminary given the small size investigated, SVM analysis highlights the role of well-known brain regions as possible biomarkers to distinguish ED from HC at an individual level, thus encouraging the translational implementation of this new multivariate approach in the clinical practice. PMID:26648660

  16. Comparison of four machine learning methods for object-oriented change detection in high-resolution satellite imagery

    NASA Astrophysics Data System (ADS)

    Bai, Ting; Sun, Kaimin; Deng, Shiquan; Chen, Yan

    2018-03-01

    High resolution image change detection is one of the key technologies of remote sensing application, which is of great significance for resource survey, environmental monitoring, fine agriculture, military mapping and battlefield environment detection. In this paper, for high-resolution satellite imagery, Random Forest (RF), Support Vector Machine (SVM), Deep belief network (DBN), and Adaboost models were established to verify the possibility of different machine learning applications in change detection. In order to compare detection accuracy of four machine learning Method, we applied these four machine learning methods for two high-resolution images. The results shows that SVM has higher overall accuracy at small samples compared to RF, Adaboost, and DBN for binary and from-to change detection. With the increase in the number of samples, RF has higher overall accuracy compared to Adaboost, SVM and DBN.

  17. Cervical cancer survival prediction using hybrid of SMOTE, CART and smooth support vector machine

    NASA Astrophysics Data System (ADS)

    Purnami, S. W.; Khasanah, P. M.; Sumartini, S. H.; Chosuvivatwong, V.; Sriplung, H.

    2016-04-01

    According to the WHO, every two minutes there is one patient who died from cervical cancer. The high mortality rate is due to the lack of awareness of women for early detection. There are several factors that supposedly influence the survival of cervical cancer patients, including age, anemia status, stage, type of treatment, complications and secondary disease. This study wants to classify/predict cervical cancer survival based on those factors. Various classifications methods: classification and regression tree (CART), smooth support vector machine (SSVM), three order spline SSVM (TSSVM) were used. Since the data of cervical cancer are imbalanced, synthetic minority oversampling technique (SMOTE) is used for handling imbalanced dataset. Performances of these methods are evaluated using accuracy, sensitivity and specificity. Results of this study show that balancing data using SMOTE as preprocessing can improve performance of classification. The SMOTE-SSVM method provided better result than SMOTE-TSSVM and SMOTE-CART.

  18. Content-Based Discovery for Web Map Service using Support Vector Machine and User Relevance Feedback

    PubMed Central

    Cheng, Xiaoqiang; Qi, Kunlun; Zheng, Jie; You, Lan; Wu, Huayi

    2016-01-01

    Many discovery methods for geographic information services have been proposed. There are approaches for finding and matching geographic information services, methods for constructing geographic information service classification schemes, and automatic geographic information discovery. Overall, the efficiency of the geographic information discovery keeps improving., There are however, still two problems in Web Map Service (WMS) discovery that must be solved. Mismatches between the graphic contents of a WMS and the semantic descriptions in the metadata make discovery difficult for human users. End-users and computers comprehend WMSs differently creating semantic gaps in human-computer interactions. To address these problems, we propose an improved query process for WMSs based on the graphic contents of WMS layers, combining Support Vector Machine (SVM) and user relevance feedback. Our experiments demonstrate that the proposed method can improve the accuracy and efficiency of WMS discovery. PMID:27861505

  19. Breast Cancer Recognition Using a Novel Hybrid Intelligent Method

    PubMed Central

    Addeh, Jalil; Ebrahimzadeh, Ata

    2012-01-01

    Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. This paper presents a novel hybrid intelligent method for recognition of breast cancer tumors. The proposed method includes three main modules: the feature extraction module, the classifier module, and the optimization module. In the feature extraction module, fuzzy features are proposed as the efficient characteristic of the patterns. In the classifier module, because of the promising generalization capability of support vector machines (SVM), a SVM-based classifier is proposed. In support vector machine training, the hyperparameters have very important roles for its recognition accuracy. Therefore, in the optimization module, the bees algorithm (BA) is proposed for selecting appropriate parameters of the classifier. The proposed system is tested on Wisconsin Breast Cancer database and simulation results show that the recommended system has a high accuracy. PMID:23626945

  20. Content-Based Discovery for Web Map Service using Support Vector Machine and User Relevance Feedback.

    PubMed

    Hu, Kai; Gui, Zhipeng; Cheng, Xiaoqiang; Qi, Kunlun; Zheng, Jie; You, Lan; Wu, Huayi

    2016-01-01

    Many discovery methods for geographic information services have been proposed. There are approaches for finding and matching geographic information services, methods for constructing geographic information service classification schemes, and automatic geographic information discovery. Overall, the efficiency of the geographic information discovery keeps improving., There are however, still two problems in Web Map Service (WMS) discovery that must be solved. Mismatches between the graphic contents of a WMS and the semantic descriptions in the metadata make discovery difficult for human users. End-users and computers comprehend WMSs differently creating semantic gaps in human-computer interactions. To address these problems, we propose an improved query process for WMSs based on the graphic contents of WMS layers, combining Support Vector Machine (SVM) and user relevance feedback. Our experiments demonstrate that the proposed method can improve the accuracy and efficiency of WMS discovery.

  1. Effective 2D-3D medical image registration using Support Vector Machine.

    PubMed

    Qi, Wenyuan; Gu, Lixu; Zhao, Qiang

    2008-01-01

    Registration of pre-operative 3D volume dataset and intra-operative 2D images gradually becomes an important technique to assist radiologists in diagnosing complicated diseases easily and quickly. In this paper, we proposed a novel 2D/3D registration framework based on Support Vector Machine (SVM) to compensate the disadvantages of generating large number of DRR images in the stage of intra-operation. Estimated similarity metric distribution could be built up from the relationship between parameters of transform and prior sparse target metric values by means of SVR method. Based on which, global optimal parameters of transform are finally searched out by an optimizer in order to guide 3D volume dataset to match intra-operative 2D image. Experiments reveal that our proposed registration method improved performance compared to conventional registration method and also provided a precise registration result efficiently.

  2. Protein Kinase Classification with 2866 Hidden Markov Models and One Support Vector Machine

    NASA Technical Reports Server (NTRS)

    Weber, Ryan; New, Michael H.; Fonda, Mark (Technical Monitor)

    2002-01-01

    The main application considered in this paper is predicting true kinases from randomly permuted kinases that share the same length and amino acid distributions as the true kinases. Numerous methods already exist for this classification task, such as HMMs, motif-matchers, and sequence comparison algorithms. We build on some of these efforts by creating a vector from the output of thousands of structurally based HMMs, created offline with Pfam-A seed alignments using SAM-T99, which then must be combined into an overall classification for the protein. Then we use a Support Vector Machine for classifying this large ensemble Pfam-Vector, with a polynomial and chisquared kernel. In particular, the chi-squared kernel SVM performs better than the HMMs and better than the BLAST pairwise comparisons, when predicting true from false kinases in some respects, but no one algorithm is best for all purposes or in all instances so we consider the particular strengths and weaknesses of each.

  3. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm.

    PubMed

    Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C

    2005-10-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs as better alternatives to the common practice of selecting the apparently best parameters by using a genetic algorithm to search for a couple of optimal parameter. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.

  4. Application of biomonitoring and support vector machine in water quality assessment*

    PubMed Central

    Liao, Yue; Xu, Jian-yu; Wang, Zhu-wei

    2012-01-01

    The behavior of schools of zebrafish (Danio rerio) was studied in acute toxicity environments. Behavioral features were extracted and a method for water quality assessment using support vector machine (SVM) was developed. The behavioral parameters of fish were recorded and analyzed during one hour in an environment of a 24-h half-lethal concentration (LC50) of a pollutant. The data were used to develop a method to evaluate water quality, so as to give an early indication of toxicity. Four kinds of metal ions (Cu2+, Hg2+, Cr6+, and Cd2+) were used for toxicity testing. To enhance the efficiency and accuracy of assessment, a method combining SVM and a genetic algorithm (GA) was used. The results showed that the average prediction accuracy of the method was over 80% and the time cost was acceptable. The method gave satisfactory results for a variety of metal pollutants, demonstrating that this is an effective approach to the classification of water quality. PMID:22467374

  5. [Discrimination of types of polyacrylamide based on near infrared spectroscopy coupled with least square support vector machine].

    PubMed

    Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang

    2014-04-01

    In this paper, a novel discriminant methodology based on near infrared spectroscopic analysis technique and least square support vector machine was proposed for rapid and nondestructive discrimination of different types of Polyacrylamide. The diffuse reflectance spectra of samples of Non-ionic Polyacrylamide, Anionic Polyacrylamide and Cationic Polyacrylamide were measured. Then principal component analysis method was applied to reduce the dimension of the spectral data and extract of the principal compnents. The first three principal components were used for cluster analysis of the three different types of Polyacrylamide. Then those principal components were also used as inputs of least square support vector machine model. The optimization of the parameters and the number of principal components used as inputs of least square support vector machine model was performed through cross validation based on grid search. 60 samples of each type of Polyacrylamide were collected. Thus a total of 180 samples were obtained. 135 samples, 45 samples for each type of Polyacrylamide, were randomly split into a training set to build calibration model and the rest 45 samples were used as test set to evaluate the performance of the developed model. In addition, 5 Cationic Polyacrylamide samples and 5 Anionic Polyacrylamide samples adulterated with different proportion of Non-ionic Polyacrylamide were also prepared to show the feasibilty of the proposed method to discriminate the adulterated Polyacrylamide samples. The prediction error threshold for each type of Polyacrylamide was determined by F statistical significance test method based on the prediction error of the training set of corresponding type of Polyacrylamide in cross validation. The discrimination accuracy of the built model was 100% for prediction of the test set. The prediction of the model for the 10 mixing samples was also presented, and all mixing samples were accurately discriminated as adulterated samples. The overall results demonstrate that the discrimination method proposed in the present paper can rapidly and nondestructively discriminate the different types of Polyacrylamide and the adulterated Polyacrylamide samples, and offered a new approach to discriminate the types of Polyacrylamide.

  6. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    PubMed

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  7. Multidirectional Scanning Model, MUSCLE, to Vectorize Raster Images with Straight Lines

    PubMed Central

    Karas, Ismail Rakip; Bayram, Bulent; Batuk, Fatmagul; Akay, Abdullah Emin; Baz, Ibrahim

    2008-01-01

    This paper presents a new model, MUSCLE (Multidirectional Scanning for Line Extraction), for automatic vectorization of raster images with straight lines. The algorithm of the model implements the line thinning and the simple neighborhood methods to perform vectorization. The model allows users to define specified criteria which are crucial for acquiring the vectorization process. In this model, various raster images can be vectorized such as township plans, maps, architectural drawings, and machine plans. The algorithm of the model was developed by implementing an appropriate computer programming and tested on a basic application. Results, verified by using two well known vectorization programs (WinTopo and Scan2CAD), indicated that the model can successfully vectorize the specified raster data quickly and accurately. PMID:27879843

  8. Multidirectional Scanning Model, MUSCLE, to Vectorize Raster Images with Straight Lines.

    PubMed

    Karas, Ismail Rakip; Bayram, Bulent; Batuk, Fatmagul; Akay, Abdullah Emin; Baz, Ibrahim

    2008-04-15

    This paper presents a new model, MUSCLE (Multidirectional Scanning for Line Extraction), for automatic vectorization of raster images with straight lines. The algorithm of the model implements the line thinning and the simple neighborhood methods to perform vectorization. The model allows users to define specified criteria which are crucial for acquiring the vectorization process. In this model, various raster images can be vectorized such as township plans, maps, architectural drawings, and machine plans. The algorithm of the model was developed by implementing an appropriate computer programming and tested on a basic application. Results, verified by using two well known vectorization programs (WinTopo and Scan2CAD), indicated that the model can successfully vectorize the specified raster data quickly and accurately.

  9. MO-F-CAMPUS-J-02: Automatic Recognition of Patient Treatment Site in Portal Images Using Machine Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, X; Yang, D

    Purpose: To investigate the method to automatically recognize the treatment site in the X-Ray portal images. It could be useful to detect potential treatment errors, and to provide guidance to sequential tasks, e.g. automatically verify the patient daily setup. Methods: The portal images were exported from MOSAIQ as DICOM files, and were 1) processed with a threshold based intensity transformation algorithm to enhance contrast, and 2) where then down-sampled (from 1024×768 to 128×96) by using bi-cubic interpolation algorithm. An appearance-based vector space model (VSM) was used to rearrange the images into vectors. A principal component analysis (PCA) method was usedmore » to reduce the vector dimensions. A multi-class support vector machine (SVM), with radial basis function kernel, was used to build the treatment site recognition models. These models were then used to recognize the treatment sites in the portal image. Portal images of 120 patients were included in the study. The images were selected to cover six treatment sites: brain, head and neck, breast, lung, abdomen and pelvis. Each site had images of the twenty patients. Cross-validation experiments were performed to evaluate the performance. Results: MATLAB image processing Toolbox and scikit-learn (a machine learning library in python) were used to implement the proposed method. The average accuracies using the AP and RT images separately were 95% and 94% respectively. The average accuracy using AP and RT images together was 98%. Computation time was ∼0.16 seconds per patient with AP or RT image, ∼0.33 seconds per patient with both of AP and RT images. Conclusion: The proposed method of treatment site recognition is efficient and accurate. It is not sensitive to the differences of image intensity, size and positions of patients in the portal images. It could be useful for the patient safety assurance. The work was partially supported by a research grant from Varian Medical System.« less

  10. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    PubMed

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  11. Component Pin Recognition Using Algorithms Based on Machine Learning

    NASA Astrophysics Data System (ADS)

    Xiao, Yang; Hu, Hong; Liu, Ze; Xu, Jiangchang

    2018-04-01

    The purpose of machine vision for a plug-in machine is to improve the machine’s stability and accuracy, and recognition of the component pin is an important part of the vision. This paper focuses on component pin recognition using three different techniques. The first technique involves traditional image processing using the core algorithm for binary large object (BLOB) analysis. The second technique uses the histogram of oriented gradients (HOG), to experimentally compare the effect of the support vector machine (SVM) and the adaptive boosting machine (AdaBoost) learning meta-algorithm classifiers. The third technique is the use of an in-depth learning method known as convolution neural network (CNN), which involves identifying the pin by comparing a sample to its training. The main purpose of the research presented in this paper is to increase the knowledge of learning methods used in the plug-in machine industry in order to achieve better results.

  12. Anytime query-tuned kernel machine classifiers via Cholesky factorization

    NASA Technical Reports Server (NTRS)

    DeCoste, D.

    2002-01-01

    We recently demonstrated 2 to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste,2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.

  13. a Hyperspectral Image Classification Method Using Isomap and Rvm

    NASA Astrophysics Data System (ADS)

    Chang, H.; Wang, T.; Fang, H.; Su, Y.

    2018-04-01

    Classification is one of the most significant applications of hyperspectral image processing and even remote sensing. Though various algorithms have been proposed to implement and improve this application, there are still drawbacks in traditional classification methods. Thus further investigations on some aspects, such as dimension reduction, data mining, and rational use of spatial information, should be developed. In this paper, we used a widely utilized global manifold learning approach, isometric feature mapping (ISOMAP), to address the intrinsic nonlinearities of hyperspectral image for dimension reduction. Considering the impropriety of Euclidean distance in spectral measurement, we applied spectral angle (SA) for substitute when constructed the neighbourhood graph. Then, relevance vector machines (RVM) was introduced to implement classification instead of support vector machines (SVM) for simplicity, generalization and sparsity. Therefore, a probability result could be obtained rather than a less convincing binary result. Moreover, taking into account the spatial information of the hyperspectral image, we employ a spatial vector formed by different classes' ratios around the pixel. At last, we combined the probability results and spatial factors with a criterion to decide the final classification result. To verify the proposed method, we have implemented multiple experiments with standard hyperspectral images compared with some other methods. The results and different evaluation indexes illustrated the effectiveness of our method.

  14. Differentiation of Glioblastoma and Lymphoma Using Feature Extraction and Support Vector Machine.

    PubMed

    Yang, Zhangjing; Feng, Piaopiao; Wen, Tian; Wan, Minghua; Hong, Xunning

    2017-01-01

    Differentiation of glioblastoma multiformes (GBMs) and lymphomas using multi-sequence magnetic resonance imaging (MRI) is an important task that is valuable for treatment planning. However, this task is a challenge because GBMs and lymphomas may have a similar appearance in MRI images. This similarity may lead to misclassification and could affect the treatment results. In this paper, we propose a semi-automatic method based on multi-sequence MRI to differentiate these two types of brain tumors. Our method consists of three steps: 1) the key slice is selected from 3D MRIs and region of interests (ROIs) are drawn around the tumor region; 2) different features are extracted based on prior clinical knowledge and validated using a t-test; and 3) features that are helpful for classification are used to build an original feature vector and a support vector machine is applied to perform classification. In total, 58 GBM cases and 37 lymphoma cases are used to validate our method. A leave-one-out crossvalidation strategy is adopted in our experiments. The global accuracy of our method was determined as 96.84%, which indicates that our method is effective for the differentiation of GBM and lymphoma and can be applied in clinical diagnosis. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Li, Li-Ping; Huang, De-Shuang; Yan, Gui-Ying; Nie, Ru; Huang, Yu-An

    2017-04-04

    Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of cell and providing great insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming, expensive, and have high false positive and negative results. Thus it is highly urgent to develop in silico methods to predict PPIs efficiently and accurately in this post genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristics of the proposed method lies in introducing an effective feature descriptor IWLD which can capture highly discriminative evolutionary information from position-specific scoring matrixes (PSSM) of protein data, and employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained excellent prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than the previous methods. Extensive experiments were then performed for predicting cross-species PPIs and the predictive results were also pretty promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on Human data set. The experimental results obtained indicate that our method is highly effective for PPIs prediction and can be taken as a supplementary tool for future proteomics research.

  16. Machine learning methods in chemoinformatics

    PubMed Central

    Mitchell, John B O

    2014-01-01

    Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481. How to cite this article: WIREs Comput Mol Sci 2014, 4:468–481. doi:10.1002/wcms.1183 PMID:25285160

  17. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods.

    PubMed

    Lise, Stefano; Archambeau, Cedric; Pontil, Massimiliano; Jones, David T

    2009-10-30

    Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino-acids are systematically mutated to alanine and changes in free energy of binding (DeltaDeltaG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which DeltaDeltaG >or= 2 kcal/mol, our method achieves a precision and a recall respectively of 56% and 65%. We have developed an hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.

  18. Analyzing big data with the hybrid interval regression methods.

    PubMed

    Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying

    2014-01-01

    Big data is a new trend at present, forcing the significant impacts on information technologies. In big data applications, one of the most concerned issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we collaborate interval regression with the smooth support vector machine (SSVM) to analyze big data. Recently, the smooth support vector machine (SSVM) was proposed as an alternative of the standard SVM that has been proved more efficient than the traditional SVM in processing large-scale data. In addition the soft margin method is proposed to modify the excursion of separation margin and to be effective in the gray zone that the distribution of data becomes hard to be described and the separation margin between classes.

  19. Analyzing Big Data with the Hybrid Interval Regression Methods

    PubMed Central

    Kao, Han-Ying

    2014-01-01

    Big data is a new trend at present, forcing the significant impacts on information technologies. In big data applications, one of the most concerned issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently becomes a big challenge. In this paper, we collaborate interval regression with the smooth support vector machine (SSVM) to analyze big data. Recently, the smooth support vector machine (SSVM) was proposed as an alternative of the standard SVM that has been proved more efficient than the traditional SVM in processing large-scale data. In addition the soft margin method is proposed to modify the excursion of separation margin and to be effective in the gray zone that the distribution of data becomes hard to be described and the separation margin between classes. PMID:25143968

  20. A Support Vector Machine-Based Gender Identification Using Speech Signal

    NASA Astrophysics Data System (ADS)

    Lee, Kye-Hwan; Kang, Sang-Ick; Kim, Deok-Hwan; Chang, Joon-Hyuk

    We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that classifies two groups by finding the voluntary nonlinear boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM)-based method using the mel frequency cepstral coefficients (MFCC). A novel approach of incorporating a features fusion scheme based on a combination of the MFCC and the fundamental frequency is proposed with the aim of improving the performance of gender identification. Experimental results demonstrate that the gender identification performance using the SVM is significantly better than that of the GMM-based scheme. Moreover, the performance is substantially improved when the proposed features fusion technique is applied.

  1. Application of a support vector machine algorithm to the safety precaution technique of medium-low pressure gas regulators

    NASA Astrophysics Data System (ADS)

    Hao, Xuejun; An, Xaioran; Wu, Bo; He, Shaoping

    2018-02-01

    In the gas pipeline system, safe operation of a gas regulator determines the stability of the fuel gas supply, and the medium-low pressure gas regulator of the safety precaution system is not perfect at the present stage in the Beijing Gas Group; therefore, safety precaution technique optimization has important social and economic significance. In this paper, according to the running status of the medium-low pressure gas regulator in the SCADA system, a new method for gas regulator safety precaution based on the support vector machine (SVM) is presented. This method takes the gas regulator outlet pressure data as input variables of the SVM model, the fault categories and degree as output variables, which will effectively enhance the precaution accuracy as well as save significant manpower and material resources.

  2. Speech sound classification and detection of articulation disorders with support vector machines and wavelets.

    PubMed

    Georgoulas, George; Georgopoulos, Voula C; Stylios, Chrysostomos D

    2006-01-01

    This paper proposes a novel integrated methodology to extract features and classify speech sounds with intent to detect the possible existence of a speech articulation disorder in a speaker. Articulation, in effect, is the specific and characteristic way that an individual produces the speech sounds. A methodology to process the speech signal, extract features and finally classify the signal and detect articulation problems in a speaker is presented. The use of support vector machines (SVMs), for the classification of speech sounds and detection of articulation disorders is introduced. The proposed method is implemented on a data set where different sets of features and different schemes of SVMs are tested leading to satisfactory performance.

  3. Prediction of mutagenic toxicity by combination of Recursive Partitioning and Support Vector Machines.

    PubMed

    Liao, Quan; Yao, Jianhua; Yuan, Shengang

    2007-05-01

    The study of prediction of toxicity is very important and necessary because measurement of toxicity is typically time-consuming and expensive. In this paper, Recursive Partitioning (RP) method was used to select descriptors. RP and Support Vector Machines (SVM) were used to construct structure-toxicity relationship models, RP model and SVM model, respectively. The performances of the two models are different. The prediction accuracies of the RP model are 80.2% for mutagenic compounds in MDL's toxicity database, 83.4% for compounds in CMC and 84.9% for agrochemicals in in-house database respectively. Those of SVM model are 81.4%, 87.0% and 87.3% respectively.

  4. StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine.

    PubMed

    Zhou, Wengang; Dickerson, Julie A

    2012-01-01

    Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes new features: sequence-based: Hybrid Amino Acid Pair (HAAP) and two structure-based: Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece. iastate.edu/StruLocPred/.

  5. Using support vector machines to improve elemental ion identification in macromolecular crystal structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morshed, Nader; Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Echols, Nathaniel, E-mail: nechols@lbl.gov

    2015-05-01

    A method to automatically identify possible elemental ions in X-ray crystal structures has been extended to use support vector machine (SVM) classifiers trained on selected structures in the PDB, with significantly improved sensitivity over manually encoded heuristics. In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here,more » the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.« less

  6. Storage and computationally efficient permutations of factorized covariance and square-root information arrays

    NASA Technical Reports Server (NTRS)

    Muellerschoen, R. J.

    1988-01-01

    A unified method to permute vector stored Upper triangular Diagonal factorized covariance and vector stored upper triangular Square Root Information arrays is presented. The method involves cyclic permutation of the rows and columns of the arrays and retriangularization with fast (slow) Givens rotations (reflections). Minimal computation is performed, and a one dimensional scratch array is required. To make the method efficient for large arrays on a virtual memory machine, computations are arranged so as to avoid expensive paging faults. This method is potentially important for processing large volumes of radio metric data in the Deep Space Network.

  7. Boosting specificity of MEG artifact removal by weighted support vector machine.

    PubMed

    Duan, Fang; Phothisonothai, Montri; Kikuchi, Mitsuru; Yoshimura, Yuko; Minabe, Yoshio; Watanabe, Kastumi; Aihara, Kazuyuki

    2013-01-01

    An automatic artifact removal method of magnetoencephalogram (MEG) was presented in this paper. The method proposed is based on independent components analysis (ICA) and support vector machine (SVM). However, different from the previous studies, in this paper we consider two factors which would influence the performance. First, the imbalance factor of independent components (ICs) of MEG is handled by weighted SVM. Second, instead of simply setting a fixed weight to each class, a re-weighting scheme is used for the preservation of useful MEG ICs. Experimental results on manually marked MEG dataset showed that the method proposed could correctly distinguish the artifacts from the MEG ICs. Meanwhile, 99.72% ± 0.67 of MEG ICs were preserved. The classification accuracy was 97.91% ± 1.39. In addition, it was found that this method was not sensitive to individual differences. The cross validation (leave-one-subject-out) results showed an averaged accuracy of 97.41% ± 2.14.

  8. Feature Selection in Order to Extract Multiple Sclerosis Lesions Automatically in 3D Brain Magnetic Resonance Images Using Combination of Support Vector Machine and Genetic Algorithm.

    PubMed

    Khotanlou, Hassan; Afrasiabi, Mahlagha

    2012-10-01

    This paper presents a new feature selection approach for automatically extracting multiple sclerosis (MS) lesions in three-dimensional (3D) magnetic resonance (MR) images. Presented method is applicable to different types of MS lesions. In this method, T1, T2, and fluid attenuated inversion recovery (FLAIR) images are firstly preprocessed. In the next phase, effective features to extract MS lesions are selected by using a genetic algorithm (GA). The fitness function of the GA is the Similarity Index (SI) of a support vector machine (SVM) classifier. The results obtained on different types of lesions have been evaluated by comparison with manual segmentations. This algorithm is evaluated on 15 real 3D MR images using several measures. As a result, the SI between MS regions determined by the proposed method and radiologists was 87% on average. Experiments and comparisons with other methods show the effectiveness and the efficiency of the proposed approach.

  9. Support vector machines for prediction and analysis of beta and gamma-turns in proteins.

    PubMed

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2005-04-01

    Tight turns have long been recognized as one of the three important features of proteins, together with alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% tight turns are beta-turns and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns is very useful for design of new molecules such as drugs, pesticides, and antigens. In this paper we investigated two aspects of applying support vector machine (SVM), a promising machine learning method for bioinformatics, to prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has a superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than the previous results in designing turns in proteins.

  10. Using Support Vector Machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review.

    PubMed

    Orrù, Graziella; Pettersson-Yeo, William; Marquand, Andre F; Sartori, Giuseppe; Mechelli, Andrea

    2012-04-01

    Standard univariate analysis of neuroimaging data has revealed a host of neuroanatomical and functional differences between healthy individuals and patients suffering a wide range of neurological and psychiatric disorders. Significant only at group level however these findings have had limited clinical translation, and recent attention has turned toward alternative forms of analysis, including Support-Vector-Machine (SVM). A type of machine learning, SVM allows categorisation of an individual's previously unseen data into a predefined group using a classification algorithm, developed on a training data set. In recent years, SVM has been successfully applied in the context of disease diagnosis, transition prediction and treatment prognosis, using both structural and functional neuroimaging data. Here we provide a brief overview of the method and review those studies that applied it to the investigation of Alzheimer's disease, schizophrenia, major depression, bipolar disorder, presymptomatic Huntington's disease, Parkinson's disease and autistic spectrum disorder. We conclude by discussing the main theoretical and practical challenges associated with the implementation of this method into the clinic and possible future directions. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins.

    PubMed

    Zhang, Guangya; Ge, Huihua

    2013-10-01

    Understanding of proteins adaptive to hypersaline environment and identifying them is a challenging task and would help to design stable proteins. Here, we have systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results showed that halophilic protein contained more Asp at the expense of Lys, Ile, Cys and Met, fewer small and hydrophobic residues, and showed a large excess of acidic over basic amino acids. Then, we introduce a support vector machine method to discriminate the halophilic and non-halophilic proteins, by using a novel Pearson VII universal function based kernel. In the three validation check methods, it achieved an overall accuracy of 97.7%, 91.7% and 86.9% and outperformed other machine learning algorithms. We also address the influence of protein size on prediction accuracy and found the worse performance for small size proteins might be some significant residues (Cys and Lys) were missing in the proteins. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  12. Stroke localization and classification using microwave tomography with k-means clustering and support vector machine.

    PubMed

    Guo, Lei; Abbosh, Amin

    2018-05-01

    For any chance for stroke patients to survive, the stroke type should be classified to enable giving medication within a few hours of the onset of symptoms. In this paper, a microwave-based stroke localization and classification framework is proposed. It is based on microwave tomography, k-means clustering, and a support vector machine (SVM) method. The dielectric profile of the brain is first calculated using the Born iterative method, whereas the amplitude of the dielectric profile is then taken as the input to k-means clustering. The cluster is selected as the feature vector for constructing and testing the SVM. A database of MRI-derived realistic head phantoms at different signal-to-noise ratios is used in the classification procedure. The performance of the proposed framework is evaluated using the receiver operating characteristic (ROC) curve. The results based on a two-dimensional framework show that 88% classification accuracy, with a sensitivity of 91% and a specificity of 87%, can be achieved. Bioelectromagnetics. 39:312-324, 2018. © 2018 Wiley Periodicals, Inc. © 2018 Wiley Periodicals, Inc.

  13. Optimizing support vector machine learning for semi-arid vegetation mapping by using clustering analysis

    NASA Astrophysics Data System (ADS)

    Su, Lihong

    In remote sensing communities, support vector machine (SVM) learning has recently received increasing attention. SVM learning usually requires large memory and enormous amounts of computation time on large training sets. According to SVM algorithms, the SVM classification decision function is fully determined by support vectors, which compose a subset of the training sets. In this regard, a solution to optimize SVM learning is to efficiently reduce training sets. In this paper, a data reduction method based on agglomerative hierarchical clustering is proposed to obtain smaller training sets for SVM learning. Using a multiple angle remote sensing dataset of a semi-arid region, the effectiveness of the proposed method is evaluated by classification experiments with a series of reduced training sets. The experiments show that there is no loss of SVM accuracy when the original training set is reduced to 34% using the proposed approach. Maximum likelihood classification (MLC) also is applied on the reduced training sets. The results show that MLC can also maintain the classification accuracy. This implies that the most informative data instances can be retained by this approach.

  14. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

    PubMed Central

    2013-01-01

    Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time. PMID:23815620

  15. Evaluation of a Machine-Learning Classifier for Keratoconus Detection Based on Scheimpflug Tomography.

    PubMed

    Ruiz Hidalgo, Irene; Rodriguez, Pablo; Rozema, Jos J; Ní Dhubhghaill, Sorcha; Zakaria, Nadia; Tassignon, Marie-José; Koppen, Carina

    2016-06-01

    To evaluate the performance of a support vector machine algorithm that automatically and objectively identifies corneal patterns based on a combination of 22 parameters obtained from Pentacam measurements and to compare this method with other known keratoconus (KC) classification methods. Pentacam data from 860 eyes were included in the study and divided into 5 groups: 454 KC, 67 forme fruste (FF), 28 astigmatic, 117 after refractive surgery (PR), and 194 normal eyes (N). Twenty-two parameters were used for classification using a support vector machine algorithm developed in Weka, a machine-learning computer software. The cross-validation accuracy for 3 different classification tasks (KC vs. N, FF vs. N and all 5 groups) was calculated and compared with other known classification methods. The accuracy achieved in the KC versus N discrimination task was 98.9%, with 99.1% sensitivity and 98.5% specificity for KC detection. The accuracy in the FF versus N task was 93.1%, with 79.1% sensitivity and 97.9% specificity for the FF discrimination. Finally, for the 5-groups classification, the accuracy was 88.8%, with a weighted average sensitivity of 89.0% and specificity of 95.2%. Despite using the strictest definition for FF KC, the present study obtained comparable or better results than the single-parameter methods and indices reported in the literature. In some cases, direct comparisons with the literature were not possible because of differences in the compositions and definitions of the study groups, especially the FF KC.

  16. The Improvement of the Closed Bounded Volume (CBV) Evaluation Methods to Compute a Feasible Rough Machining Area Based on Faceted Models

    NASA Astrophysics Data System (ADS)

    Hadi Sutrisno, Himawan; Kiswanto, Gandjar; Istiyanto, Jos

    2017-06-01

    The rough machining is aimed at shaping a workpiece towards to its final form. This process takes up a big proportion of the machining time due to the removal of the bulk material which may affect the total machining time. In certain models, the rough machining has limitations especially on certain surfaces such as turbine blade and impeller. CBV evaluation is one of the concepts which is used to detect of areas admissible in the process of machining. While in the previous research, CBV area detection used a pair of normal vectors, in this research, the writer simplified the process to detect CBV area with a slicing line for each point cloud formed. The simulation resulted in three steps used for this method and they are: 1. Triangulation from CAD design models, 2. Development of CC point from the point cloud, 3. The slicing line method which is used to evaluate each point cloud position (under CBV and outer CBV). The result of this evaluation method can be used as a tool for orientation set-up on each CC point position of feasible areas in rough machining.

  17. Identification of Alzheimer's disease and mild cognitive impairment using multimodal sparse hierarchical extreme learning machine.

    PubMed

    Kim, Jongin; Lee, Boreom

    2018-05-07

    Different modalities such as structural MRI, FDG-PET, and CSF have complementary information, which is likely to be very useful for diagnosis of AD and MCI. Therefore, it is possible to develop a more effective and accurate AD/MCI automatic diagnosis method by integrating complementary information of different modalities. In this paper, we propose multi-modal sparse hierarchical extreme leaning machine (MSH-ELM). We used volume and mean intensity extracted from 93 regions of interest (ROIs) as features of MRI and FDG-PET, respectively, and used p-tau, t-tau, and Aβ42 as CSF features. In detail, high-level representation was individually extracted from each of MRI, FDG-PET, and CSF using a stacked sparse extreme learning machine auto-encoder (sELM-AE). Then, another stacked sELM-AE was devised to acquire a joint hierarchical feature representation by fusing the high-level representations obtained from each modality. Finally, we classified joint hierarchical feature representation using a kernel-based extreme learning machine (KELM). The results of MSH-ELM were compared with those of conventional ELM, single kernel support vector machine (SK-SVM), multiple kernel support vector machine (MK-SVM) and stacked auto-encoder (SAE). Performance was evaluated through 10-fold cross-validation. In the classification of AD vs. HC and MCI vs. HC problem, the proposed MSH-ELM method showed mean balanced accuracies of 96.10% and 86.46%, respectively, which is much better than those of competing methods. In summary, the proposed algorithm exhibits consistently better performance than SK-SVM, ELM, MK-SVM and SAE in the two binary classification problems (AD vs. HC and MCI vs. HC). © 2018 Wiley Periodicals, Inc.

  18. Arbitrary norm support vector machines.

    PubMed

    Huang, Kaizhu; Zheng, Danian; King, Irwin; Lyu, Michael R

    2009-02-01

    Support vector machines (SVM) are state-of-the-art classifiers. Typically L2-norm or L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, thus making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing the explicit form. Hence, this builds a connection between Bayesian learning and the kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm is competitive with or even better than the standard L2-norm SVM in terms of accuracy but with a reduced number of support vectors, -9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparse properties with a training speed over seven times faster.

  19. Target specific compound identification using a support vector machine.

    PubMed

    Plewczynski, Dariusz; von Grotthuss, Marcin; Spieser, Stephane A H; Rychlewski, Leszek; Wyrwicz, Lucjan S; Ginalski, Krzysztof; Koch, Uwe

    2007-03-01

    In many cases at the beginning of an HTS-campaign, some information about active molecules is already available. Often known active compounds (such as substrate analogues, natural products, inhibitors of a related protein or ligands published by a pharmaceutical company) are identified in low-throughput validation studies of the biochemical target. In this study we evaluate the effectiveness of a support vector machine applied for those compounds and used to classify a collection with unknown activity. This approach was aimed at reducing the number of compounds to be tested against the given target. Our method predicts the biological activity of chemical compounds based on only the atom pairs (AP) two dimensional topological descriptors. The supervised support vector machine (SVM) method herein is trained on compounds from the MDL drug data report (MDDR) known to be active for specific protein target. For detailed analysis, five different biological targets were selected including cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV-reverse transcriptase and antagonists of the estrogen receptor. The accuracy of compound identification was estimated using the recall and precision values. The sensitivities for all protein targets exceeded 80% and the classification performance reached 100% for selected targets. In another application of the method, we addressed the absence of an initial set of active compounds for a selected protein target at the beginning of an HTS-campaign. In such a case, virtual high-throughput screening (vHTS) is usually applied by using a flexible docking procedure. However, the vHTS experiment typically contains a large percentage of false positives that should be verified by costly and time-consuming experimental follow-up assays. The subsequent use of our machine learning method was found to improve the speed (since the docking procedure was not required for all compounds from the database) and also the accuracy of the HTS hit lists (the enrichment factor).

  20. A Wavelet Support Vector Machine Combination Model for Singapore Tourist Arrival to Malaysia

    NASA Astrophysics Data System (ADS)

    Rafidah, A.; Shabri, Ani; Nurulhuda, A.; Suhaila, Y.

    2017-08-01

    In this study, wavelet support vector machine model (WSVM) is proposed and applied for monthly data Singapore tourist time series prediction. The WSVM model is combination between wavelet analysis and support vector machine (SVM). In this study, we have two parts, first part we compare between the kernel function and second part we compare between the developed models with single model, SVM. The result showed that kernel function linear better than RBF while WSVM outperform with single model SVM to forecast monthly Singapore tourist arrival to Malaysia.

  1. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

  2. The Application of Auto-Disturbance Rejection Control Optimized by Least Squares Support Vector Machines Method and Time-Frequency Representation in Voltage Source Converter-High Voltage Direct Current System.

    PubMed

    Liu, Ying-Pei; Liang, Hai-Ping; Gao, Zhong-Ke

    2015-01-01

    In order to improve the performance of voltage source converter-high voltage direct current (VSC-HVDC) system, we propose an improved auto-disturbance rejection control (ADRC) method based on least squares support vector machines (LSSVM) in the rectifier side. Firstly, we deduce the high frequency transient mathematical model of VSC-HVDC system. Then we investigate the ADRC and LSSVM principles. We ignore the tracking differentiator in the ADRC controller aiming to improve the system dynamic response speed. On this basis, we derive the mathematical model of ADRC controller optimized by LSSVM for direct current voltage loop. Finally we carry out simulations to verify the feasibility and effectiveness of our proposed control method. In addition, we employ the time-frequency representation methods, i.e., Wigner-Ville distribution (WVD) and adaptive optimal kernel (AOK) time-frequency representation, to demonstrate our proposed method performs better than the traditional method from the perspective of energy distribution in time and frequency plane.

  3. The Application of Auto-Disturbance Rejection Control Optimized by Least Squares Support Vector Machines Method and Time-Frequency Representation in Voltage Source Converter-High Voltage Direct Current System

    PubMed Central

    Gao, Zhong-Ke

    2015-01-01

    In order to improve the performance of voltage source converter-high voltage direct current (VSC-HVDC) system, we propose an improved auto-disturbance rejection control (ADRC) method based on least squares support vector machines (LSSVM) in the rectifier side. Firstly, we deduce the high frequency transient mathematical model of VSC-HVDC system. Then we investigate the ADRC and LSSVM principles. We ignore the tracking differentiator in the ADRC controller aiming to improve the system dynamic response speed. On this basis, we derive the mathematical model of ADRC controller optimized by LSSVM for direct current voltage loop. Finally we carry out simulations to verify the feasibility and effectiveness of our proposed control method. In addition, we employ the time-frequency representation methods, i.e., Wigner-Ville distribution (WVD) and adaptive optimal kernel (AOK) time-frequency representation, to demonstrate our proposed method performs better than the traditional method from the perspective of energy distribution in time and frequency plane. PMID:26098556

  4. Improving Non-Destructive Concrete Strength Tests Using Support Vector Machines

    PubMed Central

    Shih, Yi-Fan; Wang, Yu-Ren; Lin, Kuo-Liang; Chen, Chin-Wen

    2015-01-01

    Non-destructive testing (NDT) methods are important alternatives when destructive tests are not feasible to examine the in situ concrete properties without damaging the structure. The rebound hammer test and the ultrasonic pulse velocity test are two popular NDT methods to examine the properties of concrete. The rebound of the hammer depends on the hardness of the test specimen and ultrasonic pulse travelling speed is related to density, uniformity, and homogeneity of the specimen. Both of these two methods have been adopted to estimate the concrete compressive strength. Statistical analysis has been implemented to establish the relationship between hammer rebound values/ultrasonic pulse velocities and concrete compressive strength. However, the estimated results can be unreliable. As a result, this research proposes an Artificial Intelligence model using support vector machines (SVMs) for the estimation. Data from 95 cylinder concrete samples are collected to develop and validate the model. The results show that combined NDT methods (also known as SonReb method) yield better estimations than single NDT methods. The results also show that the SVMs model is more accurate than the statistical regression model. PMID:28793627

  5. Optimization of large matrix calculations for execution on the Cray X-MP vector supercomputer

    NASA Technical Reports Server (NTRS)

    Hornfeck, William A.

    1988-01-01

    A considerable volume of large computational computer codes were developed for NASA over the past twenty-five years. This code represents algorithms developed for machines of earlier generation. With the emergence of the vector supercomputer as a viable, commercially available machine, an opportunity exists to evaluate optimization strategies to improve the efficiency of existing software. This result is primarily due to architectural differences in the latest generation of large-scale machines and the earlier, mostly uniprocessor, machines. A sofware package being used by NASA to perform computations on large matrices is described, and a strategy for conversion to the Cray X-MP vector supercomputer is also described.

  6. Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    NASA Astrophysics Data System (ADS)

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  7. Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine.

    PubMed

    Meng, Jun; Liu, Dong; Sun, Chao; Luan, Yushi

    2014-12-30

    MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.

  8. Monitoring Hitting Load in Tennis Using Inertial Sensors and Machine Learning.

    PubMed

    Whiteside, David; Cant, Olivia; Connolly, Molly; Reid, Machar

    2017-10-01

    Quantifying external workload is fundamental to training prescription in sport. In tennis, global positioning data are imprecise and fail to capture hitting loads. The current gold standard (manual notation) is time intensive and often not possible given players' heavy travel schedules. To develop an automated stroke-classification system to help quantify hitting load in tennis. Nineteen athletes wore an inertial measurement unit (IMU) on their wrist during 66 video-recorded training sessions. Video footage was manually notated such that known shot type (serve, rally forehand, slice forehand, forehand volley, rally backhand, slice backhand, backhand volley, smash, or false positive) was associated with the corresponding IMU data for 28,582 shots. Six types of machine-learning models were then constructed to classify true shot type from the IMU signals. Across 10-fold cross-validation, a cubic-kernel support vector machine classified binned shots (overhead, forehand, or backhand) with an accuracy of 97.4%. A second cubic-kernel support vector machine achieved 93.2% accuracy when classifying all 9 shot types. With a view to monitoring external load, the combination of miniature inertial sensors and machine learning offers a practical and automated method of quantifying shot counts and discriminating shot types in elite tennis players.

  9. Predicting primary progressive aphasias with support vector machine approaches in structural MRI data.

    PubMed

    Bisenius, Sandrine; Mueller, Karsten; Diehl-Schmid, Janine; Fassbender, Klaus; Grimmer, Timo; Jessen, Frank; Kassubek, Jan; Kornhuber, Johannes; Landwehrmeyer, Bernhard; Ludolph, Albert; Schneider, Anja; Anderl-Straub, Sarah; Stuke, Katharina; Danek, Adrian; Otto, Markus; Schroeter, Matthias L

    2017-01-01

    Primary progressive aphasia (PPA) encompasses the three subtypes nonfluent/agrammatic variant PPA, semantic variant PPA, and the logopenic variant PPA, which are characterized by distinct patterns of language difficulties and regional brain atrophy. To validate the potential of structural magnetic resonance imaging data for early individual diagnosis, we used support vector machine classification on grey matter density maps obtained by voxel-based morphometry analysis to discriminate PPA subtypes (44 patients: 16 nonfluent/agrammatic variant PPA, 17 semantic variant PPA, 11 logopenic variant PPA) from 20 healthy controls (matched for sample size, age, and gender) in the cohort of the multi-center study of the German consortium for frontotemporal lobar degeneration. Here, we compared a whole-brain with a meta-analysis-based disease-specific regions-of-interest approach for support vector machine classification. We also used support vector machine classification to discriminate the three PPA subtypes from each other. Whole brain support vector machine classification enabled a very high accuracy between 91 and 97% for identifying specific PPA subtypes vs. healthy controls, and 78/95% for the discrimination between semantic variant vs. nonfluent/agrammatic or logopenic PPA variants. Only for the discrimination between nonfluent/agrammatic and logopenic PPA variants accuracy was low with 55%. Interestingly, the regions that contributed the most to the support vector machine classification of patients corresponded largely to the regions that were atrophic in these patients as revealed by group comparisons. Although the whole brain approach took also into account regions that were not covered in the regions-of-interest approach, both approaches showed similar accuracies due to the disease-specificity of the selected networks. Conclusion, support vector machine classification of multi-center structural magnetic resonance imaging data enables prediction of PPA subtypes with a very high accuracy paving the road for its application in clinical settings.

  10. Ship localization in Santa Barbara Channel using machine learning classifiers.

    PubMed

    Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

    2017-11-01

    Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.

  11. Failure prediction using machine learning and time series in optical network.

    PubMed

    Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo

    2017-08-07

    In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.

  12. Support vector machines

    NASA Technical Reports Server (NTRS)

    Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri

    2004-01-01

    Support Vector Machines (SVMs) are a type of supervised learning algorith,, other examples of which are Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labled by a 'supervisor' - typically a human 'expert.'.

  13. Predicting the dissolution kinetics of silicate glasses using machine learning

    NASA Astrophysics Data System (ADS)

    Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu

    2018-05-01

    Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.

  14. Control-group feature normalization for multivariate pattern analysis of structural MRI data using the support vector machine.

    PubMed

    Linn, Kristin A; Gaonkar, Bilwaj; Satterthwaite, Theodore D; Doshi, Jimit; Davatzikos, Christos; Shinohara, Russell T

    2016-05-15

    Normalization of feature vector values is a common practice in machine learning. Generally, each feature value is standardized to the unit hypercube or by normalizing to zero mean and unit variance. Classification decisions based on support vector machines (SVMs) or by other methods are sensitive to the specific normalization used on the features. In the context of multivariate pattern analysis using neuroimaging data, standardization effectively up- and down-weights features based on their individual variability. Since the standard approach uses the entire data set to guide the normalization, it utilizes the total variability of these features. This total variation is inevitably dependent on the amount of marginal separation between groups. Thus, such a normalization may attenuate the separability of the data in high dimensional space. In this work we propose an alternate approach that uses an estimate of the control-group standard deviation to normalize features before training. We study our proposed approach in the context of group classification using structural MRI data. We show that control-based normalization leads to better reproducibility of estimated multivariate disease patterns and improves the classifier performance in many cases. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    PubMed

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used a sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  16. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from human genome is of great importance to improve diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes) where there is no investigational approach to find them and the classification methods used to make the final decision. In this work, a novel Sequence-based fusion method (SFM) is proposed to identify disease genes. In this regard, unlike existing methods, instead of using a noisy and incomplete prior-knowledge, the amino acid sequence of the proteins which is universal data has been carried out to present the genes (proteins) into four different feature vectors. To select more likely negative data from candidate genes, the intersection set of four negative sets which are generated using distance approach is considered. Then, Decision Tree (C4.5) has been applied as a fusion method to combine the results of four independent state-of the-art predictors based on support vector machine (SVM) algorithm, and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate the precision, recall and F-measure of 82.6%, 85.6% and 84, respectively. These results confirm the efficiency and validity of the proposed method. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    PubMed Central

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  18. A comparison of supervised machine learning algorithms and feature vectors for MS lesion segmentation using multimodal structural MRI.

    PubMed

    Sweeney, Elizabeth M; Vogelstein, Joshua T; Cuzzocreo, Jennifer L; Calabresi, Peter A; Reich, Daniel S; Crainiceanu, Ciprian M; Shinohara, Russell T

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.

  19. Big Data and machine learning in radiation oncology: State of the art and future prospects.

    PubMed

    Bibault, Jean-Emmanuel; Giraud, Philippe; Burgun, Anita

    2016-11-01

    Precision medicine relies on an increasing amount of heterogeneous data. Advances in radiation oncology, through the use of CT Scan, dosimetry and imaging performed before each fraction, have generated a considerable flow of data that needs to be integrated. In the same time, Electronic Health Records now provide phenotypic profiles of large cohorts of patients that could be correlated to this information. In this review, we describe methods that could be used to create integrative predictive models in radiation oncology. Potential uses of machine learning methods such as support vector machine, artificial neural networks, and deep learning are also discussed. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Process service quality evaluation based on Dempster-Shafer theory and support vector machine.

    PubMed

    Pei, Feng-Que; Li, Dong-Bo; Tong, Yi-Fei; He, Fei

    2017-01-01

    Human involvement influences traditional service quality evaluations, which triggers an evaluation's low accuracy, poor reliability and less impressive predictability. This paper proposes a method by employing a support vector machine (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process by handling a high number of input features with a low sampling data set, which is called SVMs-DS. Features that can affect production quality are extracted by a large number of sensors. Preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, the basic probability assignments (BPAs) are constructed, which can help the evaluation in a qualitative and quantitative way. The process service quality evaluation results are validated by the Dempster rules; the decision threshold to resolve conflicting results is generated from three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.

  1. Real Time Monitoring System of Pollution Waste on Musi River Using Support Vector Machine (SVM) Method

    NASA Astrophysics Data System (ADS)

    Fachrurrozi, Muhammad; Saparudin; Erwin

    2017-04-01

    Real-time Monitoring and early detection system which measures the quality standard of waste in Musi River, Palembang, Indonesia is a system for determining air and water pollution level. This system was designed in order to create an integrated monitoring system and provide real time information that can be read. It is designed to measure acidity and water turbidity polluted by industrial waste, as well as to show and provide conditional data integrated in one system. This system consists of inputting and processing the data, and giving output based on processed data. Turbidity, substances, and pH sensor is used as a detector that produce analog electrical direct current voltage (DC). Early detection system works by determining the value of the ammonia threshold, acidity, and turbidity level of water in Musi River. The results is then presented based on the level group pollution by the Support Vector Machine classification method.

  2. Privacy preserving RBF kernel support vector machine.

    PubMed

    Li, Haoran; Xiong, Li; Ohno-Machado, Lucila; Jiang, Xiaoqian

    2014-01-01

    Data sharing is challenging but important for healthcare research. Methods for privacy-preserving data dissemination based on the rigorous differential privacy standard have been developed but they did not consider the characteristics of biomedical data and make full use of the available information. This often results in too much noise in the final outputs. We hypothesized that this situation can be alleviated by leveraging a small portion of open-consented data to improve utility without sacrificing privacy. We developed a hybrid privacy-preserving differentially private support vector machine (SVM) model that uses public data and private data together. Our model leverages the RBF kernel and can handle nonlinearly separable cases. Experiments showed that this approach outperforms two baselines: (1) SVMs that only use public data, and (2) differentially private SVMs that are built from private data. Our method demonstrated very close performance metrics compared to nonprivate SVMs trained on the private data.

  3. A support vector machine based control application to the experimental three-tank system.

    PubMed

    Iplikci, Serdar

    2010-07-01

    This paper presents a support vector machine (SVM) approach to generalized predictive control (GPC) of multiple-input multiple-output (MIMO) nonlinear systems. The possession of higher generalization potential and at the same time avoidance of getting stuck into the local minima have motivated us to employ SVM algorithms for modeling MIMO systems. Based on the SVM model, detailed and compact formulations for calculating predictions and gradient information, which are used in the computation of the optimal control action, are given in the paper. The proposed MIMO SVM-based GPC method has been verified on an experimental three-tank liquid level control system. Experimental results have shown that the proposed method can handle the control task successfully for different reference trajectories. Moreover, a detailed discussion on data gathering, model selection and effects of the control parameters have been given in this paper. 2010 ISA. Published by Elsevier Ltd. All rights reserved.

  4. Privacy Preserving RBF Kernel Support Vector Machine

    PubMed Central

    Xiong, Li; Ohno-Machado, Lucila

    2014-01-01

    Data sharing is challenging but important for healthcare research. Methods for privacy-preserving data dissemination based on the rigorous differential privacy standard have been developed but they did not consider the characteristics of biomedical data and make full use of the available information. This often results in too much noise in the final outputs. We hypothesized that this situation can be alleviated by leveraging a small portion of open-consented data to improve utility without sacrificing privacy. We developed a hybrid privacy-preserving differentially private support vector machine (SVM) model that uses public data and private data together. Our model leverages the RBF kernel and can handle nonlinearly separable cases. Experiments showed that this approach outperforms two baselines: (1) SVMs that only use public data, and (2) differentially private SVMs that are built from private data. Our method demonstrated very close performance metrics compared to nonprivate SVMs trained on the private data. PMID:25013805

  5. Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression

    NASA Astrophysics Data System (ADS)

    Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.

    2010-01-01

    In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.

  6. A multi-label learning based kernel automatic recommendation method for support vector machine.

    PubMed

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.

  7. A Multi-Label Learning Based Kernel Automatic Recommendation Method for Support Vector Machine

    PubMed Central

    Zhang, Xueying; Song, Qinbao

    2015-01-01

    Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance. PMID:25893896

  8. Epidermis area detection for immunofluorescence microscopy

    NASA Astrophysics Data System (ADS)

    Dovganich, Andrey; Krylov, Andrey; Nasonov, Andrey; Makhneva, Natalia

    2018-04-01

    We propose a novel image segmentation method for immunofluorescence microscopy images of skin tissue for the diagnosis of various skin diseases. The segmentation is based on machine learning algorithms. The feature vector is filled by three groups of features: statistical features, Laws' texture energy measures and local binary patterns. The images are preprocessed for better learning. Different machine learning algorithms have been used and the best results have been obtained with random forest algorithm. We use the proposed method to detect the epidermis region as a part of pemphigus diagnosis system.

  9. Analysis of miRNA expression profile based on SVM algorithm

    NASA Astrophysics Data System (ADS)

    Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian

    2018-05-01

    Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.

  10. Bayesian anomaly detection in monitoring data applying relevance vector machine

    NASA Astrophysics Data System (ADS)

    Saito, Tomoo

    2011-04-01

    A method for automatically classifying the monitoring data into two categories, normal and anomaly, is developed in order to remove anomalous data included in the enormous amount of monitoring data, applying the relevance vector machine (RVM) to a probabilistic discriminative model with basis functions and their weight parameters whose posterior PDF (probabilistic density function) conditional on the learning data set is given by Bayes' theorem. The proposed framework is applied to actual monitoring data sets containing some anomalous data collected at two buildings in Tokyo, Japan, which shows that the trained models discriminate anomalous data from normal data very clearly, giving high probabilities of being normal to normal data and low probabilities of being normal to anomalous data.

  11. Ranking Support Vector Machine with Kernel Approximation

    PubMed Central

    Dou, Yong

    2017-01-01

    Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms. PMID:28293256

  12. Ranking Support Vector Machine with Kernel Approximation.

    PubMed

    Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi

    2017-01-01

    Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms.

  13. Distance Metric Learning via Iterated Support Vector Machines.

    PubMed

    Zuo, Wangmeng; Wang, Faqiang; Zhang, David; Lin, Liang; Huang, Yuchi; Meng, Deyu; Zhang, Lei

    2017-07-11

    Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while most existing methods are based on customized optimizers and become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem with the positive semi-definite constraint, and solve it by iterated training of support vector machines (SVMs). The new formulation is easy to implement and efficient in training with the off-the-shelf SVM solvers. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experiments are conducted on general classification, face verification and person re-identification to evaluate our methods. Compared with the state-of-the-art approaches, our methods can achieve comparable classification accuracy and are efficient in training.

  14. Stable Isotope Ratio and Elemental Profile Combined with Support Vector Machine for Provenance Discrimination of Oolong Tea (Wuyi-Rock Tea)

    PubMed Central

    Lou, Yun-xiao; Fu, Xian-shu; Yu, Xiao-ping; Zhang, Ya-fen

    2017-01-01

    This paper focused on an effective method to discriminate the geographical origin of Wuyi-Rock tea by the stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea (n = 99) collected from nine producing areas and non-Wuyi-Rock tea (n = 33) from eleven nonproducing areas were analysed for SIR and MEP by established methods. The SVM model based on coupled data produced the best prediction accuracy (0.9773). This prediction shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ2H, δ18O, Cs, Cu, Ca, and Rb contents are significant indications for provenance discrimination and not all of the metallic elements improve the prediction accuracy of the SVM model. PMID:28473941

  15. Design of Clinical Support Systems Using Integrated Genetic Algorithm and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Chen, Yung-Fu; Huang, Yung-Fa; Jiang, Xiaoyi; Hsu, Yuan-Nian; Lin, Hsuan-Hung

    Clinical decision support system (CDSS) provides knowledge and specific information for clinicians to enhance diagnostic efficiency and improving healthcare quality. An appropriate CDSS can highly elevate patient safety, improve healthcare quality, and increase cost-effectiveness. Support vector machine (SVM) is believed to be superior to traditional statistical and neural network classifiers. However, it is critical to determine suitable combination of SVM parameters regarding classification performance. Genetic algorithm (GA) can find optimal solution within an acceptable time, and is faster than greedy algorithm with exhaustive searching strategy. By taking the advantage of GA in quickly selecting the salient features and adjusting SVM parameters, a method using integrated GA and SVM (IGS), which is different from the traditional method with GA used for feature selection and SVM for classification, was used to design CDSSs for prediction of successful ventilation weaning, diagnosis of patients with severe obstructive sleep apnea, and discrimination of different cell types form Pap smear. The results show that IGS is better than methods using SVM alone or linear discriminator.

  16. Rapid prediction of chemical metabolism by human UDP-glucuronosyltransferase isoforms using quantum chemical descriptors derived with the electronegativity equalization method.

    PubMed

    Sorich, Michael J; McKinnon, Ross A; Miners, John O; Winkler, David A; Smith, Paul A

    2004-10-07

    This study aimed to evaluate in silico models based on quantum chemical (QC) descriptors derived using the electronegativity equalization method (EEM) and to assess the use of QC properties to predict chemical metabolism by human UDP-glucuronosyltransferase (UGT) isoforms. Various EEM-derived QC molecular descriptors were calculated for known UGT substrates and nonsubstrates. Classification models were developed using support vector machine and partial least squares discriminant analysis. In general, the most predictive models were generated with the support vector machine. Combining QC and 2D descriptors (from previous work) using a consensus approach resulted in a statistically significant improvement in predictivity (to 84%) over both the QC and 2D models and the other methods of combining the descriptors. EEM-derived QC descriptors were shown to be both highly predictive and computationally efficient. It is likely that EEM-derived QC properties will be generally useful for predicting ADMET and physicochemical properties during drug discovery.

  17. CompareSVM: supervised, Support Vector Machine (SVM) inference of gene regularity networks.

    PubMed

    Gillani, Zeeshan; Akash, Muhammad Sajid Hamid; Rahaman, M D Matiur; Chen, Ming

    2014-11-30

    Predication of gene regularity network (GRN) from expression data is a challenging task. There are many methods that have been developed to address this challenge ranging from supervised to unsupervised methods. Most promising methods are based on support vector machine (SVM). There is a need for comprehensive analysis on prediction accuracy of supervised method SVM using different kernels on different biological experimental conditions and network size. We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRN. Using CompareSVM, we investigated and evaluated different SVM kernel methods on simulated datasets of microarray of different sizes in detail. The results obtained from CompareSVM showed that accuracy of inference method depends upon the nature of experimental condition and size of the network. For network with nodes (<200) and average (over all sizes of networks), SVM Gaussian kernel outperform on knockout, knockdown, and multifactorial datasets compared to all the other inference methods. For network with large number of nodes (~500), choice of inference method depend upon nature of experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/ .

  18. Wavelet Entropy and Directed Acyclic Graph Support Vector Machine for Detection of Patients with Unilateral Hearing Loss in MRI Scanning

    PubMed Central

    Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M.; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong

    2016-01-01

    Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss. PMID:27807415

  19. [Identification of varieties of cashmere by Vis/NIR spectroscopy technology based on PCA-SVM].

    PubMed

    Wu, Gui-Fang; He, Yong

    2009-06-01

    One mixed algorithm was presented to discriminate cashmere varieties with principal component analysis (PCA) and support vector machine (SVM). Cashmere fiber has such characteristics as threadlike, softness, glossiness and high tensile strength. The quality characters and economic value of each breed of cashmere are very different. In order to safeguard the consumer's rights and guarantee the quality of cashmere product, quickly, efficiently and correctly identifying cashmere has significant meaning to the production and transaction of cashmere material. The present research adopts Vis/NIRS spectroscopy diffuse techniques to collect the spectral data of cashmere. The near infrared fingerprint of cashmere was acquired by principal component analysis (PCA), and support vector machine (SVM) methods were used to further identify the cashmere material. The result of PCA indicated that the score map made by the scores of PC1, PC2 and PC3 was used, and 10 principal components (PCs) were selected as the input of support vector machine (SVM) based on the reliabilities of PCs of 99.99%. One hundred cashmere samples were used for calibration and the remaining 75 cashmere samples were used for validation. A one-against-all multi-class SVM model was built, the capabilities of SVM with different kernel function were comparatively analyzed, and the result showed that SVM possessing with the Gaussian kernel function has the best identification capabilities with the accuracy of 100%. This research indicated that the data mining method of PCA-SVM has a good identification effect, and can work as a new method for rapid identification of cashmere material varieties.

  20. Wavelet Entropy and Directed Acyclic Graph Support Vector Machine for Detection of Patients with Unilateral Hearing Loss in MRI Scanning.

    PubMed

    Wang, Shuihua; Yang, Ming; Du, Sidan; Yang, Jiquan; Liu, Bin; Gorriz, Juan M; Ramírez, Javier; Yuan, Ti-Fei; Zhang, Yudong

    2016-01-01

    Highlights We develop computer-aided diagnosis system for unilateral hearing loss detection in structural magnetic resonance imaging.Wavelet entropy is introduced to extract image global features from brain images. Directed acyclic graph is employed to endow support vector machine an ability to handle multi-class problems.The developed computer-aided diagnosis system achieves an overall accuracy of 95.1% for this three-class problem of differentiating left-sided and right-sided hearing loss from healthy controls. Aim: Sensorineural hearing loss (SNHL) is correlated to many neurodegenerative disease. Now more and more computer vision based methods are using to detect it in an automatic way. Materials: We have in total 49 subjects, scanned by 3.0T MRI (Siemens Medical Solutions, Erlangen, Germany). The subjects contain 14 patients with right-sided hearing loss (RHL), 15 patients with left-sided hearing loss (LHL), and 20 healthy controls (HC). Method: We treat this as a three-class classification problem: RHL, LHL, and HC. Wavelet entropy (WE) was selected from the magnetic resonance images of each subjects, and then submitted to a directed acyclic graph support vector machine (DAG-SVM). Results: The 10 repetition results of 10-fold cross validation shows 3-level decomposition will yield an overall accuracy of 95.10% for this three-class classification problem, higher than feedforward neural network, decision tree, and naive Bayesian classifier. Conclusions: This computer-aided diagnosis system is promising. We hope this study can attract more computer vision method for detecting hearing loss.

  1. Online Sequential Projection Vector Machine with Adaptive Data Mean Update

    PubMed Central

    Chen, Lin; Jia, Ji-Ting; Zhang, Qiong; Deng, Wan-Yu; Wei, Wei

    2016-01-01

    We propose a simple online learning algorithm especial for high-dimensional data. The algorithm is referred to as online sequential projection vector machine (OSPVM) which derives from projection vector machine and can learn from data in one-by-one or chunk-by-chunk mode. In OSPVM, data centering, dimension reduction, and neural network training are integrated seamlessly. In particular, the model parameters including (1) the projection vectors for dimension reduction, (2) the input weights, biases, and output weights, and (3) the number of hidden nodes can be updated simultaneously. Moreover, only one parameter, the number of hidden nodes, needs to be determined manually, and this makes it easy for use in real applications. Performance comparison was made on various high-dimensional classification problems for OSPVM against other fast online algorithms including budgeted stochastic gradient descent (BSGD) approach, adaptive multihyperplane machine (AMM), primal estimated subgradient solver (Pegasos), online sequential extreme learning machine (OSELM), and SVD + OSELM (feature selection based on SVD is performed before OSELM). The results obtained demonstrated the superior generalization performance and efficiency of the OSPVM. PMID:27143958

  2. Online Sequential Projection Vector Machine with Adaptive Data Mean Update.

    PubMed

    Chen, Lin; Jia, Ji-Ting; Zhang, Qiong; Deng, Wan-Yu; Wei, Wei

    2016-01-01

    We propose a simple online learning algorithm especial for high-dimensional data. The algorithm is referred to as online sequential projection vector machine (OSPVM) which derives from projection vector machine and can learn from data in one-by-one or chunk-by-chunk mode. In OSPVM, data centering, dimension reduction, and neural network training are integrated seamlessly. In particular, the model parameters including (1) the projection vectors for dimension reduction, (2) the input weights, biases, and output weights, and (3) the number of hidden nodes can be updated simultaneously. Moreover, only one parameter, the number of hidden nodes, needs to be determined manually, and this makes it easy for use in real applications. Performance comparison was made on various high-dimensional classification problems for OSPVM against other fast online algorithms including budgeted stochastic gradient descent (BSGD) approach, adaptive multihyperplane machine (AMM), primal estimated subgradient solver (Pegasos), online sequential extreme learning machine (OSELM), and SVD + OSELM (feature selection based on SVD is performed before OSELM). The results obtained demonstrated the superior generalization performance and efficiency of the OSPVM.

  3. Comparison of SVM RBF-NN and DT for crop and weed identification based on spectral measurement over corn fields

    USDA-ARS?s Scientific Manuscript database

    It is important to find an appropriate pattern-recognition method for in-field plant identification based on spectral measurement in order to classify the crop and weeds accurately. In this study, the method of Support Vector Machine (SVM) was evaluated and compared with two other methods, Decision ...

  4. Statistical analysis and machine learning algorithms for optical biopsy

    NASA Astrophysics Data System (ADS)

    Wu, Binlin; Liu, Cheng-hui; Boydston-White, Susie; Beckman, Hugh; Sriramoju, Vidyasagar; Sordillo, Laura; Zhang, Chunyuan; Zhang, Lin; Shi, Lingyan; Smith, Jason; Bailin, Jacob; Alfano, Robert R.

    2018-02-01

    Analyzing spectral or imaging data collected with various optical biopsy methods is often times difficult due to the complexity of the biological basis. Robust methods that can utilize the spectral or imaging data and detect the characteristic spectral or spatial signatures for different types of tissue is challenging but highly desired. In this study, we used various machine learning algorithms to analyze a spectral dataset acquired from human skin normal and cancerous tissue samples using resonance Raman spectroscopy with 532nm excitation. The algorithms including principal component analysis, nonnegative matrix factorization, and autoencoder artificial neural network are used to reduce dimension of the dataset and detect features. A support vector machine with a linear kernel is used to classify the normal tissue and cancerous tissue samples. The efficacies of the methods are compared.

  5. Evaluation of machine learning algorithms for improved risk assessment for Down's syndrome.

    PubMed

    Koivu, Aki; Korpimäki, Teemu; Kivelä, Petri; Pahikkala, Tapio; Sairanen, Mikko

    2018-05-04

    Prenatal screening generates a great amount of data that is used for predicting risk of various disorders. Prenatal risk assessment is based on multiple clinical variables and overall performance is defined by how well the risk algorithm is optimized for the population in question. This article evaluates machine learning algorithms to improve performance of first trimester screening of Down syndrome. Machine learning algorithms pose an adaptive alternative to develop better risk assessment models using the existing clinical variables. Two real-world data sets were used to experiment with multiple classification algorithms. Implemented models were tested with a third, real-world, data set and performance was compared to a predicate method, a commercial risk assessment software. Best performing deep neural network model gave an area under the curve of 0.96 and detection rate of 78% with 1% false positive rate with the test data. Support vector machine model gave area under the curve of 0.95 and detection rate of 61% with 1% false positive rate with the same test data. When compared with the predicate method, the best support vector machine model was slightly inferior, but an optimized deep neural network model was able to give higher detection rates with same false positive rate or similar detection rate but with markedly lower false positive rate. This finding could further improve the first trimester screening for Down syndrome, by using existing clinical variables and a large training data derived from a specific population. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    PubMed

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.

  7. Determination of efficiencies, loss mechanisms, and performance degradation factors in chopper controlled dc vehical motors. Section 2: The time dependent finite element modeling of the electromagnetic field in electrical machines: Methods and applications. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Hamilton, H. B.; Strangas, E.

    1980-01-01

    The time dependent solution of the magnetic field is introduced as a method for accounting for the variation, in time, of the machine parameters in predicting and analyzing the performance of the electrical machines. The method of time dependent finite element was used in combination with an also time dependent construction of a grid for the air gap region. The Maxwell stress tensor was used to calculate the airgap torque from the magnetic vector potential distribution. Incremental inductances were defined and calculated as functions of time, depending on eddy currents and saturation. The currents in all the machine circuits were calculated in the time domain based on these inductances, which were continuously updated. The method was applied to a chopper controlled DC series motor used for electric vehicle drive, and to a salient pole sychronous motor with damper bars. Simulation results were compared to experimentally obtained ones.

  8. Comparison of Machine Learning Methods for the Arterial Hypertension Diagnostics

    PubMed Central

    Belo, David; Gamboa, Hugo

    2017-01-01

    The paper presents results of machine learning approach accuracy applied analysis of cardiac activity. The study evaluates the diagnostics possibilities of the arterial hypertension by means of the short-term heart rate variability signals. Two groups were studied: 30 relatively healthy volunteers and 40 patients suffering from the arterial hypertension of II-III degree. The following machine learning approaches were studied: linear and quadratic discriminant analysis, k-nearest neighbors, support vector machine with radial basis, decision trees, and naive Bayes classifier. Moreover, in the study, different methods of feature extraction are analyzed: statistical, spectral, wavelet, and multifractal. All in all, 53 features were investigated. Investigation results show that discriminant analysis achieves the highest classification accuracy. The suggested approach of noncorrelated feature set search achieved higher results than data set based on the principal components. PMID:28831239

  9. A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification

    PubMed Central

    Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

    2016-01-01

    Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826

  10. A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.

    PubMed

    Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

    2016-01-01

    Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).

  11. LCD denoise and the vector mutual information method in the application of the gear fault diagnosis under different working conditions

    NASA Astrophysics Data System (ADS)

    Xiangfeng, Zhang; Hong, Jiang

    2018-03-01

    In this paper, the full vector LCD method is proposed to solve the misjudgment problem caused by the change of the working condition. First, the signal from different working condition is decomposed by LCD, to obtain the Intrinsic Scale Component (ISC)whose instantaneous frequency with physical significance. Then, calculate of the cross correlation coefficient between ISC and the original signal, signal denoising based on the principle of mutual information minimum. At last, calculate the sum of absolute Vector mutual information of the sample under different working condition and the denoised ISC as the characteristics to classify by use of Support vector machine (SVM). The wind turbines vibration platform gear box experiment proves that this method can identify fault characteristics under different working conditions. The advantage of this method is that it reduce dependence of man’s subjective experience, identify fault directly from the original data of vibration signal. It will has high engineering value.

  12. Kennard-Stone combined with least square support vector machine method for noncontact discriminating human blood species

    NASA Astrophysics Data System (ADS)

    Zhang, Linna; Li, Gang; Sun, Meixiu; Li, Hongxiao; Wang, Zhennan; Li, Yingxin; Lin, Ling

    2017-11-01

    Identifying whole bloods to be either human or nonhuman is an important responsibility for import-export ports and inspection and quarantine departments. Analytical methods and DNA testing methods are usually destructive. Previous studies demonstrated that visible diffuse reflectance spectroscopy method can realize noncontact human and nonhuman blood discrimination. An appropriate method for calibration set selection was very important for a robust quantitative model. In this paper, Random Selection (RS) method and Kennard-Stone (KS) method was applied in selecting samples for calibration set. Moreover, proper stoichiometry method can be greatly beneficial for improving the performance of classification model or quantification model. Partial Least Square Discrimination Analysis (PLSDA) method was commonly used in identification of blood species with spectroscopy methods. Least Square Support Vector Machine (LSSVM) was proved to be perfect for discrimination analysis. In this research, PLSDA method and LSSVM method was used for human blood discrimination. Compared with the results of PLSDA method, this method could enhance the performance of identified models. The overall results convinced that LSSVM method was more feasible for identifying human and animal blood species, and sufficiently demonstrated LSSVM method was a reliable and robust method for human blood identification, and can be more effective and accurate.

  13. Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images.

    PubMed

    Lu, Xiaobing; Yang, Yongzhe; Wu, Fengchun; Gao, Minjian; Xu, Yong; Zhang, Yue; Yao, Yongcheng; Du, Xin; Li, Chengwei; Wu, Lei; Zhong, Xiaomei; Zhou, Yanling; Fan, Ni; Zheng, Yingjun; Xiong, Dongsheng; Peng, Hongjun; Escudero, Javier; Huang, Biao; Li, Xiaobo; Ning, Yuping; Wu, Kai

    2016-07-01

    Structural abnormalities in schizophrenia (SZ) patients have been well documented with structural magnetic resonance imaging (MRI) data using voxel-based morphometry (VBM) and region of interest (ROI) analyses. However, these analyses can only detect group-wise differences and thus, have a poor predictive value for individuals. In the present study, we applied a machine learning method that combined support vector machine (SVM) with recursive feature elimination (RFE) to discriminate SZ patients from normal controls (NCs) using their structural MRI data. We first employed both VBM and ROI analyses to compare gray matter volume (GMV) and white matter volume (WMV) between 41 SZ patients and 42 age- and sex-matched NCs. The method of SVM combined with RFE was used to discriminate SZ patients from NCs using significant between-group differences in both GMV and WMV as input features. We found that SZ patients showed GM and WM abnormalities in several brain structures primarily involved in the emotion, memory, and visual systems. An SVM with a RFE classifier using the significant structural abnormalities identified by the VBM analysis as input features achieved the best performance (an accuracy of 88.4%, a sensitivity of 91.9%, and a specificity of 84.4%) in the discriminative analyses of SZ patients. These results suggested that distinct neuroanatomical profiles associated with SZ patients might provide a potential biomarker for disease diagnosis, and machine-learning methods can reveal neurobiological mechanisms in psychiatric diseases.

  14. [A prediction model for the activity of insecticidal crystal proteins from Bacillus thuringiensis based on support vector machine].

    PubMed

    Lin, Yi; Cai, Fu-Ying; Zhang, Guang-Ya

    2007-01-01

    A quantitative structure-property relationship (QSPR) model in terms of amino acid composition and the activity of Bacillus thuringiensis insecticidal crystal proteins was established. Support vector machine (SVM) is a novel general machine-learning tool based on the structural risk minimization principle that exhibits good generalization when fault samples are few; it is especially suitable for classification, forecasting, and estimation in cases where small amounts of samples are involved such as fault diagnosis; however, some parameters of SVM are selected based on the experience of the operator, which has led to decreased efficiency of SVM in practical application. The uniform design (UD) method was applied to optimize the running parameters of SVM. It was found that the average accuracy rate approached 73% when the penalty factor was 0.01, the epsilon 0.2, the gamma 0.05, and the range 0.5. The results indicated that UD might be used an effective method to optimize the parameters of SVM and SVM and could be used as an alternative powerful modeling tool for QSPR studies of the activity of Bacillus thuringiensis (Bt) insecticidal crystal proteins. Therefore, a novel method for predicting the insecticidal activity of Bt insecticidal crystal proteins was proposed by the authors of this study.

  15. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine

    PubMed Central

    Manavalan, Balachandran; Shin, Tae H.; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html. PMID:29616000

  16. PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine.

    PubMed

    Manavalan, Balachandran; Shin, Tae H; Lee, Gwang

    2018-01-01

    Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html.

  17. Development of a classification method for a crack on a pavement surface images using machine learning

    NASA Astrophysics Data System (ADS)

    Hizukuri, Akiyoshi; Nagata, Takeshi

    2017-03-01

    The purpose of this study is to develop a classification method for a crack on a pavement surface image using machine learning to reduce a maintenance fee. Our database consists of 3500 pavement surface images. This includes 800 crack and 2700 normal pavement surface images. The pavement surface images first are decomposed into several sub-images using a discrete wavelet transform (DWT) decomposition. We then calculate the wavelet sub-band histogram from each several sub-images at each level. The support vector machine (SVM) with computed wavelet sub-band histogram is employed for distinguishing between a crack and normal pavement surface images. The accuracies of the proposed classification method are 85.3% for crack and 84.4% for normal pavement images. The proposed classification method achieved high performance. Therefore, the proposed method would be useful in maintenance inspection.

  18. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    PubMed

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  19. The measurement of an aspherical mirror by three-dimensional nanoprofiler

    NASA Astrophysics Data System (ADS)

    Tokuta, Yusuke; Okita, Kenya; Okuda, Kohei; Kitayama, Takao; Nakano, Motohiro; Nakatani, Shun; Kudo, Ryota; Yamamura, Kazuya; Endo, Katsuyoshi

    2015-09-01

    Aspherical optical elements with high accuracy are important in several fields such as third-generation synchrotron radiation and extreme-ultraviolet lithography. Then the demand of measurement method for aspherical or free-form surface with nanometer resolution is rising. Our purpose is to develop a non-contact profiler to measure free-form surfaces directly with repeatability of figure error of less than 1 nm PV. To achieve this purpose we have developed three-dimensional Nanoprofiler which traces normal vectors of sample surface. The measurement principle is based on the straightness of LASER light and the accuracy of a rotational goniometer. This machine consists of four rotational stages, one translational stage and optical head which has the quadrant photodiode (QPD) and LASER head at optically equal position. In this measurement method, we conform the incident light beam to reflect the beam by controlling five stages and determine the normal vectors and the coordinates of the surface from signal of goniometers, translational stage and QPD. We can obtain three-dimensional figure from the normal vectors and the coordinates by a reconstruction algorithm. To evaluate performance of this machine we measure a concave aspherical mirror ten times. From ten results we calculate measurement repeatability, and we evaluate measurement uncertainty to compare the result with that measured by an interferometer. In consequence, the repeatability of measurement was 2.90 nm (σ) and the difference between the two profiles was +/-20 nm. We conclude that the two profiles was correspondent considering systematic errors of each machine.

  20. An image segmentation method for apple sorting and grading using support vector machine and Otsu's method

    USDA-ARS?s Scientific Manuscript database

    Segmentation is the first step in image analysis to subdivide an image into meaningful regions. The segmentation result directly affects the subsequent image analysis. The objective of the research was to develop an automatic adjustable algorithm for segmentation of color images, using linear suppor...

  1. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.

  2. Gender classification from face images by using local binary pattern and gray-level co-occurrence matrix

    NASA Astrophysics Data System (ADS)

    Uzbaş, Betül; Arslan, Ahmet

    2018-04-01

    Gender is an important step for human computer interactive processes and identification. Human face image is one of the important sources to determine gender. In the present study, gender classification is performed automatically from facial images. In order to classify gender, we propose a combination of features that have been extracted face, eye and lip regions by using a hybrid method of Local Binary Pattern and Gray-Level Co-Occurrence Matrix. The features have been extracted from automatically obtained face, eye and lip regions. All of the extracted features have been combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor methods) for gender classification. The Nottingham Scan face database that consists of the frontal face images of 100 people (50 male and 50 female) is used for this purpose. As the result of the experimental studies, the highest success rate has been achieved as 98% by using Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.

  3. Ecological footprint model using the support vector machine technique.

    PubMed

    Ma, Haibo; Chang, Wenjuan; Cui, Guangbai

    2012-01-01

    The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize relevant previous literature, and present five factors that influence per capita EF. These factors are: National gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by the percentage of exports to total GDP), and service intensity (measured by the percentage of service to total GDP). A new ecological footprint model based on a support vector machine (SVM), which is a machine-learning method based on the structural risk minimization principle from statistical learning theory was conducted to calculate the per capita EF of 24 nations using data from 123 nations. The calculation accuracy was measured by average absolute error and average relative error. They were 0.004883 and 0.351078% respectively. Our results demonstrate that the EF model based on SVM has good calculation performance.

  4. Efficient boundary hunting via vector quantization

    NASA Astrophysics Data System (ADS)

    Diamantini, Claudia; Panti, Maurizio

    2001-03-01

    A great amount of information about a classification problem is contained in those instances falling near the decision boundary. This intuition dates back to the earliest studies in pattern recognition, and in the more recent adaptive approaches to the so called boundary hunting, such as the work of Aha et alii on Instance Based Learning and the work of Vapnik et alii on Support Vector Machines. The last work is of particular interest, since theoretical and experimental results ensure the accuracy of boundary reconstruction. However, its optimization approach has heavy computational and memory requirements, which limits its application on huge amounts of data. In the paper we describe an alternative approach to boundary hunting based on adaptive labeled quantization architectures. The adaptation is performed by a stochastic gradient algorithm for the minimization of the error probability. Error probability minimization guarantees the accurate approximation of the optimal decision boundary, while the use of a stochastic gradient algorithm defines an efficient method to reach such approximation. In the paper comparisons to Support Vector Machines are considered.

  5. Support vector machine firefly algorithm based optimization of lens system.

    PubMed

    Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah

    2015-01-01

    Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram gives an indication of the image of a point object. In this paper, the spot size radius is considered an optimization criterion. Intelligent soft computing scheme support vector machines (SVMs) coupled with the firefly algorithm (FFA) are implemented. The performance of the proposed estimators is confirmed with the simulation results. The result of the proposed SVM-FFA model has been compared with support vector regression (SVR), artificial neural networks, and generic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies. Therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.

  6. MLACP: machine-learning-based prediction of anticancer peptides

    PubMed Central

    Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Choi, Sun; Kim, Myeong Ok; Lee, Gwang

    2017-01-01

    Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html. PMID:29100375

  7. Full-motion video analysis for improved gender classification

    NASA Astrophysics Data System (ADS)

    Flora, Jeffrey B.; Lochtefeld, Darrell F.; Iftekharuddin, Khan M.

    2014-06-01

    The ability of computer systems to perform gender classification using the dynamic motion of the human subject has important applications in medicine, human factors, and human-computer interface systems. Previous works in motion analysis have used data from sensors (including gyroscopes, accelerometers, and force plates), radar signatures, and video. However, full-motion video, motion capture, range data provides a higher resolution time and spatial dataset for the analysis of dynamic motion. Works using motion capture data have been limited by small datasets in a controlled environment. In this paper, we explore machine learning techniques to a new dataset that has a larger number of subjects. Additionally, these subjects move unrestricted through a capture volume, representing a more realistic, less controlled environment. We conclude that existing linear classification methods are insufficient for the gender classification for larger dataset captured in relatively uncontrolled environment. A method based on a nonlinear support vector machine classifier is proposed to obtain gender classification for the larger dataset. In experimental testing with a dataset consisting of 98 trials (49 subjects, 2 trials per subject), classification rates using leave-one-out cross-validation are improved from 73% using linear discriminant analysis to 88% using the nonlinear support vector machine classifier.

  8. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms

  9. A Model-Free Machine Learning Method for Risk Classification and Survival Probability Prediction.

    PubMed

    Geng, Yuan; Lu, Wenbin; Zhang, Hao Helen

    2014-01-01

    Risk classification and survival probability prediction are two major goals in survival data analysis since they play an important role in patients' risk stratification, long-term diagnosis, and treatment selection. In this article, we propose a new model-free machine learning framework for risk classification and survival probability prediction based on weighted support vector machines. The new procedure does not require any specific parametric or semiparametric model assumption on data, and is therefore capable of capturing nonlinear covariate effects. We use numerous simulation examples to demonstrate finite sample performance of the proposed method under various settings. Applications to a glioma tumor data and a breast cancer gene expression survival data are shown to illustrate the new methodology in real data analysis.

  10. Support vector machines for nuclear reactor state estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zavaljevski, N.; Gross, K. C.

    2000-02-14

    Validation of nuclear power reactor signals is often performed by comparing signal prototypes with the actual reactor signals. The signal prototypes are often computed based on empirical data. The implementation of an estimation algorithm which can make predictions on limited data is an important issue. A new machine learning algorithm called support vector machines (SVMS) recently developed by Vladimir Vapnik and his coworkers enables a high level of generalization with finite high-dimensional data. The improved generalization in comparison with standard methods like neural networks is due mainly to the following characteristics of the method. The input data space is transformedmore » into a high-dimensional feature space using a kernel function, and the learning problem is formulated as a convex quadratic programming problem with a unique solution. In this paper the authors have applied the SVM method for data-based state estimation in nuclear power reactors. In particular, they implemented and tested kernels developed at Argonne National Laboratory for the Multivariate State Estimation Technique (MSET), a nonlinear, nonparametric estimation technique with a wide range of applications in nuclear reactors. The methodology has been applied to three data sets from experimental and commercial nuclear power reactor applications. The results are promising. The combination of MSET kernels with the SVM method has better noise reduction and generalization properties than the standard MSET algorithm.« less

  11. Intelligent Gearbox Diagnosis Methods Based on SVM, Wavelet Lifting and RBR

    PubMed Central

    Gao, Lixin; Ren, Zhiqiang; Tang, Wenliang; Wang, Huaqing; Chen, Peng

    2010-01-01

    Given the problems in intelligent gearbox diagnosis methods, it is difficult to obtain the desired information and a large enough sample size to study; therefore, we propose the application of various methods for gearbox fault diagnosis, including wavelet lifting, a support vector machine (SVM) and rule-based reasoning (RBR). In a complex field environment, it is less likely for machines to have the same fault; moreover, the fault features can also vary. Therefore, a SVM could be used for the initial diagnosis. First, gearbox vibration signals were processed with wavelet packet decomposition, and the signal energy coefficients of each frequency band were extracted and used as input feature vectors in SVM for normal and faulty pattern recognition. Second, precision analysis using wavelet lifting could successfully filter out the noisy signals while maintaining the impulse characteristics of the fault; thus effectively extracting the fault frequency of the machine. Lastly, the knowledge base was built based on the field rules summarized by experts to identify the detailed fault type. Results have shown that SVM is a powerful tool to accomplish gearbox fault pattern recognition when the sample size is small, whereas the wavelet lifting scheme can effectively extract fault features, and rule-based reasoning can be used to identify the detailed fault type. Therefore, a method that combines SVM, wavelet lifting and rule-based reasoning ensures effective gearbox fault diagnosis. PMID:22399894

  12. Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine.

    PubMed

    Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W

    2006-03-01

    Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

  13. Intelligent gearbox diagnosis methods based on SVM, wavelet lifting and RBR.

    PubMed

    Gao, Lixin; Ren, Zhiqiang; Tang, Wenliang; Wang, Huaqing; Chen, Peng

    2010-01-01

    Given the problems in intelligent gearbox diagnosis methods, it is difficult to obtain the desired information and a large enough sample size to study; therefore, we propose the application of various methods for gearbox fault diagnosis, including wavelet lifting, a support vector machine (SVM) and rule-based reasoning (RBR). In a complex field environment, it is less likely for machines to have the same fault; moreover, the fault features can also vary. Therefore, a SVM could be used for the initial diagnosis. First, gearbox vibration signals were processed with wavelet packet decomposition, and the signal energy coefficients of each frequency band were extracted and used as input feature vectors in SVM for normal and faulty pattern recognition. Second, precision analysis using wavelet lifting could successfully filter out the noisy signals while maintaining the impulse characteristics of the fault; thus effectively extracting the fault frequency of the machine. Lastly, the knowledge base was built based on the field rules summarized by experts to identify the detailed fault type. Results have shown that SVM is a powerful tool to accomplish gearbox fault pattern recognition when the sample size is small, whereas the wavelet lifting scheme can effectively extract fault features, and rule-based reasoning can be used to identify the detailed fault type. Therefore, a method that combines SVM, wavelet lifting and rule-based reasoning ensures effective gearbox fault diagnosis.

  14. Multivariate Models for Prediction of Human Skin Sensitization ...

    EPA Pesticide Factsheets

    One of the lnteragency Coordinating Committee on the Validation of Alternative Method's (ICCVAM) top priorities is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of biological events necessary to produce skin sensitization suggests that no single alternative method will replace the currently accepted animal tests. ICCVAM is evaluating an integrated approach to testing and assessment based on the adverse outcome pathway for skin sensitization that uses machine learning approaches to predict human skin sensitization hazard. We combined data from three in chemico or in vitro assays - the direct peptide reactivity assay (DPRA), human cell line activation test (h-CLAT) and KeratinoSens TM assay - six physicochemical properties and an in silico read-across prediction of skin sensitization hazard into 12 variable groups. The variable groups were evaluated using two machine learning approaches , logistic regression and support vector machine, to predict human skin sensitization hazard. Models were trained on 72 substances and tested on an external set of 24 substances. The six models (three logistic regression and three support vector machine) with the highest accuracy (92%) used: (1) DPRA, h-CLAT and read-across; (2) DPRA, h-CLAT, read-across and KeratinoSens; or (3) DPRA, h-CLAT, read-across, KeratinoSens and log P. The models performed better at predicting human skin sensitization hazard than the murine

  15. Resident Space Object Characterization and Behavior Understanding via Machine Learning and Ontology-based Bayesian Networks

    NASA Astrophysics Data System (ADS)

    Furfaro, R.; Linares, R.; Gaylor, D.; Jah, M.; Walls, R.

    2016-09-01

    In this paper, we present an end-to-end approach that employs machine learning techniques and Ontology-based Bayesian Networks (BN) to characterize the behavior of resident space objects. State-of-the-Art machine learning architectures (e.g. Extreme Learning Machines, Convolutional Deep Networks) are trained on physical models to learn the Resident Space Object (RSO) features in the vectorized energy and momentum states and parameters. The mapping from measurements to vectorized energy and momentum states and parameters enables behavior characterization via clustering in the features space and subsequent RSO classification. Additionally, Space Object Behavioral Ontologies (SOBO) are employed to define and capture the domain knowledge-base (KB) and BNs are constructed from the SOBO in a semi-automatic fashion to execute probabilistic reasoning over conclusions drawn from trained classifiers and/or directly from processed data. Such an approach enables integrating machine learning classifiers and probabilistic reasoning to support higher-level decision making for space domain awareness applications. The innovation here is to use these methods (which have enjoyed great success in other domains) in synergy so that it enables a "from data to discovery" paradigm by facilitating the linkage and fusion of large and disparate sources of information via a Big Data Science and Analytics framework.

  16. Merged or monolithic? Using machine-learning to reconstruct the dynamical history of simulated star clusters

    NASA Astrophysics Data System (ADS)

    Pasquato, Mario; Chung, Chul

    2016-05-01

    Context. Machine-learning (ML) solves problems by learning patterns from data with limited or no human guidance. In astronomy, ML is mainly applied to large observational datasets, e.g. for morphological galaxy classification. Aims: We apply ML to gravitational N-body simulations of star clusters that are either formed by merging two progenitors or evolved in isolation, planning to later identify globular clusters (GCs) that may have a history of merging from observational data. Methods: We create mock-observations from simulated GCs, from which we measure a set of parameters (also called features in the machine-learning field). After carrying out dimensionality reduction on the feature space, the resulting datapoints are fed in to various classification algorithms. Using repeated random subsampling validation, we check whether the groups identified by the algorithms correspond to the underlying physical distinction between mergers and monolithically evolved simulations. Results: The three algorithms we considered (C5.0 trees, k-nearest neighbour, and support-vector machines) all achieve a test misclassification rate of about 10% without parameter tuning, with support-vector machines slightly outperforming the others. The first principal component of feature space correlates with cluster concentration. If we exclude it from the regression, the performance of the algorithms is only slightly reduced.

  17. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

    PubMed

    Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan

    2016-04-20

    DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next generation of sequencing technique, the number of protein sequences is unprecedentedly increasing. Thus it is necessary to develop computational methods to identify the DNA-binding proteins only based on the protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence order effect can be effectively captured by this scheme. These vectors are then fed into Support Vector Machine (SVM) to discriminate the DNA-binding proteins from the non DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and Matthew correlation coefficient of 0.5 by a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset shows that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA binding protein identification. .

  18. Segmentation of mosaicism in cervicographic images using support vector machines

    NASA Astrophysics Data System (ADS)

    Xue, Zhiyun; Long, L. Rodney; Antani, Sameer; Jeronimo, Jose; Thoma, George R.

    2009-02-01

    The National Library of Medicine (NLM), in collaboration with the National Cancer Institute (NCI), is creating a large digital repository of cervicographic images for the study of uterine cervix cancer prevention. One of the research goals is to automatically detect diagnostic bio-markers in these images. Reliable bio-marker segmentation in large biomedical image collections is a challenging task due to the large variation in image appearance. Methods described in this paper focus on segmenting mosaicism, which is an important vascular feature used to visually assess the degree of cervical intraepithelial neoplasia. The proposed approach uses support vector machines (SVM) trained on a ground truth dataset annotated by medical experts (which circumvents the need for vascular structure extraction). We have evaluated the performance of the proposed algorithm and experimentally demonstrated its feasibility.

  19. Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

    PubMed

    Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A

    2007-12-15

    Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.

  20. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    PubMed

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes, Single word and Bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine whereby the triple ensemble NB SW =NB BI-GRAM =SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  1. Statistical learning algorithms for identifying contrasting tillage practices with landsat thematic mapper data

    USDA-ARS?s Scientific Manuscript database

    Tillage management practices have direct impact on water holding capacity, evaporation, carbon sequestration, and water quality. This study examines the feasibility of two statistical learning algorithms, such as Least Square Support Vector Machine (LSSVM) and Relevance Vector Machine (RVM), for cla...

  2. Identifying saltcedar with hyperspectral data and support vector machines

    USDA-ARS?s Scientific Manuscript database

    Saltcedar (Tamarix spp.) are a group of dense phreatophytic shrubs and trees that are invasive to riparian areas throughout the United States. This study determined the feasibility of using hyperspectral data and a support vector machine (SVM) classifier to discriminate saltcedar from other cover t...

  3. A bibliography on parallel and vector numerical algorithms

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Voigt, Robert G.; Romine, Charles H.

    1988-01-01

    This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

  4. A bibliography on parallel and vector numerical algorithms

    NASA Technical Reports Server (NTRS)

    Ortega, J. M.; Voigt, R. G.

    1987-01-01

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also.

  5. A bibliography on parallel and vector numerical algorithms

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Voigt, Robert G.; Romine, Charles H.

    1990-01-01

    This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

  6. Spray characterization of ULV sprayers typically used in vector control

    USDA-ARS?s Scientific Manuscript database

    Numerous spray machines are used to apply products for the control of human disease vectors, such as mosquitoes and flies. However, the selection and setup of these machines significantly affect the level of control achieved during an application. The droplet spectra produced by nine different ULV...

  7. Applying spectral unmixing and support vector machine to airborne hyperspectral imagery for detecting giant reed

    USDA-ARS?s Scientific Manuscript database

    This study evaluated linear spectral unmixing (LSU), mixture tuned matched filtering (MTMF) and support vector machine (SVM) techniques for detecting and mapping giant reed (Arundo donax L.), an invasive weed that presents a severe threat to agroecosystems and riparian areas throughout the southern ...

  8. Support vector machines classifiers of physical activities in preschoolers

    USDA-ARS?s Scientific Manuscript database

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  9. Comparison of Support Vector Machine, Neural Network, and CART Algorithms for the Land-Cover Classification Using Limited Training Data Points

    EPA Science Inventory

    Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two convention...

  10. Applying machine learning methods for characterization of hexagonal prisms from their 2D scattering patterns - an investigation using modelled scattering data

    NASA Astrophysics Data System (ADS)

    Salawu, Emmanuel Oluwatobi; Hesse, Evelyn; Stopford, Chris; Davey, Neil; Sun, Yi

    2017-11-01

    Better understanding and characterization of cloud particles, whose properties and distributions affect climate and weather, are essential for the understanding of present climate and climate change. Since imaging cloud probes have limitations of optical resolution, especially for small particles (with diameter < 25 μm), instruments like the Small Ice Detector (SID) probes, which capture high-resolution spatial light scattering patterns from individual particles down to 1 μm in size, have been developed. In this work, we have proposed a method using Machine Learning techniques to estimate simulated particles' orientation-averaged projected sizes (PAD) and aspect ratio from their 2D scattering patterns. The two-dimensional light scattering patterns (2DLSP) of hexagonal prisms are computed using the Ray Tracing with Diffraction on Facets (RTDF) model. The 2DLSP cover the same angular range as the SID probes. We generated 2DLSP for 162 hexagonal prisms at 133 orientations for each. In a first step, the 2DLSP were transformed into rotation-invariant Zernike moments (ZMs), which are particularly suitable for analyses of pattern symmetry. Then we used ZMs, summed intensities, and root mean square contrast as inputs to the advanced Machine Learning methods. We created one random forests classifier for predicting prism orientation, 133 orientation-specific (OS) support vector classification models for predicting the prism aspect-ratios, 133 OS support vector regression models for estimating prism sizes, and another 133 OS Support Vector Regression (SVR) models for estimating the size PADs. We have achieved a high accuracy of 0.99 in predicting prism aspect ratios, and a low value of normalized mean square error of 0.004 for estimating the particle's size and size PADs.

  11. Identification of type 2 diabetes-associated combination of SNPs using support vector machine.

    PubMed

    Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon

    2010-04-23

    Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.

  12. Equivalent model of a dually-fed machine for electric drive control systems

    NASA Astrophysics Data System (ADS)

    Ostrovlyanchik, I. Yu; Popolzin, I. Yu

    2018-05-01

    The article shows that the mathematical model of a dually-fed machine is complicated because of the presence of a controlled voltage source in the rotor circuit. As a method of obtaining a mathematical model, the method of a generalized two-phase electric machine is applied and a rotating orthogonal coordinate system is chosen that is associated with the representing vector of a stator current. In the chosen coordinate system in the operator form the differential equations of electric equilibrium for the windings of the generalized machine (the Kirchhoff equation) are written together with the expression for the moment, which determines the electromechanical energy transformation in the machine. Equations are transformed so that they connect the currents of the windings, that determine the moment of the machine, and the voltages on these windings. The structural diagram of the machine is assigned to the written equations. Based on the written equations and accepted assumptions, expressions were obtained for the balancing the EMF of windings, and on the basis of these expressions an equivalent mathematical model of a dually-fed machine is proposed, convenient for use in electric drive control systems.

  13. Classification of older adults with/without a fall history using machine learning methods.

    PubMed

    Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D

    2015-01-01

    Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.

  14. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    PubMed

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.

  16. SOLAR FLARE PREDICTION USING SDO/HMI VECTOR MAGNETIC FIELD DATA WITH A MACHINE-LEARNING ALGORITHM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bobra, M. G.; Couvidat, S., E-mail: couvidat@stanford.edu

    2015-01-10

    We attempt to forecast M- and X-class solar flares using a machine-learning algorithm, called support vector machine (SVM), and four years of data from the Solar Dynamics Observatory's Helioseismic and Magnetic Imager, the first instrument to continuously map the full-disk photospheric vector magnetic field from space. Most flare forecasting efforts described in the literature use either line-of-sight magnetograms or a relatively small number of ground-based vector magnetograms. This is the first time a large data set of vector magnetograms has been used to forecast solar flares. We build a catalog of flaring and non-flaring active regions sampled from a databasemore » of 2071 active regions, comprised of 1.5 million active region patches of vector magnetic field data, and characterize each active region by 25 parameters. We then train and test the machine-learning algorithm and we estimate its performances using forecast verification metrics with an emphasis on the true skill statistic (TSS). We obtain relatively high TSS scores and overall predictive abilities. We surmise that this is partly due to fine-tuning the SVM for this purpose and also to an advantageous set of features that can only be calculated from vector magnetic field data. We also apply a feature selection algorithm to determine which of our 25 features are useful for discriminating between flaring and non-flaring active regions and conclude that only a handful are needed for good predictive abilities.« less

  17. Differentiation of Enhancing Glioma and Primary Central Nervous System Lymphoma by Texture-Based Machine Learning.

    PubMed

    Alcaide-Leon, P; Dufort, P; Geraldo, A F; Alshafai, L; Maralani, P J; Spears, J; Bharatha, A

    2017-06-01

    Accurate preoperative differentiation of primary central nervous system lymphoma and enhancing glioma is essential to avoid unnecessary neurosurgical resection in patients with primary central nervous system lymphoma. The purpose of the study was to evaluate the diagnostic performance of a machine-learning algorithm by using texture analysis of contrast-enhanced T1-weighted images for differentiation of primary central nervous system lymphoma and enhancing glioma. Seventy-one adult patients with enhancing gliomas and 35 adult patients with primary central nervous system lymphomas were included. The tumors were manually contoured on contrast-enhanced T1WI, and the resulting volumes of interest were mined for textural features and subjected to a support vector machine-based machine-learning protocol. Three readers classified the tumors independently on contrast-enhanced T1WI. Areas under the receiver operating characteristic curves were estimated for each reader and for the support vector machine classifier. A noninferiority test for diagnostic accuracy based on paired areas under the receiver operating characteristic curve was performed with a noninferiority margin of 0.15. The mean areas under the receiver operating characteristic curve were 0.877 (95% CI, 0.798-0.955) for the support vector machine classifier; 0.878 (95% CI, 0.807-0.949) for reader 1; 0.899 (95% CI, 0.833-0.966) for reader 2; and 0.845 (95% CI, 0.757-0.933) for reader 3. The mean area under the receiver operating characteristic curve of the support vector machine classifier was significantly noninferior to the mean area under the curve of reader 1 ( P = .021), reader 2 ( P = .035), and reader 3 ( P = .007). Support vector machine classification based on textural features of contrast-enhanced T1WI is noninferior to expert human evaluation in the differentiation of primary central nervous system lymphoma and enhancing glioma. © 2017 by American Journal of Neuroradiology.

  18. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling

    PubMed Central

    Cuperlovic-Culf, Miroslava

    2018-01-01

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies. PMID:29324649

  19. Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling.

    PubMed

    Cuperlovic-Culf, Miroslava

    2018-01-11

    Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic. To a lesser extent, machine learning has also been utilized to take advantage of the increasing availability of genomics and metabolomics data for the optimization of metabolic network models and their analysis. In this context, machine learning has aided the development of metabolic networks, the calculation of parameters for stoichiometric and kinetic models, as well as the analysis of major features in the model for the optimal application of bioreactors. Examples of this very interesting, albeit highly complex, application of machine learning for metabolism modeling will be the primary focus of this review presenting several different types of applications for model optimization, parameter determination or system analysis using models, as well as the utilization of several different types of machine learning technologies.

  20. Motion direction estimation based on active RFID with changing environment

    NASA Astrophysics Data System (ADS)

    Jie, Wu; Minghua, Zhu; Wei, He

    2018-05-01

    The gate system is used to estimate the direction of RFID tags carriers when they are going through the gate. Normally, it is difficult to achieve and keep a high accuracy in estimating motion direction of RFID tags because the received signal strength of tag changes sharply according to the changing electromagnetic environment. In this paper, a method of motion direction estimation for RFID tags is presented. To improve estimation accuracy, the machine leaning algorithm is used to get the fitting function of the received data by readers which are deployed inside and outside gate respectively. Then the fitted data are sampled to get the standard vector. We compare the stand vector with template vectors to get the motion direction estimation result. Then the corresponding template vector is updated according to the surrounding environment. We conducted the simulation and implement of the proposed method and the result shows that the proposed method in this work can improve and keep a high accuracy under the condition of the constantly changing environment.

  1. Research on software behavior trust based on hierarchy evaluation

    NASA Astrophysics Data System (ADS)

    Long, Ke; Xu, Haishui

    2017-08-01

    In view of the correlation software behavior, we evaluate software behavior credibility from two levels of control flow and data flow. In control flow level, method of the software behavior of trace based on support vector machine (SVM) is proposed. In data flow level, behavioral evidence evaluation based on fuzzy decision analysis method is put forward.

  2. Walsh-Hadamard transform kernel-based feature vector for shot boundary detection.

    PubMed

    Lakshmi, Priya G G; Domnic, S

    2014-12-01

    Video shot boundary detection (SBD) is the first step of video analysis, summarization, indexing, and retrieval. In SBD process, videos are segmented into basic units called shots. In this paper, a new SBD method is proposed using color, edge, texture, and motion strength as vector of features (feature vector). Features are extracted by projecting the frames on selected basis vectors of Walsh-Hadamard transform (WHT) kernel and WHT matrix. After extracting the features, based on the significance of the features, weights are calculated. The weighted features are combined to form a single continuity signal, used as input for Procedure Based shot transition Identification process (PBI). Using the procedure, shot transitions are classified into abrupt and gradual transitions. Experimental results are examined using large-scale test sets provided by the TRECVID 2007, which has evaluated hard cut and gradual transition detection. To evaluate the robustness of the proposed method, the system evaluation is performed. The proposed method yields F1-Score of 97.4% for cut, 78% for gradual, and 96.1% for overall transitions. We have also evaluated the proposed feature vector with support vector machine classifier. The results show that WHT-based features can perform well than the other existing methods. In addition to this, few more video sequences are taken from the Openvideo project and the performance of the proposed method is compared with the recent existing SBD method.

  3. Support Vector Machine algorithm for regression and classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Chenggang; Zavaljevski, Nela

    2001-08-01

    The software is an implementation of the Support Vector Machine (SVM) algorithm that was invented and developed by Vladimir Vapnik and his co-workers at AT&T Bell Laboratories. The specific implementation reported here is an Active Set method for solving a quadratic optimization problem that forms the major part of any SVM program. The implementation is tuned to specific constraints generated in the SVM learning. Thus, it is more efficient than general-purpose quadratic optimization programs. A decomposition method has been implemented in the software that enables processing large data sets. The size of the learning data is virtually unlimited by themore » capacity of the computer physical memory. The software is flexible and extensible. Two upper bounds are implemented to regulate the SVM learning for classification, which allow users to adjust the false positive and false negative rates. The software can be used either as a standalone, general-purpose SVM regression or classification program, or be embedded into a larger software system.« less

  4. Computer-assisted segmentation of white matter lesions in 3D MR images using support vector machine.

    PubMed

    Lao, Zhiqiang; Shen, Dinggang; Liu, Dengfeng; Jawad, Abbas F; Melhem, Elias R; Launer, Lenore J; Bryan, R Nick; Davatzikos, Christos

    2008-03-01

    Brain lesions, especially white matter lesions (WMLs), are associated with cardiac and vascular disease, but also with normal aging. Quantitative analysis of WML in large clinical trials is becoming more and more important. In this article, we present a computer-assisted WML segmentation method, based on local features extracted from multiparametric magnetic resonance imaging (MRI) sequences (ie, T1-weighted, T2-weighted, proton density-weighted, and fluid attenuation inversion recovery MRI scans). A support vector machine classifier is first trained on expert-defined WMLs, and is then used to classify new scans. Postprocessing analysis further reduces false positives by using anatomic knowledge and measures of distance from the training set. Cross-validation on a population of 35 patients from three different imaging sites with WMLs of varying sizes, shapes, and locations tests the robustness and accuracy of the proposed segmentation method, compared with the manual segmentation results from two experienced neuroradiologists.

  5. Application of support vector machine for the separation of mineralised zones in the Takht-e-Gonbad porphyry deposit, SE Iran

    NASA Astrophysics Data System (ADS)

    Mahvash Mohammadi, Neda; Hezarkhani, Ardeshir

    2018-07-01

    Classification of mineralised zones is an important factor for the analysis of economic deposits. In this paper, the support vector machine (SVM), a supervised learning algorithm, based on subsurface data is proposed for classification of mineralised zones in the Takht-e-Gonbad porphyry Cu-deposit (SE Iran). The effects of the input features are evaluated via calculating the accuracy rates on the SVM performance. Ultimately, the SVM model, is developed based on input features namely lithology, alteration, mineralisation, the level and, radial basis function (RBF) as a kernel function. Moreover, the optimal amount of parameters λ and C, using n-fold cross-validation method, are calculated at level 0.001 and 0.01 respectively. The accuracy of this model is 0.931 for classification of mineralised zones in the Takht-e-Gonbad porphyry deposit. The results of the study confirm the efficiency of SVM method for classification the mineralised zones.

  6. Distributed support vector machine in master-slave mode.

    PubMed

    Chen, Qingguo; Cao, Feilong

    2018-05-01

    It is well known that the support vector machine (SVM) is an effective learning algorithm. The alternating direction method of multipliers (ADMM) algorithm has emerged as a powerful technique for solving distributed optimisation models. This paper proposes a distributed SVM algorithm in a master-slave mode (MS-DSVM), which integrates a distributed SVM and ADMM acting in a master-slave configuration where the master node and slave nodes are connected, meaning the results can be broadcasted. The distributed SVM is regarded as a regularised optimisation problem and modelled as a series of convex optimisation sub-problems that are solved by ADMM. Additionally, the over-relaxation technique is utilised to accelerate the convergence rate of the proposed MS-DSVM. Our theoretical analysis demonstrates that the proposed MS-DSVM has linear convergence, meaning it possesses the fastest convergence rate among existing standard distributed ADMM algorithms. Numerical examples demonstrate that the convergence and accuracy of the proposed MS-DSVM are superior to those of existing methods under the ADMM framework. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Nonlinear temperature compensation of fluxgate magnetometers with a least-squares support vector machine

    NASA Astrophysics Data System (ADS)

    Pang, Hongfeng; Chen, Dixiang; Pan, Mengchun; Luo, Shitu; Zhang, Qi; Luo, Feilu

    2012-02-01

    Fluxgate magnetometers are widely used for magnetic field measurement. However, their accuracy is influenced by temperature. In this paper, a new method was proposed to compensate the temperature drift of fluxgate magnetometers, in which a least-squares support vector machine (LSSVM) is utilized. The compensation performance was analyzed by simulation, which shows that the LSSVM has better performance and less training time than backpropagation and radical basis function neural networks. The temperature characteristics of a DM fluxgate magnetometer were measured with a temperature experiment box. Forty-five measured data under different magnetic fields and temperatures were obtained and divided into 36 training data and nine test data. The training data were used to obtain the parameters of the LSSVM model, and the compensation performance of the LSSVM model was verified by the test data. Experimental results show that the temperature drift of magnetometer is reduced from 109.3 to 3.3 nT after compensation, which suggests that this compensation method is effective for the accuracy improvement of fluxgate magnetometers.

  8. Inline Measurement of Particle Concentrations in Multicomponent Suspensions using Ultrasonic Sensor and Least Squares Support Vector Machines.

    PubMed

    Zhan, Xiaobin; Jiang, Shulan; Yang, Yili; Liang, Jian; Shi, Tielin; Li, Xiwen

    2015-09-18

    This paper proposes an ultrasonic measurement system based on least squares support vector machines (LS-SVM) for inline measurement of particle concentrations in multicomponent suspensions. Firstly, the ultrasonic signals are analyzed and processed, and the optimal feature subset that contributes to the best model performance is selected based on the importance of features. Secondly, the LS-SVM model is tuned, trained and tested with different feature subsets to obtain the optimal model. In addition, a comparison is made between the partial least square (PLS) model and the LS-SVM model. Finally, the optimal LS-SVM model with the optimal feature subset is applied to inline measurement of particle concentrations in the mixing process. The results show that the proposed method is reliable and accurate for inline measuring the particle concentrations in multicomponent suspensions and the measurement accuracy is sufficiently high for industrial application. Furthermore, the proposed method is applicable to the modeling of the nonlinear system dynamically and provides a feasible way to monitor industrial processes.

  9. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach

    PubMed Central

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-01-01

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202

  10. Classification of highly unbalanced CYP450 data of drugs using cost sensitive machine learning techniques.

    PubMed

    Eitrich, T; Kless, A; Druska, C; Meyer, W; Grotendorst, J

    2007-01-01

    In this paper, we study the classifications of unbalanced data sets of drugs. As an example we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of this data, we have built classifiers based on machine learning methods. Data sets with different class distributions lead to the effect that conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain sensitive but also accurate classifiers we combine machine learning and feature selection methods with techniques addressing the problem of unbalanced classification, such as oversampling and threshold moving. We have used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable the effective high throughput in silico classification of potential drug candidates.

  11. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics

    PubMed Central

    HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE

    2017-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361

  12. Predicting subcellular location of apoptosis proteins based on wavelet transform and support vector machine.

    PubMed

    Qiu, Jian-Ding; Luo, San-Hua; Huang, Jian-Hua; Sun, Xing-Yu; Liang, Ru-Ping

    2010-04-01

    Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. As a result of genome and other sequencing projects, the gap between the number of known apoptosis protein sequences and the number of known apoptosis protein structures is widening rapidly. Because of this extremely unbalanced state, it would be worthwhile to develop a fast and reliable method to identify their subcellular locations so as to gain better insight into their biological functions. In view of this, a new method, in which the support vector machine combines with discrete wavelet transform, has been developed to predict the subcellular location of apoptosis proteins. The results obtained by the jackknife test were quite promising, and indicated that the proposed method can remarkably improve the prediction accuracy of subcellular locations, and might also become a useful high-throughput tool in characterizing other attributes of proteins, such as enzyme class, membrane protein type, and nuclear receptor subfamily according to their sequences.

  13. Hybrid method to predict the resonant frequencies and to characterise dual band proximity coupled microstrip antennas

    NASA Astrophysics Data System (ADS)

    Varma, Ruchi; Ghosh, Jayanta

    2018-06-01

    A new hybrid technique, which is a combination of neural network (NN) and support vector machine, is proposed for designing of different slotted dual band proximity coupled microstrip antennas. Slots on the patch are employed to produce the second resonance along with size reduction. The proposed hybrid model provides flexibility to design the dual band antennas in the frequency range from 1 to 6 GHz. This includes DCS (1.71-1.88 GHz), PCS (1.88-1.99 GHz), UMTS (1.92-2.17 GHz), LTE2300 (2.3-2.4 GHz), Bluetooth (2.4-2.485 GHz), WiMAX (3.3-3.7 GHz), and WLAN (5.15-5.35 GHz, 5.725-5.825 GHz) bands applications. Also, the comparative study of this proposed technique is done with the existing methods like knowledge based NN and support vector machine. The proposed method is found to be more accurate in terms of % error and root mean square % error and the results are in good accord with the measured values.

  14. High-performance Chinese multiclass traffic sign detection via coarse-to-fine cascade and parallel support vector machine detectors

    NASA Astrophysics Data System (ADS)

    Chang, Faliang; Liu, Chunsheng

    2017-09-01

    The high variability of sign colors and shapes in uncontrolled environments has made the detection of traffic signs a challenging problem in computer vision. We propose a traffic sign detection (TSD) method based on coarse-to-fine cascade and parallel support vector machine (SVM) detectors to detect Chinese warning and danger traffic signs. First, a region of interest (ROI) extraction method is proposed to extract ROIs using color contrast features in local regions. The ROI extraction can reduce scanning regions and save detection time. For multiclass TSD, we propose a structure that combines a coarse-to-fine cascaded tree with a parallel structure of histogram of oriented gradients (HOG) + SVM detectors. The cascaded tree is designed to detect different types of traffic signs in a coarse-to-fine process. The parallel HOG + SVM detectors are designed to do fine detection of different types of traffic signs. The experiments demonstrate the proposed TSD method can rapidly detect multiclass traffic signs with different colors and shapes in high accuracy.

  15. lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

    PubMed

    Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

    2015-01-01

    Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.

  16. Noninvasive prostate cancer screening based on serum surface-enhanced Raman spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Li, Shaoxin; Zhang, Yanjiao; Xu, Junfa; Li, Linfang; Zeng, Qiuyao; Lin, Lin; Guo, Zhouyi; Liu, Zhiming; Xiong, Honglian; Liu, Songhao

    2014-09-01

    This study aims to present a noninvasive prostate cancer screening methods using serum surface-enhanced Raman scattering (SERS) and support vector machine (SVM) techniques through peripheral blood sample. SERS measurements are performed using serum samples from 93 prostate cancer patients and 68 healthy volunteers by silver nanoparticles. Three types of kernel functions including linear, polynomial, and Gaussian radial basis function (RBF) are employed to build SVM diagnostic models for classifying measured SERS spectra. For comparably evaluating the performance of SVM classification models, the standard multivariate statistic analysis method of principal component analysis (PCA) is also applied to classify the same datasets. The study results show that for the RBF kernel SVM diagnostic model, the diagnostic accuracy of 98.1% is acquired, which is superior to the results of 91.3% obtained from PCA methods. The receiver operating characteristic curve of diagnostic models further confirm above research results. This study demonstrates that label-free serum SERS analysis technique combined with SVM diagnostic algorithm has great potential for noninvasive prostate cancer screening.

  17. Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method.

    PubMed

    Liao, Zhijun; Wang, Xinrui; Chen, Xingyong; Zou, Quan

    2017-01-01

    The Krüppel-like factors (KLFs) are a family of containing Zn finger(ZF) motif transcription factors with 18 members in human genome, among them, KLF18 is predicted by bioinformatics. KLFs possess various physiological function involving in a number of cancers and other diseases. Here we perform a binary-class classification of KLFs and non-KLFs by machine learning methods. The protein sequences of KLFs and non-KLFs were searched from UniProt and randomly separate them into training dataset(containing positive and negative sequences) and test dataset(containing only negative sequences), after extracting the 188-dimensional(188D) feature vectors we carry out category with four classifiers(GBDT, libSVM, RF, and k-NN). On the human KLFs, we further dig into the evolutionary relationship and motif distribution, and finally we analyze the conserved amino acid residue of three zinc fingers. The classifier model from training dataset were well constructed, and the highest specificity(Sp) was 99.83% from a library for support vector machine(libSVM) and all the correctly classified rates were over 70% for 10-fold cross-validation on test dataset. The 18 human KLFs can be further divided into 7 groups and the zinc finger domains were located at the carboxyl terminus, and many conserved amino acid residues including Cysteine and Histidine, and the span and interval between them were consistent in the three ZF domains. Two classification models for KLFs prediction have been built by novel machine learning methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Energy-free machine learning force field for aluminum.

    PubMed

    Kruglov, Ivan; Sergeev, Oleg; Yanilkin, Alexey; Oganov, Artem R

    2017-08-17

    We used the machine learning technique of Li et al. (PRL 114, 2015) for molecular dynamics simulations. Atomic configurations were described by feature matrix based on internal vectors, and linear regression was used as a learning technique. We implemented this approach in the LAMMPS code. The method was applied to crystalline and liquid aluminum and uranium at different temperatures and densities, and showed the highest accuracy among different published potentials. Phonon density of states, entropy and melting temperature of aluminum were calculated using this machine learning potential. The results are in excellent agreement with experimental data and results of full ab initio calculations.

  19. A Simple Deep Learning Method for Neuronal Spike Sorting

    NASA Astrophysics Data System (ADS)

    Yang, Kai; Wu, Haifeng; Zeng, Yu

    2017-10-01

    Spike sorting is one of key technique to understand brain activity. With the development of modern electrophysiology technology, some recent multi-electrode technologies have been able to record the activity of thousands of neuronal spikes simultaneously. The spike sorting in this case will increase the computational complexity of conventional sorting algorithms. In this paper, we will focus spike sorting on how to reduce the complexity, and introduce a deep learning algorithm, principal component analysis network (PCANet) to spike sorting. The introduced method starts from a conventional model and establish a Toeplitz matrix. Through the column vectors in the matrix, we trains a PCANet, where some eigenvalue vectors of spikes could be extracted. Finally, support vector machine (SVM) is used to sort spikes. In experiments, we choose two groups of simulated data from public databases availably and compare this introduced method with conventional methods. The results indicate that the introduced method indeed has lower complexity with the same sorting errors as the conventional methods.

  20. An Android malware detection system based on machine learning

    NASA Astrophysics Data System (ADS)

    Wen, Long; Yu, Haiyang

    2017-08-01

    The Android smartphone, with its open source character and excellent performance, has attracted many users. However, the convenience of the Android platform also has motivated the development of malware. The traditional method which detects the malware based on the signature is unable to detect unknown applications. The article proposes a machine learning-based lightweight system that is capable of identifying malware on Android devices. In this system we extract features based on the static analysis and the dynamitic analysis, then a new feature selection approach based on principle component analysis (PCA) and relief are presented in the article to decrease the dimensions of the features. After that, a model will be constructed with support vector machine (SVM) for classification. Experimental results show that our system provides an effective method in Android malware detection.

  1. Detection of distorted frames in retinal video-sequences via machine learning

    NASA Astrophysics Data System (ADS)

    Kolar, Radim; Liberdova, Ivana; Odstrcilik, Jan; Hracho, Michal; Tornow, Ralf P.

    2017-07-01

    This paper describes detection of distorted frames in retinal sequences based on set of global features extracted from each frame. The feature vector is consequently used in classification step, in which three types of classifiers are tested. The best classification accuracy 96% has been achieved with support vector machine approach.

  2. Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine

    PubMed Central

    2011-01-01

    Background Cardiotocography (CTG) is the most widely used tool for fetal surveillance. The visual analysis of fetal heart rate (FHR) traces largely depends on the expertise and experience of the clinician involved. Several approaches have been proposed for the effective interpretation of FHR. In this paper, a new approach for FHR feature extraction based on empirical mode decomposition (EMD) is proposed, which was used along with support vector machine (SVM) for the classification of FHR recordings as 'normal' or 'at risk'. Methods The FHR were recorded from 15 subjects at a sampling rate of 4 Hz and a dataset consisting of 90 randomly selected records of 20 minutes duration was formed from these. All records were labelled as 'normal' or 'at risk' by two experienced obstetricians. A training set was formed by 60 records, the remaining 30 left as the testing set. The standard deviations of the EMD components are input as features to a support vector machine (SVM) to classify FHR samples. Results For the training set, a five-fold cross validation test resulted in an accuracy of 86% whereas the overall geometric mean of sensitivity and specificity was 94.8%. The Kappa value for the training set was .923. Application of the proposed method to the testing set (30 records) resulted in a geometric mean of 81.5%. The Kappa value for the testing set was .684. Conclusions Based on the overall performance of the system it can be stated that the proposed methodology is a promising new approach for the feature extraction and classification of FHR signals. PMID:21244712

  3. Obstacle Recognition Based on Machine Learning for On-Chip LiDAR Sensors in a Cyber-Physical System

    PubMed Central

    Beruvides, Gerardo

    2017-01-01

    Collision avoidance is an important feature in advanced driver-assistance systems, aimed at providing correct, timely and reliable warnings before an imminent collision (with objects, vehicles, pedestrians, etc.). The obstacle recognition library is designed and implemented to address the design and evaluation of obstacle detection in a transportation cyber-physical system. The library is integrated into a co-simulation framework that is supported on the interaction between SCANeR software and Matlab/Simulink. From the best of the authors’ knowledge, two main contributions are reported in this paper. Firstly, the modelling and simulation of virtual on-chip light detection and ranging sensors in a cyber-physical system, for traffic scenarios, is presented. The cyber-physical system is designed and implemented in SCANeR. Secondly, three specific artificial intelligence-based methods for obstacle recognition libraries are also designed and applied using a sensory information database provided by SCANeR. The computational library has three methods for obstacle detection: a multi-layer perceptron neural network, a self-organization map and a support vector machine. Finally, a comparison among these methods under different weather conditions is presented, with very promising results in terms of accuracy. The best results are achieved using the multi-layer perceptron in sunny and foggy conditions, the support vector machine in rainy conditions and the self-organized map in snowy conditions. PMID:28906450

  4. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    NASA Astrophysics Data System (ADS)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude {>} M1 and {>} C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for {>} M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for {>} C1 flare prediction with probability threshold 35%.

  5. Obstacle Recognition Based on Machine Learning for On-Chip LiDAR Sensors in a Cyber-Physical System.

    PubMed

    Castaño, Fernando; Beruvides, Gerardo; Haber, Rodolfo E; Artuñedo, Antonio

    2017-09-14

    Collision avoidance is an important feature in advanced driver-assistance systems, aimed at providing correct, timely and reliable warnings before an imminent collision (with objects, vehicles, pedestrians, etc.). The obstacle recognition library is designed and implemented to address the design and evaluation of obstacle detection in a transportation cyber-physical system. The library is integrated into a co-simulation framework that is supported on the interaction between SCANeR software and Matlab/Simulink. From the best of the authors' knowledge, two main contributions are reported in this paper. Firstly, the modelling and simulation of virtual on-chip light detection and ranging sensors in a cyber-physical system, for traffic scenarios, is presented. The cyber-physical system is designed and implemented in SCANeR. Secondly, three specific artificial intelligence-based methods for obstacle recognition libraries are also designed and applied using a sensory information database provided by SCANeR. The computational library has three methods for obstacle detection: a multi-layer perceptron neural network, a self-organization map and a support vector machine. Finally, a comparison among these methods under different weather conditions is presented, with very promising results in terms of accuracy. The best results are achieved using the multi-layer perceptron in sunny and foggy conditions, the support vector machine in rainy conditions and the self-organized map in snowy conditions.

  6. A Novel Approach for Lie Detection Based on F-Score and Extreme Learning Machine

    PubMed Central

    Gao, Junfeng; Wang, Zhao; Yang, Yong; Zhang, Wenjia; Tao, Chunyi; Guan, Jinan; Rao, Nini

    2013-01-01

    A new machine learning method referred to as F-score_ELM was proposed to classify the lying and truth-telling using the electroencephalogram (EEG) signals from 28 guilty and innocent subjects. Thirty-one features were extracted from the probe responses from these subjects. Then, a recently-developed classifier called extreme learning machine (ELM) was combined with F-score, a simple but effective feature selection method, to jointly optimize the number of the hidden nodes of ELM and the feature subset by a grid-searching training procedure. The method was compared to two classification models combining principal component analysis with back-propagation network and support vector machine classifiers. We thoroughly assessed the performance of these classification models including the training and testing time, sensitivity and specificity from the training and testing sets, as well as network size. The experimental results showed that the number of the hidden nodes can be effectively optimized by the proposed method. Also, F-score_ELM obtained the best classification accuracy and required the shortest training and testing time. PMID:23755136

  7. Application of the support vector machine to predict subclinical mastitis in dairy cattle.

    PubMed

    Mammadova, Nazira; Keskin, Ismail

    2013-01-01

    This study presented a potentially useful alternative approach to ascertain the presence of subclinical and clinical mastitis in dairy cows using support vector machine (SVM) techniques. The proposed method detected mastitis in a cross-sectional representative sample of Holstein dairy cattle milked using an automatic milking system. The study used such suspected indicators of mastitis as lactation rank, milk yield, electrical conductivity, average milking duration, and control season as input data. The output variable was somatic cell counts obtained from milk samples collected monthly throughout the 15 months of the control period. Cattle were judged to be healthy or infected based on those somatic cell counts. This study undertook a detailed scrutiny of the SVM methodology, constructing and examining a model which showed 89% sensitivity, 92% specificity, and 50% error in mastitis detection.

  8. Salient Feature Identification and Analysis using Kernel-Based Classification Techniques for Synthetic Aperture Radar Automatic Target Recognition

    DTIC Science & Technology

    2014-03-27

    and machine learning for a range of research including such topics as medical imaging [10] and handwriting recognition [11]. The type of feature...1989. [11] C. Bahlmann, B. Haasdonk, and H. Burkhardt, “Online handwriting recognition with support vector machines-a kernel approach,” in Eighth...International Workshop on Frontiers in Handwriting Recognition, pp. 49–54, IEEE, 2002. [12] C. Cortes and V. Vapnik, “Support-vector networks,” Machine

  9. THRESHOLD ELEMENTS AND THE DESIGN OF SEQUENTIAL SWITCHING NETWORKS.

    DTIC Science & Technology

    The report covers research performed from March 1966 to March 1967. The major topics treated are: (1) methods for finding weight- threshold vectors...that realize a given switching function in multi- threshold linear logic; (2) synthesis of sequential machines by means of shift registers and simple

  10. Support vector machine-based facial-expression recognition method combining shape and appearance

    NASA Astrophysics Data System (ADS)

    Han, Eun Jung; Kang, Byung Jun; Park, Kang Ryoung; Lee, Sangyoun

    2010-11-01

    Facial expression recognition can be widely used for various applications, such as emotion-based human-machine interaction, intelligent robot interfaces, face recognition robust to expression variation, etc. Previous studies have been classified as either shape- or appearance-based recognition. The shape-based method has the disadvantage that the individual variance of facial feature points exists irrespective of similar expressions, which can cause a reduction of the recognition accuracy. The appearance-based method has a limitation in that the textural information of the face is very sensitive to variations in illumination. To overcome these problems, a new facial-expression recognition method is proposed, which combines both shape and appearance information, based on the support vector machine (SVM). This research is novel in the following three ways as compared to previous works. First, the facial feature points are automatically detected by using an active appearance model. From these, the shape-based recognition is performed by using the ratios between the facial feature points based on the facial-action coding system. Second, the SVM, which is trained to recognize the same and different expression classes, is proposed to combine two matching scores obtained from the shape- and appearance-based recognitions. Finally, a single SVM is trained to discriminate four different expressions, such as neutral, a smile, anger, and a scream. By determining the expression of the input facial image whose SVM output is at a minimum, the accuracy of the expression recognition is much enhanced. The experimental results showed that the recognition accuracy of the proposed method was better than previous researches and other fusion methods.

  11. A low cost implementation of multi-parameter patient monitor using intersection kernel support vector machine classifier

    NASA Astrophysics Data System (ADS)

    Mohan, Dhanya; Kumar, C. Santhosh

    2016-03-01

    Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate and oxygen saturation (S pO2) as input parameters were developed to monitor the condition of patients, with minimum human resource utilization. The Support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression is used for the realization of MPMs. For making MPMs cost effective, we experiment on the hardware implementation of the MPM using support vector machine classifier. The training of the system is done using the matlab environment and the detection of the alarm/noalarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM has outperformed the best known MPM using radial basis function kernel by an absoute improvement of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model was developed based on the improved performance system using Verilog Hardware Description Language and was implemented on Altera cyclone-II development board.

  12. New support vector machine-based method for microRNA target prediction.

    PubMed

    Li, L; Gao, Q; Mao, X; Cao, Y

    2014-06-09

    MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.

  13. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

    PubMed

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-12-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  14. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

    PubMed

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-09-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  15. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

    NASA Astrophysics Data System (ADS)

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-12-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  16. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

    NASA Astrophysics Data System (ADS)

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-09-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  17. Using support vector machines with tract-based spatial statistics for automated classification of Tourette syndrome children

    NASA Astrophysics Data System (ADS)

    Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang

    2016-03-01

    Tourette syndrome (TS) is a developmental neuropsychiatric disorder with the cardinal symptoms of motor and vocal tics which emerges in early childhood and fluctuates in severity in later years. To date, the neural basis of TS is not fully understood yet and TS has a long-term prognosis that is difficult to accurately estimate. Few studies have looked at the potential of using diffusion tensor imaging (DTI) in conjunction with machine learning algorithms in order to automate the classification of healthy children and TS children. Here we apply Tract-Based Spatial Statistics (TBSS) method to 44 TS children and 48 age and gender matched healthy children in order to extract the diffusion values from each voxel in the white matter (WM) skeleton, and a feature selection algorithm (ReliefF) was used to select the most salient voxels for subsequent classification with support vector machine (SVM). We use a nested cross validation to yield an unbiased assessment of the classification method and prevent overestimation. The accuracy (88.04%), sensitivity (88.64%) and specificity (87.50%) were achieved in our method as peak performance of the SVM classifier was achieved using the axial diffusion (AD) metric, demonstrating the potential of a joint TBSS and SVM pipeline for fast, objective classification of healthy and TS children. These results support that our methods may be useful for the early identification of subjects with TS, and hold promise for predicting prognosis and treatment outcome for individuals with TS.

  18. Assessing the druggability of protein-protein interactions by a supervised machine-learning method.

    PubMed

    Sugaya, Nobuyoshi; Ikeda, Kazuyoshi

    2009-08-25

    Protein-protein interactions (PPIs) are challenging but attractive targets of small molecule drugs for therapeutic interventions of human diseases. In this era of rapid accumulation of PPI data, there is great need for a methodology that can efficiently select drug target PPIs by holistically assessing the druggability of PPIs. To address this need, we propose here a novel approach based on a supervised machine-learning method, support vector machine (SVM). To assess the druggability of the PPIs, 69 attributes were selected to cover a wide range of structural, drug and chemical, and functional information on the PPIs. These attributes were used as feature vectors in the SVM-based method. Thirty PPIs known to be druggable were carefully selected from previous studies; these were used as positive instances. Our approach was applied to 1,295 human PPIs with tertiary structures of their protein complexes already solved. The best SVM model constructed discriminated the already-known target PPIs from others at an accuracy of 81% (sensitivity, 82%; specificity, 79%) in cross-validation. Among the attributes, the two with the greatest discriminative power in the best SVM model were the number of interacting proteins and the number of pathways. Using the model, we predicted several promising candidates for druggable PPIs, such as SMAD4/SKI. As more PPI data are accumulated in the near future, our method will have increased ability to accelerate the discovery of druggable PPIs.

  19. Thai Language Sentence Similarity Computation Based on Syntactic Structure and Semantic Vector

    NASA Astrophysics Data System (ADS)

    Wang, Hongbin; Feng, Yinhan; Cheng, Liang

    2018-03-01

    Sentence similarity computation plays an increasingly important role in text mining, Web page retrieval, machine translation, speech recognition and question answering systems. Thai language as a kind of resources scarce language, it is not like Chinese language with HowNet and CiLin resources. So the Thai sentence similarity research faces some challenges. In order to solve this problem of the Thai language sentence similarity computation. This paper proposes a novel method to compute the similarity of Thai language sentence based on syntactic structure and semantic vector. This method firstly uses the Part-of-Speech (POS) dependency to calculate two sentences syntactic structure similarity, and then through the word vector to calculate two sentences semantic similarity. Finally, we combine the two methods to calculate two Thai language sentences similarity. The proposed method not only considers semantic, but also considers the sentence syntactic structure. The experiment result shows that this method in Thai language sentence similarity computation is feasible.

  20. A Novel Degradation Identification Method for Wind Turbine Pitch System

    NASA Astrophysics Data System (ADS)

    Guo, Hui-Dong

    2018-04-01

    It’s difficult for traditional threshold value method to identify degradation of operating equipment accurately. An novel degradation evaluation method suitable for wind turbine condition maintenance strategy implementation was proposed in this paper. Based on the analysis of typical variable-speed pitch-to-feather control principle and monitoring parameters for pitch system, a multi input multi output (MIMO) regression model was applied to pitch system, where wind speed, power generation regarding as input parameters, wheel rotation speed, pitch angle and motor driving currency for three blades as output parameters. Then, the difference between the on-line measurement and the calculated value from the MIMO regression model applying least square support vector machines (LSSVM) method was defined as the Observed Vector of the system. The Gaussian mixture model (GMM) was applied to fitting the distribution of the multi dimension Observed Vectors. Applying the model established, the Degradation Index was calculated using the SCADA data of a wind turbine damaged its pitch bearing retainer and rolling body, which illustrated the feasibility of the provided method.

  1. Predicting High Imaging Utilization Based on Initial Radiology Reports: A Feasibility Study of Machine Learning.

    PubMed

    Hassanpour, Saeed; Langlotz, Curtis P

    2016-01-01

    Imaging utilization has significantly increased over the last two decades, and is only recently showing signs of moderating. To help healthcare providers identify patients at risk for high imaging utilization, we developed a prediction model to recognize high imaging utilizers based on their initial imaging reports. The prediction model uses a machine learning text classification framework. In this study, we used radiology reports from 18,384 patients with at least one abdomen computed tomography study in their imaging record at Stanford Health Care as the training set. We modeled the radiology reports in a vector space and trained a support vector machine classifier for this prediction task. We evaluated our model on a separate test set of 4791 patients. In addition to high prediction accuracy, in our method, we aimed at achieving high specificity to identify patients at high risk for high imaging utilization. Our results (accuracy: 94.0%, sensitivity: 74.4%, specificity: 97.9%, positive predictive value: 87.3%, negative predictive value: 95.1%) show that a prediction model can enable healthcare providers to identify in advance patients who are likely to be high utilizers of imaging services. Machine learning classifiers developed from narrative radiology reports are feasible methods to predict imaging utilization. Such systems can be used to identify high utilizers, inform future image ordering behavior, and encourage judicious use of imaging. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  2. Analysis of algae growth mechanism and water bloom prediction under the effect of multi-affecting factor.

    PubMed

    Wang, Li; Wang, Xiaoyi; Jin, Xuebo; Xu, Jiping; Zhang, Huiyan; Yu, Jiabin; Sun, Qian; Gao, Chong; Wang, Lingbin

    2017-03-01

    The formation process of algae is described inaccurately and water blooms are predicted with a low precision by current methods. In this paper, chemical mechanism of algae growth is analyzed, and a correlation analysis of chlorophyll-a and algal density is conducted by chemical measurement. Taking into account the influence of multi-factors on algae growth and water blooms, the comprehensive prediction method combined with multivariate time series and intelligent model is put forward in this paper. Firstly, through the process of photosynthesis, the main factors that affect the reproduction of the algae are analyzed. A compensation prediction method of multivariate time series analysis based on neural network and Support Vector Machine has been put forward which is combined with Kernel Principal Component Analysis to deal with dimension reduction of the influence factors of blooms. Then, Genetic Algorithm is applied to improve the generalization ability of the BP network and Least Squares Support Vector Machine. Experimental results show that this method could better compensate the prediction model of multivariate time series analysis which is an effective way to improve the description accuracy of algae growth and prediction precision of water blooms.

  3. Quantitative structure-retention relationship models for the prediction of the reversed-phase HPLC gradient retention based on the heuristic method and support vector machine.

    PubMed

    Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide

    2009-01-01

    The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models by a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many diverse structural analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models were better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.

  4. A prediction model of drug-induced ototoxicity developed by an optimal support vector machine (SVM) method.

    PubMed

    Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong

    2014-08-01

    Drug-induced ototoxicity, as a toxic side effect, is an important issue needed to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, indicating that they are not suitable for a large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. We thus, in this investigation, established an effective computational prediction model of drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also used on the same training sets. Among all the prediction models, the GA-CG-SVM model II showed the best performance, which offered prediction accuracies of 85.33% and 83.05% for two independent test sets, respectively. Overall, the good performance of the GA-CG-SVM model II indicates that it could be used for the prediction of drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction

    PubMed Central

    Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2015-01-01

    Blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR study. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, to optimize both SVM parameters and feature subset simultaneously with genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important role in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance the BBB penetration while all the others are negatively correlated with BBB penetration. PMID:26504797

  6. Geometric and computer-aided spline hob modeling

    NASA Astrophysics Data System (ADS)

    Brailov, I. G.; Myasoedova, T. M.; Panchuk, K. L.; Krysova, I. V.; Rogoza, YU A.

    2018-03-01

    The paper considers acquiring the spline hob geometric model. The objective of the research is the development of a mathematical model of spline hob for spline shaft machining. The structure of the spline hob is described taking into consideration the motion in parameters of the machine tool system of cutting edge positioning and orientation. Computer-aided study is performed with the use of CAD and on the basis of 3D modeling methods. Vector representation of cutting edge geometry is accepted as the principal method of spline hob mathematical model development. The paper defines the correlations described by parametric vector functions representing helical cutting edges designed for spline shaft machining with consideration for helical movement in two dimensions. An application for acquiring the 3D model of spline hob is developed on the basis of AutoLISP for AutoCAD environment. The application presents the opportunity for the use of the acquired model for milling process imitation. An example of evaluation, analytical representation and computer modeling of the proposed geometrical model is reviewed. In the mentioned example, a calculation of key spline hob parameters assuring the capability of hobbing a spline shaft of standard design is performed. The polygonal and solid spline hob 3D models are acquired by the use of imitational computer modeling.

  7. Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.

    PubMed

    Hernandez, Troy; Yang, Jie

    2016-10-01

    The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code is available upon request.

  8. Study on Temperature and Synthetic Compensation of Piezo-Resistive Differential Pressure Sensors by Coupled Simulated Annealing and Simplex Optimized Kernel Extreme Learning Machine

    PubMed Central

    Li, Ji; Hu, Guoqing; Zhou, Yonghong; Zou, Chong; Peng, Wei; Alam SM, Jahangir

    2017-01-01

    As a high performance-cost ratio solution for differential pressure measurement, piezo-resistive differential pressure sensors are widely used in engineering processes. However, their performance is severely affected by the environmental temperature and the static pressure applied to them. In order to modify the non-linear measuring characteristics of the piezo-resistive differential pressure sensor, compensation actions should synthetically consider these two aspects. Advantages such as nonlinear approximation capability, highly desirable generalization ability and computational efficiency make the kernel extreme learning machine (KELM) a practical approach for this critical task. Since the KELM model is intrinsically sensitive to the regularization parameter and the kernel parameter, a searching scheme combining the coupled simulated annealing (CSA) algorithm and the Nelder-Mead simplex algorithm is adopted to find an optimal KLEM parameter set. A calibration experiment at different working pressure levels was conducted within the temperature range to assess the proposed method. In comparison with other compensation models such as the back-propagation neural network (BP), radius basis neural network (RBF), particle swarm optimization optimized support vector machine (PSO-SVM), particle swarm optimization optimized least squares support vector machine (PSO-LSSVM) and extreme learning machine (ELM), the compensation results show that the presented compensation algorithm exhibits a more satisfactory performance with respect to temperature compensation and synthetic compensation problems. PMID:28422080

  9. Study on Temperature and Synthetic Compensation of Piezo-Resistive Differential Pressure Sensors by Coupled Simulated Annealing and Simplex Optimized Kernel Extreme Learning Machine.

    PubMed

    Li, Ji; Hu, Guoqing; Zhou, Yonghong; Zou, Chong; Peng, Wei; Alam Sm, Jahangir

    2017-04-19

    As a high performance-cost ratio solution for differential pressure measurement, piezo-resistive differential pressure sensors are widely used in engineering processes. However, their performance is severely affected by the environmental temperature and the static pressure applied to them. In order to modify the non-linear measuring characteristics of the piezo-resistive differential pressure sensor, compensation actions should synthetically consider these two aspects. Advantages such as nonlinear approximation capability, highly desirable generalization ability and computational efficiency make the kernel extreme learning machine (KELM) a practical approach for this critical task. Since the KELM model is intrinsically sensitive to the regularization parameter and the kernel parameter, a searching scheme combining the coupled simulated annealing (CSA) algorithm and the Nelder-Mead simplex algorithm is adopted to find an optimal KLEM parameter set. A calibration experiment at different working pressure levels was conducted within the temperature range to assess the proposed method. In comparison with other compensation models such as the back-propagation neural network (BP), radius basis neural network (RBF), particle swarm optimization optimized support vector machine (PSO-SVM), particle swarm optimization optimized least squares support vector machine (PSO-LSSVM) and extreme learning machine (ELM), the compensation results show that the presented compensation algorithm exhibits a more satisfactory performance with respect to temperature compensation and synthetic compensation problems.

  10. Machine-learning in grading of gliomas based on multi-parametric magnetic resonance imaging at 3T.

    PubMed

    Citak-Er, Fusun; Firat, Zeynep; Kovanlikaya, Ilhami; Ture, Ugur; Ozturk-Isik, Esin

    2018-06-15

    The objective of this study was to assess the contribution of multi-parametric (mp) magnetic resonance imaging (MRI) quantitative features in the machine learning-based grading of gliomas with a multi-region-of-interests approach. Forty-three patients who were newly diagnosed as having a glioma were included in this study. The patients were scanned prior to any therapy using a standard brain tumor magnetic resonance (MR) imaging protocol that included T1 and T2-weighted, diffusion-weighted, diffusion tensor, MR perfusion and MR spectroscopic imaging. Three different regions-of-interest were drawn for each subject to encompass tumor, immediate tumor periphery, and distant peritumoral edema/normal. The normalized mp-MRI features were used to build machine-learning models for differentiating low-grade gliomas (WHO grades I and II) from high grades (WHO grades III and IV). In order to assess the contribution of regional mp-MRI quantitative features to the classification models, a support vector machine-based recursive feature elimination method was applied prior to classification. A machine-learning model based on support vector machine algorithm with linear kernel achieved an accuracy of 93.0%, a specificity of 86.7%, and a sensitivity of 96.4% for the grading of gliomas using ten-fold cross validation based on the proposed subset of the mp-MRI features. In this study, machine-learning based on multiregional and multi-parametric MRI data has proven to be an important tool in grading glial tumors accurately even in this limited patient population. Future studies are needed to investigate the use of machine learning algorithms for brain tumor classification in a larger patient cohort. Copyright © 2018. Published by Elsevier Ltd.

  11. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  12. Support Vector Machines: Relevance Feedback and Information Retrieval.

    ERIC Educational Resources Information Center

    Drucker, Harris; Shahrary, Behzad; Gibbon, David C.

    2002-01-01

    Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…

  13. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

    PubMed

    Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne

    2018-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

  14. nu-Anomica: A Fast Support Vector Based Novelty Detection Technique

    NASA Technical Reports Server (NTRS)

    Das, Santanu; Bhaduri, Kanishka; Oza, Nikunj C.; Srivastava, Ashok N.

    2009-01-01

    In this paper we propose nu-Anomica, a novel anomaly detection technique that can be trained on huge data sets with much reduced running time compared to the benchmark one-class Support Vector Machines algorithm. In -Anomica, the idea is to train the machine such that it can provide a close approximation to the exact decision plane using fewer training points and without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by 5 - 20 times.

  15. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects face from the stored video frame using skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experiment results indicate that using support vector machine as classifier can certainly improve the performance of automatic pain recognition system.

  16. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    PubMed

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  17. Vowel Imagery Decoding toward Silent Speech BCI Using Extreme Learning Machine with Electroencephalogram

    PubMed Central

    Kim, Jongin; Park, Hyeong-jun

    2016-01-01

    The purpose of this study is to classify EEG data on imagined speech in a single trial. We recorded EEG data while five subjects imagined different vowels, /a/, /e/, /i/, /o/, and /u/. We divided each single trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from all segments. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on the sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of an extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of the single trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discrimination analysis. Thus, our results suggested that EEG responses to imagined speech could be successfully classified in a single trial using an extreme learning machine with a radial basis function and linear kernel. This study with classification of imagined speech might contribute to the development of silent speech BCI systems. PMID:28097128

  18. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    PubMed

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) is proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers brings in the computation of matrix inversion, thus resulting to suffer from high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets as well as inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.

    Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results make separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptidemore » identification. We have shown that machine learning, specifically support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptides is denoted in its vector representation and the SVM generates a single statistical score that is then used to classify presence or absence in the sample« less

  20. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature

    PubMed Central

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel based approaches such as linear kernel, tree kernel, graph kernel and combination of multiple kernels has achieved promising results in PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vectors information known as Distributed Smoothed Tree kernel (DSTK). DSTK comprises of distributed trees with syntactic information along with distributional semantic vectors representing semantic information of the sentences or phrases. To generate robust machine learning model composition of feature based kernel and DSTK were combined using ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves better f-score with five different corpora compared to other state-of-the-art systems. PMID:29099838

  1. Application of support vector machines for copper potential mapping in Kerman region, Iran

    NASA Astrophysics Data System (ADS)

    Shabankareh, Mahdi; Hezarkhani, Ardeshir

    2017-04-01

    The first step in systematic exploration studies is mineral potential mapping, which involves classification of the study area to favorable and unfavorable parts. Support vector machines (SVM) are designed for supervised classification based on statistical learning theory. This method named support vector classification (SVC). This paper describes SVC model, which combine exploration data in the regional-scale for copper potential mapping in Kerman copper bearing belt in south of Iran. Data layers or evidential maps were in six datasets namely lithology, tectonic, airborne geophysics, ferric alteration, hydroxide alteration and geochemistry. The SVC modeling result selected 2220 pixels as favorable zones, approximately 25 percent of the study area. Besides, 66 out of 86 copper indices, approximately 78.6% of all, were located in favorable zones. Other main goal of this study was to determine how each input affects favorable output. For this purpose, the histogram of each normalized input data to its favorable output was drawn. The histograms of each input dataset for favorable output showed that each information layer had a certain pattern. These patterns of SVC results could be considered as regional copper exploration characteristics.

  2. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.

    PubMed

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel based approaches such as linear kernel, tree kernel, graph kernel and combination of multiple kernels has achieved promising results in PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vectors information known as Distributed Smoothed Tree kernel (DSTK). DSTK comprises of distributed trees with syntactic information along with distributional semantic vectors representing semantic information of the sentences or phrases. To generate robust machine learning model composition of feature based kernel and DSTK were combined using ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves better f-score with five different corpora compared to other state-of-the-art systems.

  3. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

    PubMed

    Lenselink, Eelke B; Ten Dijke, Niels; Bongers, Brandon; Papadatos, George; van Vlijmen, Herman W T; Kowalczyk, Wojtek; IJzerman, Adriaan P; van Westen, Gerard J P

    2017-08-14

    The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, Multi-task and PCM implementations were shown to improve performance over single task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized 'DNN_PCM'). Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols. Graphical Abstract .

  4. Relevance Vector Machine and Support Vector Machine Classifier Analysis of Scanning Laser Polarimetry Retinal Nerve Fiber Layer Measurements

    PubMed Central

    Bowd, Christopher; Medeiros, Felipe A.; Zhang, Zuohua; Zangwill, Linda M.; Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.; Weinreb, Robert N.; Goldbaum, Michael H.

    2010-01-01

    Purpose To classify healthy and glaucomatous eyes using relevance vector machine (RVM) and support vector machine (SVM) learning classifiers trained on retinal nerve fiber layer (RNFL) thickness measurements obtained by scanning laser polarimetry (SLP). Methods Seventy-two eyes of 72 healthy control subjects (average age = 64.3 ± 8.8 years, visual field mean deviation =−0.71 ± 1.2 dB) and 92 eyes of 92 patients with glaucoma (average age = 66.9 ± 8.9 years, visual field mean deviation =−5.32 ± 4.0 dB) were imaged with SLP with variable corneal compensation (GDx VCC; Laser Diagnostic Technologies, San Diego, CA). RVM and SVM learning classifiers were trained and tested on SLP-determined RNFL thickness measurements from 14 standard parameters and 64 sectors (approximately 5.6° each) obtained in the circumpapillary area under the instrument-defined measurement ellipse (total 78 parameters). Tenfold cross-validation was used to train and test RVM and SVM classifiers on unique subsets of the full 164-eye data set and areas under the receiver operating characteristic (AUROC) curve for the classification of eyes in the test set were generated. AUROC curve results from RVM and SVM were compared to those for 14 SLP software-generated global and regional RNFL thickness parameters. Also reported was the AUROC curve for the GDx VCC software-generated nerve fiber indicator (NFI). Results The AUROC curves for RVM and SVM were 0.90 and 0.91, respectively, and increased to 0.93 and 0.94 when the training sets were optimized with sequential forward and backward selection (resulting in reduced dimensional data sets). AUROC curves for optimized RVM and SVM were significantly larger than those for all individual SLP parameters. The AUROC curve for the NFI was 0.87. Conclusions Results from RVM and SVM trained on SLP RNFL thickness measurements are similar and provide accurate classification of glaucomatous and healthy eyes. RVM may be preferable to SVM, because it provides a Bayesian-derived probability of glaucoma as an output. These results suggest that these machine learning classifiers show good potential for glaucoma diagnosis. PMID:15790898

  5. Using machine learning for sequence-level automated MRI protocol selection in neuroradiology.

    PubMed

    Brown, Andrew D; Marotta, Thomas R

    2018-05-01

    Incorrect imaging protocol selection can lead to important clinical findings being missed, contributing to both wasted health care resources and patient harm. We present a machine learning method for analyzing the unstructured text of clinical indications and patient demographics from magnetic resonance imaging (MRI) orders to automatically protocol MRI procedures at the sequence level. We compared 3 machine learning models - support vector machine, gradient boosting machine, and random forest - to a baseline model that predicted the most common protocol for all observations in our test set. The gradient boosting machine model significantly outperformed the baseline and demonstrated the best performance of the 3 models in terms of accuracy (95%), precision (86%), recall (80%), and Hamming loss (0.0487). This demonstrates the feasibility of automating sequence selection by applying machine learning to MRI orders. Automated sequence selection has important safety, quality, and financial implications and may facilitate improvements in the quality and safety of medical imaging service delivery.

  6. Diffuse Interface Methods for Multiclass Segmentation of High-Dimensional Data

    DTIC Science & Technology

    2014-03-04

    handwritten digits , 1998. http://yann.lecun.com/exdb/mnist/. [19] S. Nene, S. Nayar, H. Murase, Columbia Object Image Library (COIL-100), Technical Report... recognition on smartphones using a multiclass hardware-friendly support vector machine, in: Ambient Assisted Living and Home Care, Springer, 2012, pp. 216–223.

  7. SVM-based feature extraction and classification of aflatoxin contaminated corn using fluorescence hyperspectral data

    USDA-ARS?s Scientific Manuscript database

    Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...

  8. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    NASA Astrophysics Data System (ADS)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.

  9. Classification of different kinds of pesticide residues on lettuce based on fluorescence spectra and WT-BCC-SVM algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Xin; Jun, Sun; Zhang, Bing; Jun, Wu

    2017-07-01

    In order to improve the reliability of the spectrum feature extracted by wavelet transform, a method combining wavelet transform (WT) with bacterial colony chemotaxis algorithm and support vector machine (BCC-SVM) algorithm (WT-BCC-SVM) was proposed in this paper. Besides, we aimed to identify different kinds of pesticide residues on lettuce leaves in a novel and rapid non-destructive way by using fluorescence spectra technology. The fluorescence spectral data of 150 lettuce leaf samples of five different kinds of pesticide residues on the surface of lettuce were obtained using Cary Eclipse fluorescence spectrometer. Standard normalized variable detrending (SNV detrending), Savitzky-Golay coupled with Standard normalized variable detrending (SG-SNV detrending) were used to preprocess the raw spectra, respectively. Bacterial colony chemotaxis combined with support vector machine (BCC-SVM) and support vector machine (SVM) classification models were established based on full spectra (FS) and wavelet transform characteristics (WTC), respectively. Moreover, WTC were selected by WT. The results showed that the accuracy of training set, calibration set and the prediction set of the best optimal classification model (SG-SNV detrending-WT-BCC-SVM) were 100%, 98% and 93.33%, respectively. In addition, the results indicated that it was feasible to use WT-BCC-SVM to establish diagnostic model of different kinds of pesticide residues on lettuce leaves.

  10. Fuzzy support vector machine for microarray imbalanced data classification

    NASA Astrophysics Data System (ADS)

    Ladayya, Faroh; Purnami, Santi Wulan; Irhamah

    2017-11-01

    DNA microarrays are data containing gene expression with small sample sizes and high number of features. Furthermore, imbalanced classes is a common problem in microarray data. This occurs when a dataset is dominated by a class which have significantly more instances than the other minority classes. Therefore, it is needed a classification method that solve the problem of high dimensional and imbalanced data. Support Vector Machine (SVM) is one of the classification methods that is capable of handling large or small samples, nonlinear, high dimensional, over learning and local minimum issues. SVM has been widely applied to DNA microarray data classification and it has been shown that SVM provides the best performance among other machine learning methods. However, imbalanced data will be a problem because SVM treats all samples in the same importance thus the results is bias for minority class. To overcome the imbalanced data, Fuzzy SVM (FSVM) is proposed. This method apply a fuzzy membership to each input point and reformulate the SVM such that different input points provide different contributions to the classifier. The minority classes have large fuzzy membership so FSVM can pay more attention to the samples with larger fuzzy membership. Given DNA microarray data is a high dimensional data with a very large number of features, it is necessary to do feature selection first using Fast Correlation based Filter (FCBF). In this study will be analyzed by SVM, FSVM and both methods by applying FCBF and get the classification performance of them. Based on the overall results, FSVM on selected features has the best classification performance compared to SVM.

  11. Comparing SVM and ANN based Machine Learning Methods for Species Identification of Food Contaminating Beetles.

    PubMed

    Bisgin, Halil; Bera, Tanmay; Ding, Hongjian; Semey, Howard G; Wu, Leihong; Liu, Zhichao; Barnes, Amy E; Langley, Darryl A; Pava-Ripoll, Monica; Vyas, Himansu J; Tong, Weida; Xu, Joshua

    2018-04-25

    Insect pests, such as pantry beetles, are often associated with food contaminations and public health risks. Machine learning has the potential to provide a more accurate and efficient solution in detecting their presence in food products, which is currently done manually. In our previous research, we demonstrated such feasibility where Artificial Neural Network (ANN) based pattern recognition techniques could be implemented for species identification in the context of food safety. In this study, we present a Support Vector Machine (SVM) model which improved the average accuracy up to 85%. Contrary to this, the ANN method yielded ~80% accuracy after extensive parameter optimization. Both methods showed excellent genus level identification, but SVM showed slightly better accuracy  for most species. Highly accurate species level identification remains a challenge, especially in distinguishing between species from the same genus which may require improvements in both imaging and machine learning techniques. In summary, our work does illustrate a new SVM based technique and provides a good comparison with the ANN model in our context. We believe such insights will pave better way forward for the application of machine learning towards species identification and food safety.

  12. Identifying Green Infrastructure from Social Media and Crowdsourcing- An Image Based Machine-Learning Approach.

    NASA Astrophysics Data System (ADS)

    Rai, A.; Minsker, B. S.

    2016-12-01

    In this work we introduce a novel dataset GRID: GReen Infrastructure Detection Dataset and a framework for identifying urban green storm water infrastructure (GI) designs (wetlands/ponds, urban trees, and rain gardens/bioswales) from social media and satellite aerial images using computer vision and machine learning methods. Along with the hydrologic benefits of GI, such as reducing runoff volumes and urban heat islands, GI also provides important socio-economic benefits such as stress recovery and community cohesion. However, GI is installed by many different parties and cities typically do not know where GI is located, making study of its impacts or siting new GI difficult. We use object recognition learning methods (template matching, sliding window approach, and Random Hough Forest method) and supervised machine learning algorithms (e.g., support vector machines) as initial screening approaches to detect potential GI sites, which can then be investigated in more detail using on-site surveys. Training data were collected from GPS locations of Flickr and Instagram image postings and Amazon Mechanical Turk identification of each GI type. Sliding window method outperformed other methods and achieved an average F measure, which is combined metric for precision and recall performance measure of 0.78.

  13. Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning

    PubMed Central

    Kim, Yong-Hyuk; Ha, Ji-Hun; Kim, Na-Young; Im, Hyo-Hyuc; Sim, Sangjin; Choi, Reno K. Y.

    2016-01-01

    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correction of atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Data obtained in Gyeonggi-do, one of the most populous provinces in South Korea surrounding Seoul with the size of 10,000 km2, from July 2014 through December 2014, using smartphones were classified with respect to time of day (daytime or nighttime) as well as day of the week (weekday or weekend) and the user's mobility, prior to the expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The results showed a mean absolute error (MAE) 26% lower on average when regression analysis was performed through EM clustering compared to that obtained without EM clustering. For machine learning methods, the MAE for SVR was around 31% lower for LR and about 19% lower for MLP. It is concluded that pressure data from smartphones are as good as the ones from national automatic weather station (AWS) network. PMID:27524999

  14. Osteoporosis risk prediction using machine learning and conventional methods.

    PubMed

    Kim, Sung Kean; Yoo, Tae Keun; Oh, Ein; Kim, Deok Won

    2013-01-01

    A number of clinical decision tools for osteoporosis risk assessment have been developed to select postmenopausal women for the measurement of bone mineral density. We developed and validated machine learning models with the aim of more accurately identifying the risk of osteoporosis in postmenopausal women, and compared with the ability of a conventional clinical decision tool, osteoporosis self-assessment tool (OST). We collected medical records from Korean postmenopausal women based on the Korea National Health and Nutrition Surveys (KNHANES V-1). The training data set was used to construct models based on popular machine learning algorithms such as support vector machines (SVM), random forests (RF), artificial neural networks (ANN), and logistic regression (LR) based on various predictors associated with low bone density. The learning models were compared with OST. SVM had significantly better area under the curve (AUC) of the receiver operating characteristic (ROC) than ANN, LR, and OST. Validation on the test set showed that SVM predicted osteoporosis risk with an AUC of 0.827, accuracy of 76.7%, sensitivity of 77.8%, and specificity of 76.0%. We were the first to perform comparisons of the performance of osteoporosis prediction between the machine learning and conventional methods using population-based epidemiological data. The machine learning methods may be effective tools for identifying postmenopausal women at high risk for osteoporosis.

  15. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Improving the Accuracy and Training Speed of Motor Imagery Brain-Computer Interfaces Using Wavelet-Based Combined Feature Vectors and Gaussian Mixture Model-Supervectors.

    PubMed

    Lee, David; Park, Sang-Hoon; Lee, Sang-Goog

    2017-10-07

    In this paper, we propose a set of wavelet-based combined feature vectors and a Gaussian mixture model (GMM)-supervector to enhance training speed and classification accuracy in motor imagery brain-computer interfaces. The proposed method is configured as follows: first, wavelet transforms are applied to extract the feature vectors for identification of motor imagery electroencephalography (EEG) and principal component analyses are used to reduce the dimensionality of the feature vectors and linearly combine them. Subsequently, the GMM universal background model is trained by the expectation-maximization (EM) algorithm to purify the training data and reduce its size. Finally, a purified and reduced GMM-supervector is used to train the support vector machine classifier. The performance of the proposed method was evaluated for three different motor imagery datasets in terms of accuracy, kappa, mutual information, and computation time, and compared with the state-of-the-art algorithms. The results from the study indicate that the proposed method achieves high accuracy with a small amount of training data compared with the state-of-the-art algorithms in motor imagery EEG classification.

  17. A Novelty Design Of Minimization Of Electrical Losses In A Vector Controlled Induction Machine Drive

    NASA Astrophysics Data System (ADS)

    Aryza, Solly; Irwanto, M.; Lubis, Zulkarnain; Putera Utama Siahaan, Andysah; Rahim, Robbi; Furqan, Mhd.

    2018-01-01

    The induction motor has in the industry . More attention has been a focus to develop and design of induction motor drive. With the method of vector control novelty prove the efficiency of induction motor over their entire speed range. In this paper desirable to design a loss minimization controller which can improve the efficiency. Also, this research described Modeling of an induction motor with core loss included. Realization of methods vector control for an induction motor drive with loss element included. The case of the loss minimization condition. The procedure was successful to calculate the gains of a PI controller. Though the problem of obtaining a robust and sensorless induction motor drive is by no means completely solved, the results obtained as part of this work point in a promising direction.

  18. On the sparseness of 1-norm support vector machines.

    PubMed

    Zhang, Li; Zhou, Weida

    2010-04-01

    There is some empirical evidence available showing that 1-norm Support Vector Machines (1-norm SVMs) have good sparseness; however, both how good sparseness 1-norm SVMs can reach and whether they have a sparser representation than that of standard SVMs are not clear. In this paper we take into account the sparseness of 1-norm SVMs. Two upper bounds on the number of nonzero coefficients in the decision function of 1-norm SVMs are presented. First, the number of nonzero coefficients in 1-norm SVMs is at most equal to the number of only the exact support vectors lying on the +1 and -1 discriminating surfaces, while that in standard SVMs is equal to the number of support vectors, which implies that 1-norm SVMs have better sparseness than that of standard SVMs. Second, the number of nonzero coefficients is at most equal to the rank of the sample matrix. A brief review of the geometry of linear programming and the primal steepest edge pricing simplex method are given, which allows us to provide the proof of the two upper bounds and evaluate their tightness by experiments. Experimental results on toy data sets and the UCI data sets illustrate our analysis. Copyright 2009 Elsevier Ltd. All rights reserved.

  19. Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

    PubMed

    Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang

    2011-06-20

    Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchy structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing enzyme into the EC hierarchy is crucial to understand its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. One is to introduce the efficient sequence encoding methods for representing given proteins. The second one is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named as SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and Matthew's correlation coefficient (MCC). In addition, SVMHL with the CTF obtains the accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98 when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for enzyme function prediction community.

  20. Wavelet images and Chou's pseudo amino acid composition for protein classification.

    PubMed

    Nanni, Loris; Brahnam, Sheryl; Lumini, Alessandra

    2012-08-01

    The last decade has seen an explosion in the collection of protein data. To actualize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying and extracting features from proteins. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods used in protein classification that are based on the calculation of texture descriptors starting from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an Adaboost ensemble of neural network or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method that is based on the Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The Matlab code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar .

  1. Vibration Sensor Monitoring of Nickel-Titanium Alloy Turning for Machinability Evaluation.

    PubMed

    Segreto, Tiziana; Caggiano, Alessandra; Karam, Sara; Teti, Roberto

    2017-12-12

    Nickel-Titanium (Ni-Ti) alloys are very difficult-to-machine materials causing notable manufacturing problems due to their unique mechanical properties, including superelasticity, high ductility, and severe strain-hardening. In this framework, the aim of this paper is to assess the machinability of Ni-Ti alloys with reference to turning processes in order to realize a reliable and robust in-process identification of machinability conditions. An on-line sensor monitoring procedure based on the acquisition of vibration signals was implemented during the experimental turning tests. The detected vibration sensorial data were processed through an advanced signal processing method in time-frequency domain based on wavelet packet transform (WPT). The extracted sensorial features were used to construct WPT pattern feature vectors to send as input to suitably configured neural networks (NNs) for cognitive pattern recognition in order to evaluate the correlation between input sensorial information and output machinability conditions.

  2. Vibration Sensor Monitoring of Nickel-Titanium Alloy Turning for Machinability Evaluation

    PubMed Central

    Segreto, Tiziana; Karam, Sara; Teti, Roberto

    2017-01-01

    Nickel-Titanium (Ni-Ti) alloys are very difficult-to-machine materials causing notable manufacturing problems due to their unique mechanical properties, including superelasticity, high ductility, and severe strain-hardening. In this framework, the aim of this paper is to assess the machinability of Ni-Ti alloys with reference to turning processes in order to realize a reliable and robust in-process identification of machinability conditions. An on-line sensor monitoring procedure based on the acquisition of vibration signals was implemented during the experimental turning tests. The detected vibration sensorial data were processed through an advanced signal processing method in time-frequency domain based on wavelet packet transform (WPT). The extracted sensorial features were used to construct WPT pattern feature vectors to send as input to suitably configured neural networks (NNs) for cognitive pattern recognition in order to evaluate the correlation between input sensorial information and output machinability conditions. PMID:29231864

  3. Estimation of Teacher Practices Based on Text Transcripts of Teacher Speech Using a Support Vector Machine Algorithm

    ERIC Educational Resources Information Center

    Araya, Roberto; Plana, Francisco; Dartnell, Pablo; Soto-Andrade, Jorge; Luci, Gina; Salinas, Elena; Araya, Marylen

    2012-01-01

    Teacher practice is normally assessed by observers who watch classes or videos of classes. Here, we analyse an alternative strategy that uses text transcripts and a support vector machine classifier. For each one of the 710 videos of mathematics classes from the 2005 Chilean National Teacher Assessment Programme, a single 4-minute slice was…

  4. Machine Learning Prediction of the Energy Gap of Graphene Nanoflakes Using Topological Autocorrelation Vectors.

    PubMed

    Fernandez, Michael; Abreu, Jose I; Shi, Hongqing; Barnard, Amanda S

    2016-11-14

    The possibility of band gap engineering in graphene opens countless new opportunities for application in nanoelectronics. In this work, the energy gaps of 622 computationally optimized graphene nanoflakes were mapped to topological autocorrelation vectors using machine learning techniques. Machine learning modeling revealed that the most relevant correlations appear at topological distances in the range of 1 to 42 with prediction accuracy higher than 80%. The data-driven model can statistically discriminate between graphene nanoflakes with different energy gaps on the basis of their molecular topology.

  5. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  6. Support vector machine as a binary classifier for automated object detection in remotely sensed data

    NASA Astrophysics Data System (ADS)

    Wardaya, P. D.

    2014-02-01

    In the present paper, author proposes the application of Support Vector Machine (SVM) for the analysis of satellite imagery. One of the advantages of SVM is that, with limited training data, it may generate comparable or even better results than the other methods. The SVM algorithm is used for automated object detection and characterization. Specifically, the SVM is applied in its basic nature as a binary classifier where it classifies two classes namely, object and background. The algorithm aims at effectively detecting an object from its background with the minimum training data. The synthetic image containing noises is used for algorithm testing. Furthermore, it is implemented to perform remote sensing image analysis such as identification of Island vegetation, water body, and oil spill from the satellite imagery. It is indicated that SVM provides the fast and accurate analysis with the acceptable result.

  7. Prediction of biochar yield from cattle manure pyrolysis via least squares support vector machine intelligent approach.

    PubMed

    Cao, Hongliang; Xin, Ya; Yuan, Qiaoxia

    2016-02-01

    To predict conveniently the biochar yield from cattle manure pyrolysis, intelligent modeling approach was introduced in this research. A traditional artificial neural networks (ANN) model and a novel least squares support vector machine (LS-SVM) model were developed. For the identification and prediction evaluation of the models, a data set with 33 experimental data was used, which were obtained using a laboratory-scale fixed bed reaction system. The results demonstrated that the intelligent modeling approach is greatly convenient and effective for the prediction of the biochar yield. In particular, the novel LS-SVM model has a more satisfying predicting performance and its robustness is better than the traditional ANN model. The introduction and application of the LS-SVM modeling method gives a successful example, which is a good reference for the modeling study of cattle manure pyrolysis process, even other similar processes. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. A Relevance Vector Machine-Based Approach with Application to Oil Sand Pump Prognostics

    PubMed Central

    Hu, Jinfei; Tse, Peter W.

    2013-01-01

    Oil sand pumps are widely used in the mining industry for the delivery of mixtures of abrasive solids and liquids. Because they operate under highly adverse conditions, these pumps usually experience significant wear. Consequently, equipment owners are quite often forced to invest substantially in system maintenance to avoid unscheduled downtime. In this study, an approach combining relevance vector machines (RVMs) with a sum of two exponential functions was developed to predict the remaining useful life (RUL) of field pump impellers. To handle field vibration data, a novel feature extracting process was proposed to arrive at a feature varying with the development of damage in the pump impellers. A case study involving two field datasets demonstrated the effectiveness of the developed method. Compared with standalone exponential fitting, the proposed RVM-based model was much better able to predict the remaining useful life of pump impellers. PMID:24051527

  9. Cosmic string detection with tree-based machine learning

    NASA Astrophysics Data System (ADS)

    Vafaei Sadr, A.; Farhang, M.; Movahed, S. M. S.; Bassett, B.; Kunz, M.

    2018-07-01

    We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of the processed CMB maps that boost cosmic string detectability. Our proposed classifiers, after training, give results similar to or better than claimed detectability levels from other methods for string tension, Gμ. They can make 3σ detection of strings with Gμ ≳ 2.1 × 10-10 for noise-free, 0.9'-resolution CMB observations. The minimum detectable tension increases to Gμ ≳ 3.0 × 10-8 for a more realistic, CMB S4-like (II) strategy, improving over previous results.

  10. Cosmic String Detection with Tree-Based Machine Learning

    NASA Astrophysics Data System (ADS)

    Vafaei Sadr, A.; Farhang, M.; Movahed, S. M. S.; Bassett, B.; Kunz, M.

    2018-05-01

    We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of the processed CMB maps that boost cosmic string detectability. Our proposed classifiers, after training, give results similar to or better than claimed detectability levels from other methods for string tension, Gμ. They can make 3σ detection of strings with Gμ ≳ 2.1 × 10-10 for noise-free, 0.9΄-resolution CMB observations. The minimum detectable tension increases to Gμ ≳ 3.0 × 10-8 for a more realistic, CMB S4-like (II) strategy, improving over previous results.

  11. A relevance vector machine-based approach with application to oil sand pump prognostics.

    PubMed

    Hu, Jinfei; Tse, Peter W

    2013-09-18

    Oil sand pumps are widely used in the mining industry for the delivery of mixtures of abrasive solids and liquids. Because they operate under highly adverse conditions, these pumps usually experience significant wear. Consequently, equipment owners are quite often forced to invest substantially in system maintenance to avoid unscheduled downtime. In this study, an approach combining relevance vector machines (RVMs) with a sum of two exponential functions was developed to predict the remaining useful life (RUL) of field pump impellers. To handle field vibration data, a novel feature extracting process was proposed to arrive at a feature varying with the development of damage in the pump impellers. A case study involving two field datasets demonstrated the effectiveness of the developed method. Compared with standalone exponential fitting, the proposed RVM-based model was much better able to predict the remaining useful life of pump impellers.

  12. Detection of periods of food intake using Support Vector Machines.

    PubMed

    Lopez-Meyer, Paulo; Schuckers, Stephanie; Makeyev, Oleksandr; Sazonov, Edward

    2010-01-01

    Studies of obesity and eating disorders need objective tools of Monitoring of Ingestive Behavior (MIB) that can detect and characterize food intake. In this paper we describe detection of food intake by a Support Vector Machine classifier trained on time history of chews and swallows. The training was performed on data collected from 18 subjects in 72 experiments involving eating and other activities (for example, talking). The highest accuracy of detecting food intake (94%) was achieved in configuration where both chews and swallows were used as predictors. Using only swallowing as a predictor resulted in 80% accuracy. Experimental results suggest that these two predictors may be used for differentiation between periods of resting and food intake with a resolution of 30 seconds. Proposed methods may be utilized for development of an accurate, inexpensive, and non-intrusive methodology to objectively monitor food intake in free living conditions.

  13. Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor.

    PubMed

    Saravanan, Vijayakumar; Gautham, Namasivayam

    2015-10-01

    Proteins embody epitopes that serve as their antigenic determinants. Epitopes occupy a central place in integrative biology, not to mention as targets for novel vaccine, pharmaceutical, and systems diagnostics development. The presence of T-cell and B-cell epitopes has been extensively studied due to their potential in synthetic vaccine design. However, reliable prediction of linear B-cell epitope remains a formidable challenge. Earlier studies have reported discrepancy in amino acid composition between the epitopes and non-epitopes. Hence, this study proposed and developed a novel amino acid composition-based feature descriptor, Dipeptide Deviation from Expected Mean (DDE), to distinguish the linear B-cell epitopes from non-epitopes effectively. In this study, for the first time, only exact linear B-cell epitopes and non-epitopes have been utilized for developing the prediction method, unlike the use of epitope-containing regions in earlier reports. To evaluate the performance of the DDE feature vector, models have been developed with two widely used machine-learning techniques Support Vector Machine and AdaBoost-Random Forest. Five-fold cross-validation performance of the proposed method with error-free dataset and dataset from other studies achieved an overall accuracy between nearly 61% and 73%, with balance between sensitivity and specificity metrics. Performance of the DDE feature vector was better (with accuracy difference of about 2% to 12%), in comparison to other amino acid-derived features on different datasets. This study reflects the efficiency of the DDE feature vector in enhancing the linear B-cell epitope prediction performance, compared to other feature representations. The proposed method is made as a stand-alone tool available freely for researchers, particularly for those interested in vaccine design and novel molecular target development for systems therapeutics and diagnostics: https://github.com/brsaran/LBEEP.

  14. Progress in computational toxicology.

    PubMed

    Ekins, Sean

    2014-01-01

    Computational methods have been widely applied to toxicology across pharmaceutical, consumer product and environmental fields over the past decade. Progress in computational toxicology is now reviewed. A literature review was performed on computational models for hepatotoxicity (e.g. for drug-induced liver injury (DILI)), cardiotoxicity, renal toxicity and genotoxicity. In addition various publications have been highlighted that use machine learning methods. Several computational toxicology model datasets from past publications were used to compare Bayesian and Support Vector Machine (SVM) learning methods. The increasing amounts of data for defined toxicology endpoints have enabled machine learning models that have been increasingly used for predictions. It is shown that across many different models Bayesian and SVM perform similarly based on cross validation data. Considerable progress has been made in computational toxicology in a decade in both model development and availability of larger scale or 'big data' models. The future efforts in toxicology data generation will likely provide us with hundreds of thousands of compounds that are readily accessible for machine learning models. These models will cover relevant chemistry space for pharmaceutical, consumer product and environmental applications. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Joint Data Management for MOVINT Data-to-Decision Making

    DTIC Science & Technology

    2011-07-01

    flux tensor , aligned motion history images, and related approaches have been shown to be versatile approaches [12, 16, 17, 18]. Scaling these...methods include voting , neural networks, fuzzy logic, neuro-dynamic programming, support vector machines, Bayesian and Dempster-Shafer methods. One way...Information Fusion, 2010. [16] F. Bunyak, K. Palaniappan, S. K. Nath, G. Seetharaman, “Flux tensor constrained geodesic active contours with sensor fusion

  16. Evaluation of Iron Loss in Interior Permanent Magnet Synchronous Motor with Consideration of Rotational Field

    NASA Astrophysics Data System (ADS)

    Ma, Lei; Sanada, Masayuki; Morimoto, Shigeo; Takeda, Yoji; Kaido, Chikara; Wakisaka, Takeaki

    Loss evaluation is an important issue in the design of electrical machines. Due to the complicate structure and flux distribution, it is difficult to predict the iron loss in the machines exactly. This paper studies the iron loss in interior permanent magnet synchronous motors based on the finite element method. The iron loss test data of core material are used in the fitting of the hysteresis and eddy current loss constants. For motors in practical operation, additional iron losses due to the appearance of rotation of flux density vector and harmonic flux density distribution makes the calculation data deviates from the measured ones. Revision is made to account for these excess iron losses which exist in the practical operating condition. Calculation results show good consistence with the experimental ones. The proposed method provides a possible way to predict the iron loss of the electrical machine with good precision, and may be helpful in the selection of the core material which is best suitable for a certain machine.

  17. A multilevel-ROI-features-based machine learning method for detection of morphometric biomarkers in Parkinson's disease.

    PubMed

    Peng, Bo; Wang, Suhong; Zhou, Zhiyong; Liu, Yan; Tong, Baotong; Zhang, Tao; Dai, Yakang

    2017-06-09

    Machine learning methods have been widely used in recent years for detection of neuroimaging biomarkers in regions of interest (ROIs) and assisting diagnosis of neurodegenerative diseases. The innovation of this study is to use multilevel-ROI-features-based machine learning method to detect sensitive morphometric biomarkers in Parkinson's disease (PD). Specifically, the low-level ROI features (gray matter volume, cortical thickness, etc.) and high-level correlative features (connectivity between ROIs) are integrated to construct the multilevel ROI features. Filter- and wrapper- based feature selection method and multi-kernel support vector machine (SVM) are used in the classification algorithm. T1-weighted brain magnetic resonance (MR) images of 69 PD patients and 103 normal controls from the Parkinson's Progression Markers Initiative (PPMI) dataset are included in the study. The machine learning method performs well in classification between PD patients and normal controls with an accuracy of 85.78%, a specificity of 87.79%, and a sensitivity of 87.64%. The most sensitive biomarkers between PD patients and normal controls are mainly distributed in frontal lobe, parental lobe, limbic lobe, temporal lobe, and central region. The classification performance of our method with multilevel ROI features is significantly improved comparing with other classification methods using single-level features. The proposed method shows promising identification ability for detecting morphometric biomarkers in PD, thus confirming the potentiality of our method in assisting diagnosis of the disease. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Discrimination of Active and Weakly Active Human BACE1 Inhibitors Using Self-Organizing Map and Support Vector Machine.

    PubMed

    Li, Hang; Wang, Maolin; Gong, Ya-Nan; Yan, Aixia

    2016-01-01

    β-secretase (BACE1) is an aspartyl protease, which is considered as a novel vital target in Alzheimer`s disease therapy. We collected a data set of 294 BACE1 inhibitors, and built six classification models to discriminate active and weakly active inhibitors using Kohonen's Self-Organizing Map (SOM) method and Support Vector Machine (SVM) method. Each molecular descriptor was calculated using the program ADRIANA.Code. We adopted two different methods: random method and Self-Organizing Map method, for training/test set split. The descriptors were selected by F-score and stepwise linear regression analysis. The best SVM model Model2C has a good prediction performance on test set with prediction accuracy, sensitivity (SE), specificity (SP) and Matthews correlation coefficient (MCC) of 89.02%, 90%, 88%, 0.78, respectively. Model 1A is the best SOM model, whose accuracy and MCC of the test set were 94.57% and 0.98, respectively. The lone pair electronegativity and polarizability related descriptors importantly contributed to bioactivity of BACE1 inhibitor. The Extended-Connectivity Finger-Prints_4 (ECFP_4) analysis found some vitally key substructural features, which could be helpful for further drug design research. The SOM and SVM models built in this study can be obtained from the authors by email or other contacts.

  19. Support Vector Machine-Based Endmember Extraction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filippi, Anthony M; Archibald, Richard K

    Introduced in this paper is the utilization of Support Vector Machines (SVMs) to automatically perform endmember extraction from hyperspectral data. The strengths of SVM are exploited to provide a fast and accurate calculated representation of high-dimensional data sets that may consist of multiple distributions. Once this representation is computed, the number of distributions can be determined without prior knowledge. For each distribution, an optimal transform can be determined that preserves informational content while reducing the data dimensionality, and hence, the computational cost. Finally, endmember extraction for the whole data set is accomplished. Results indicate that this Support Vector Machine-Based Endmembermore » Extraction (SVM-BEE) algorithm has the capability of autonomously determining endmembers from multiple clusters with computational speed and accuracy, while maintaining a robust tolerance to noise.« less

  20. Solving the Cauchy-Riemann equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

  1. TWSVR: Regression via Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh

    2016-02-01

    Taking motivation from Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR) where the regressor is obtained via solving a pair of quadratic programming problems (QPPs). In this paper we argue that TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression(SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Classification of diesel pool refinery streams through near infrared spectroscopy and support vector machines using C-SVC and ν-SVC.

    PubMed

    Alves, Julio Cesar L; Henriques, Claudete B; Poppi, Ronei J

    2014-01-03

    The use of near infrared (NIR) spectroscopy combined with chemometric methods have been widely used in petroleum and petrochemical industry and provides suitable methods for process control and quality control. The algorithm support vector machines (SVM) has demonstrated to be a powerful chemometric tool for development of classification models due to its ability to nonlinear modeling and with high generalization capability and these characteristics can be especially important for treating near infrared (NIR) spectroscopy data of complex mixtures such as petroleum refinery streams. In this work, a study on the performance of the support vector machines algorithm for classification was carried out, using C-SVC and ν-SVC, applied to near infrared (NIR) spectroscopy data of different types of streams that make up the diesel pool in a petroleum refinery: light gas oil, heavy gas oil, hydrotreated diesel, kerosene, heavy naphtha and external diesel. In addition to these six streams, the diesel final blend produced in the refinery was added to complete the data set. C-SVC and ν-SVC classification models with 2, 4, 6 and 7 classes were developed for comparison between its results and also for comparison with the soft independent modeling of class analogy (SIMCA) models results. It is demonstrated the superior performance of SVC models especially using ν-SVC for development of classification models for 6 and 7 classes leading to an improvement of sensitivity on validation sample sets of 24% and 15%, respectively, when compared to SIMCA models, providing better identification of chemical compositions of different diesel pool refinery streams. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. On Docking, Scoring and Assessing Protein-DNA Complexes in a Rigid-Body Framework

    PubMed Central

    Parisien, Marc; Freed, Karl F.; Sosnick, Tobin R.

    2012-01-01

    We consider the identification of interacting protein-nucleic acid partners using the rigid body docking method FTdock, which is systematic and exhaustive in the exploration of docking conformations. The accuracy of rigid body docking methods is tested using known protein-DNA complexes for which the docked and undocked structures are both available. Additional tests with large decoy sets probe the efficacy of two published statistically derived scoring functions that contain a huge number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can enormously reduce the number of parameters required, thereby identifying the relevant docking features using a miniscule fraction of the number of parameters in the prior works. The present machine learning study considers a 300 dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of protein amino acid-nucleic acid base interaction. The CCP is designed to capture the chemical complementarities of the interface and is well suited for machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), which is defined as the angle between the native system's CCP vector and the decoy's vector and which serves as a substitute for the more commonly used root mean squared deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions, followed by direct contacts, according to specific preferences for either the major or minor grooves of the DNA. PMID:22393431

  4. Expected energy-based restricted Boltzmann machine for classification.

    PubMed

    Elfwing, S; Uchibe, E; Doya, K

    2015-04-01

    In classification tasks, restricted Boltzmann machines (RBMs) have predominantly been used in the first stage, either as feature extractors or to provide initialization of neural networks. In this study, we propose a discriminative learning approach to provide a self-contained RBM method for classification, inspired by free-energy based function approximation (FE-RBM), originally proposed for reinforcement learning. For classification, the FE-RBM method computes the output for an input vector and a class vector by the negative free energy of an RBM. Learning is achieved by stochastic gradient-descent using a mean-squared error training objective. In an earlier study, we demonstrated that the performance and the robustness of FE-RBM function approximation can be improved by scaling the free energy by a constant that is related to the size of network. In this study, we propose that the learning performance of RBM function approximation can be further improved by computing the output by the negative expected energy (EE-RBM), instead of the negative free energy. To create a deep learning architecture, we stack several RBMs on top of each other. We also connect the class nodes to all hidden layers to try to improve the performance even further. We validate the classification performance of EE-RBM using the MNIST data set and the NORB data set, achieving competitive performance compared with other classifiers such as standard neural networks, deep belief networks, classification RBMs, and support vector machines. The purpose of using the NORB data set is to demonstrate that EE-RBM with binary input nodes can achieve high performance in the continuous input domain. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.

  5. Multivariate models for prediction of human skin sensitization hazard.

    PubMed

    Strickland, Judy; Zang, Qingda; Paris, Michael; Lehmann, David M; Allen, David; Choksi, Neepa; Matheson, Joanna; Jacobs, Abigail; Casey, Warren; Kleinstreuer, Nicole

    2017-03-01

    One of the Interagency Coordinating Committee on the Validation of Alternative Method's (ICCVAM) top priorities is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of biological events necessary to produce skin sensitization suggests that no single alternative method will replace the currently accepted animal tests. ICCVAM is evaluating an integrated approach to testing and assessment based on the adverse outcome pathway for skin sensitization that uses machine learning approaches to predict human skin sensitization hazard. We combined data from three in chemico or in vitro assays - the direct peptide reactivity assay (DPRA), human cell line activation test (h-CLAT) and KeratinoSens™ assay - six physicochemical properties and an in silico read-across prediction of skin sensitization hazard into 12 variable groups. The variable groups were evaluated using two machine learning approaches, logistic regression and support vector machine, to predict human skin sensitization hazard. Models were trained on 72 substances and tested on an external set of 24 substances. The six models (three logistic regression and three support vector machine) with the highest accuracy (92%) used: (1) DPRA, h-CLAT and read-across; (2) DPRA, h-CLAT, read-across and KeratinoSens; or (3) DPRA, h-CLAT, read-across, KeratinoSens and log P. The models performed better at predicting human skin sensitization hazard than the murine local lymph node assay (accuracy 88%), any of the alternative methods alone (accuracy 63-79%) or test batteries combining data from the individual methods (accuracy 75%). These results suggest that computational methods are promising tools to identify effectively the potential human skin sensitizers without animal testing. Published 2016. This article has been contributed to by US Government employees and their work is in the public domain in the USA. Published 2016. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

  6. An assessment of support vector machines for land cover classification

    USGS Publications Warehouse

    Huang, C.; Davis, L.S.; Townshend, J.R.G.

    2002-01-01

    The support vector machine (SVM) is a group of theoretically superior machine learning algorithms. It was found competitive with the best available machine learning algorithms in classifying high-dimensional data sets. This paper gives an introduction to the theoretical development of the SVM and an experimental evaluation of its accuracy, stability and training speed in deriving land cover classifications from satellite images. The SVM was compared to three other popular classifiers, including the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree classifiers (DTC). The impacts of kernel configuration on the performance of the SVM and of the selection of training data and input variables on the four classifiers were also evaluated in this experiment.

  7. Fast adaptive composite grid methods on distributed parallel architectures

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Quinlan, Daniel

    1992-01-01

    The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.

  8. A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

    PubMed

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.

  9. Optical Coherence Tomography Machine Learning Classifiers for Glaucoma Detection: A Preliminary Study

    PubMed Central

    Burgansky-Eliash, Zvia; Wollstein, Gadi; Chu, Tianjiao; Ramsey, Joseph D.; Glymour, Clark; Noecker, Robert J.; Ishikawa, Hiroshi; Schuman, Joel S.

    2007-01-01

    Purpose Machine-learning classifiers are trained computerized systems with the ability to detect the relationship between multiple input parameters and a diagnosis. The present study investigated whether the use of machine-learning classifiers improves optical coherence tomography (OCT) glaucoma detection. Methods Forty-seven patients with glaucoma (47 eyes) and 42 healthy subjects (42 eyes) were included in this cross-sectional study. Of the glaucoma patients, 27 had early disease (visual field mean deviation [MD] ≥ −6 dB) and 20 had advanced glaucoma (MD < −6 dB). Machine-learning classifiers were trained to discriminate between glaucomatous and healthy eyes using parameters derived from OCT output. The classifiers were trained with all 38 parameters as well as with only 8 parameters that correlated best with the visual field MD. Five classifiers were tested: linear discriminant analysis, support vector machine, recursive partitioning and regression tree, generalized linear model, and generalized additive model. For the last two classifiers, a backward feature selection was used to find the minimal number of parameters that resulted in the best and most simple prediction. The cross-validated receiver operating characteristic (ROC) curve and accuracies were calculated. Results The largest area under the ROC curve (AROC) for glaucoma detection was achieved with the support vector machine using eight parameters (0.981). The sensitivity at 80% and 95% specificity was 97.9% and 92.5%, respectively. This classifier also performed best when judged by cross-validated accuracy (0.966). The best classification between early glaucoma and advanced glaucoma was obtained with the generalized additive model using only three parameters (AROC = 0.854). Conclusions Automated machine classifiers of OCT data might be useful for enhancing the utility of this technology for detecting glaucomatous abnormality. PMID:16249492

  10. Improved Saturated Hydraulic Conductivity Pedotransfer Functions Using Machine Learning Methods

    NASA Astrophysics Data System (ADS)

    Araya, S. N.; Ghezzehei, T. A.

    2017-12-01

    Saturated hydraulic conductivity (Ks) is one of the fundamental hydraulic properties of soils. Its measurement, however, is cumbersome and instead pedotransfer functions (PTFs) are often used to estimate it. Despite a lot of progress over the years, generic PTFs that estimate hydraulic conductivity generally don't have a good performance. We develop significantly improved PTFs by applying state of the art machine learning techniques coupled with high-performance computing on a large database of over 20,000 soils—USKSAT and the Florida Soil Characterization databases. We compared the performance of four machine learning algorithms (k-nearest neighbors, gradient boosted model, support vector machine, and relevance vector machine) and evaluated the relative importance of several soil properties in explaining Ks. An attempt is also made to better account for soil structural properties; we evaluated the importance of variables derived from transformations of soil water retention characteristics and other soil properties. The gradient boosted models gave the best performance with root mean square errors less than 0.7 and mean errors in the order of 0.01 on a log scale of Ks [cm/h]. The effective particle size, D10, was found to be the single most important predictor. Other important predictors included percent clay, bulk density, organic carbon percent, coefficient of uniformity and values derived from water retention characteristics. Model performances were consistently better for Ks values greater than 10 cm/h. This study maximizes the extraction of information from a large database to develop generic machine learning based PTFs to estimate Ks. The study also evaluates the importance of various soil properties and their transformations in explaining Ks.

  11. Monthly evaporation forecasting using artificial neural networks and support vector machines

    NASA Astrophysics Data System (ADS)

    Tezel, Gulay; Buyukyildiz, Meral

    2016-04-01

    Evaporation is one of the most important components of the hydrological cycle, but is relatively difficult to estimate, due to its complexity, as it can be influenced by numerous factors. Estimation of evaporation is important for the design of reservoirs, especially in arid and semi-arid areas. Artificial neural network methods and support vector machines (SVM) are frequently utilized to estimate evaporation and other hydrological variables. In this study, usability of artificial neural networks (ANNs) (multilayer perceptron (MLP) and radial basis function network (RBFN)) and ɛ-support vector regression (SVR) artificial intelligence methods was investigated to estimate monthly pan evaporation. For this aim, temperature, relative humidity, wind speed, and precipitation data for the period 1972 to 2005 from Beysehir meteorology station were used as input variables while pan evaporation values were used as output. The Romanenko and Meyer method was also considered for the comparison. The results were compared with observed class A pan evaporation data. In MLP method, four different training algorithms, gradient descent with momentum and adaptive learning rule backpropagation (GDX), Levenberg-Marquardt (LVM), scaled conjugate gradient (SCG), and resilient backpropagation (RBP), were used. Also, ɛ-SVR model was used as SVR model. The models were designed via 10-fold cross-validation (CV); algorithm performance was assessed via mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R 2). According to the performance criteria, the ANN algorithms and ɛ-SVR had similar results. The ANNs and ɛ-SVR methods were found to perform better than the Romanenko and Meyer methods. Consequently, the best performance using the test data was obtained using SCG(4,2,2,1) with R 2 = 0.905.

  12. Automatic detection of sleep apnea based on EEG detrended fluctuation analysis and support vector machine.

    PubMed

    Zhou, Jing; Wu, Xiao-ming; Zeng, Wei-jie

    2015-12-01

    Sleep apnea syndrome (SAS) is prevalent in individuals and recently, there are many studies focus on using simple and efficient methods for SAS detection instead of polysomnography. However, not much work has been done on using nonlinear behavior of the electroencephalogram (EEG) signals. The purpose of this study is to find a novel and simpler method for detecting apnea patients and to quantify nonlinear characteristics of the sleep apnea. 30 min EEG scaling exponents that quantify power-law correlations were computed using detrended fluctuation analysis (DFA) and compared between six SAS and six healthy subjects during sleep. The mean scaling exponents were calculated every 30 s and 360 control values and 360 apnea values were obtained. These values were compared between the two groups and support vector machine (SVM) was used to classify apnea patients. Significant difference was found between EEG scaling exponents of the two groups (p < 0.001). SVM was used and obtained high and consistent recognition rate: average classification accuracy reached 95.1% corresponding to the sensitivity 93.2% and specificity 98.6%. DFA of EEG is an efficient and practicable method and is helpful clinically in diagnosis of sleep apnea.

  13. Improving near-infrared prediction model robustness with support vector machine regression: a pharmaceutical tablet assay example.

    PubMed

    Igne, Benoît; Drennen, James K; Anderson, Carl A

    2014-01-01

    Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in test was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.

  14. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction

    PubMed Central

    Cruz-Cano, Raul; Chew, David S.H.; Kwok-Pui, Choi; Ming-Ying, Leung

    2010-01-01

    Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications. PMID:20729987

  15. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction.

    PubMed

    Cruz-Cano, Raul; Chew, David S H; Kwok-Pui, Choi; Ming-Ying, Leung

    2010-06-01

    Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications.

  16. The development of vector based 2.5D print methods for a painting machine

    NASA Astrophysics Data System (ADS)

    Parraman, Carinna

    2013-02-01

    Through recent trends in the application of digitally printed decorative finishes to products, CAD, 3D additive layer manufacturing and research in material perception, [1, 2] there is a growing interest in the accurate rendering of materials and tangible displays. Although current advances in colour management and inkjet printing has meant that users can take for granted high-quality colour and resolution in their printed images, digital methods for transferring a photographic coloured image from screen to paper is constrained by pixel count, file size, colorimetric conversion between colour spaces and the gamut limits of input and output devices. This paper considers new approaches to applying alternative colour palettes by using a vector-based approach through the application of paint mixtures, towards what could be described as a 2.5D printing method. The objective is to not apply an image to a textured surface, but where texture and colour are integral to the mark, that like a brush, delineates the contours in the image. The paper describes the difference between the way inks and paints are mixed and applied. When transcribing the fluid appearance of a brush stroke, there is a difference between a halftone printed mark and a painted mark. The issue of surface quality is significant to subjective qualities when studying the appearance of ink or paint on paper. The paper provides examples of a range of vector marks that are then transcribed into brush stokes by the painting machine.

  17. Machine learning and data science in soft materials engineering

    NASA Astrophysics Data System (ADS)

    Ferguson, Andrew L.

    2018-01-01

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by ‘de-jargonizing’ data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  18. Machine learning and data science in soft materials engineering.

    PubMed

    Ferguson, Andrew L

    2018-01-31

    In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.

  19. Application of structured support vector machine backpropagation to a convolutional neural network for human pose estimation.

    PubMed

    Witoonchart, Peerajak; Chongstitvatana, Prabhas

    2017-08-01

    In this study, for the first time, we show how to formulate a structured support vector machine (SSVM) as two layers in a convolutional neural network, where the top layer is a loss augmented inference layer and the bottom layer is the normal convolutional layer. We show that a deformable part model can be learned with the proposed structured SVM neural network by backpropagating the error of the deformable part model to the convolutional neural network. The forward propagation calculates the loss augmented inference and the backpropagation calculates the gradient from the loss augmented inference layer to the convolutional layer. Thus, we obtain a new type of convolutional neural network called an Structured SVM convolutional neural network, which we applied to the human pose estimation problem. This new neural network can be used as the final layers in deep learning. Our method jointly learns the structural model parameters and the appearance model parameters. We implemented our method as a new layer in the existing Caffe library. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Automated detection of pulmonary nodules in CT images with support vector machines

    NASA Astrophysics Data System (ADS)

    Liu, Lu; Liu, Wanyu; Sun, Xiaoming

    2008-10-01

    Many methods have been proposed to avoid radiologists fail to diagnose small pulmonary nodules. Recently, support vector machines (SVMs) had received an increasing attention for pattern recognition. In this paper, we present a computerized system aimed at pulmonary nodules detection; it identifies the lung field, extracts a set of candidate regions with a high sensitivity ratio and then classifies candidates by the use of SVMs. The Computer Aided Diagnosis (CAD) system presented in this paper supports the diagnosis of pulmonary nodules from Computed Tomography (CT) images as inflammation, tuberculoma, granuloma..sclerosing hemangioma, and malignant tumor. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of SVMs classifiers. The achieved classification performance was 100%, 92.75% and 90.23% in the training, validation and testing set, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.

  1. Using Support Vector Machine Ensembles for Target Audience Classification on Twitter

    PubMed Central

    Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

    2015-01-01

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768

  2. Identification and Mapping of Tree Species in Urban Areas Using WORLDVIEW-2 Imagery

    NASA Astrophysics Data System (ADS)

    Mustafa, Y. T.; Habeeb, H. N.; Stein, A.; Sulaiman, F. Y.

    2015-10-01

    Monitoring and mapping of urban trees are essential to provide urban forestry authorities with timely and consistent information. Modern techniques increasingly facilitate these tasks, but require the development of semi-automatic tree detection and classification methods. In this article, we propose an approach to delineate and map the crown of 15 tree species in the city of Duhok, Kurdistan Region of Iraq using WorldView-2 (WV-2) imagery. A tree crown object is identified first and is subsequently delineated as an image object (IO) using vegetation indices and texture measurements. Next, three classification methods: Maximum Likelihood, Neural Network, and Support Vector Machine were used to classify IOs using selected IO features. The best results are obtained with Support Vector Machine classification that gives the best map of urban tree species in Duhok. The overall accuracy was between 60.93% to 88.92% and κ-coefficient was between 0.57 to 0.75. We conclude that fifteen tree species were identified and mapped at a satisfactory accuracy in urban areas of this study.

  3. Using support vector machine ensembles for target audience classification on Twitter.

    PubMed

    Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

    2015-01-01

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

  4. Study of support vector machine and serum surface-enhanced Raman spectroscopy for noninvasive esophageal cancer detection

    NASA Astrophysics Data System (ADS)

    Li, Shao-Xin; Zeng, Qiu-Yao; Li, Lin-Fang; Zhang, Yan-Jiao; Wan, Ming-Ming; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Liu, Song-Hao

    2013-02-01

    The ability of combining serum surface-enhanced Raman spectroscopy (SERS) with support vector machine (SVM) for improving classification esophageal cancer patients from normal volunteers is investigated. Two groups of serum SERS spectra based on silver nanoparticles (AgNPs) are obtained: one group from patients with pathologically confirmed esophageal cancer (n=30) and the other group from healthy volunteers (n=31). Principal components analysis (PCA), conventional SVM (C-SVM) and conventional SVM combination with PCA (PCA-SVM) methods are implemented to classify the same spectral dataset. Results show that a diagnostic accuracy of 77.0% is acquired for PCA technique, while diagnostic accuracies of 83.6% and 85.2% are obtained for C-SVM and PCA-SVM methods based on radial basis functions (RBF) models. The results prove that RBF SVM models are superior to PCA algorithm in classification serum SERS spectra. The study demonstrates that serum SERS in combination with SVM technique has great potential to provide an effective and accurate diagnostic schema for noninvasive detection of esophageal cancer.

  5. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine.

    PubMed

    Kumar, Ravindra; Kumari, Bandana; Kumar, Manish

    2017-01-01

    The endoplasmic reticulum plays an important role in many cellular processes, which includes protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site for quality control of misfolded proteins and entry point of extracellular proteins to the secretory pathway. Hence at any given point of time, endoplasmic reticulum contains two different cohorts of proteins, (i) proteins involved in endoplasmic reticulum-specific function, which reside in the lumen of the endoplasmic reticulum, called as endoplasmic reticulum resident proteins and (ii) proteins which are in process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Approximately only 50% of the proteins used in this study as training data had endoplasmic reticulum retention signal, which shows that these signals are not essentially present in all endoplasmic reticulum resident proteins. This also strongly indicates the role of additional factors in retention of endoplasmic reticulum-specific proteins inside the endoplasmic reticulum. This is a support vector machine based method, where we had used different forms of protein features as inputs for support vector machine to develop the prediction models. During training leave-one-out approach of cross-validation was used. Maximum performance was obtained with a combination of amino acid compositions of different part of proteins. In this study, we have reported a novel support vector machine based method for predicting endoplasmic reticulum resident proteins, named as ERPred. During training we achieved a maximum accuracy of 81.42% with leave-one-out approach of cross-validation. When evaluated on independent dataset, ERPred did prediction with sensitivity of 72.31% and specificity of 83.69%. We have also annotated six different proteomes to predict the candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community, which can be accessed at http://proteininformatics.org/mkumar/erpred/index.html. We found that out of 124 proteins of the training dataset, only 66 proteins had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates the role of additional factors in retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal independent tool. It is tuned for the prediction of endoplasmic reticulum resident proteins, even if the query protein does not contain specific ER-retention signal.

  6. Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers

    PubMed Central

    García-Gonzalo, Esperanza; Fernández-Muñiz, Zulima; García Nieto, Paulino José; Bernardo Sánchez, Antonio; Menéndez Fernández, Marta

    2016-01-01

    The mining industry relies heavily on empirical analysis for design and prediction. An empirical design method, called the critical span graph, was developed specifically for rock stability analysis in entry-type excavations, based on an extensive case-history database of cut and fill mining in Canada. This empirical span design chart plots the critical span against rock mass rating for the observed case histories and has been accepted by many mining operations for the initial span design of cut and fill stopes. Different types of analysis have been used to classify the observed cases into stable, potentially unstable and unstable groups. The main purpose of this paper is to present a new method for defining rock stability areas of the critical span graph, which applies machine learning classifiers (support vector machine and extreme learning machine). The results show a reasonable correlation with previous guidelines. These machine learning methods are good tools for developing empirical methods, since they make no assumptions about the regression function. With this software, it is easy to add new field observations to a previous database, improving prediction output with the addition of data that consider the local conditions for each mine. PMID:28773653

  7. Hard-Rock Stability Analysis for Span Design in Entry-Type Excavations with Learning Classifiers.

    PubMed

    García-Gonzalo, Esperanza; Fernández-Muñiz, Zulima; García Nieto, Paulino José; Bernardo Sánchez, Antonio; Menéndez Fernández, Marta

    2016-06-29

    The mining industry relies heavily on empirical analysis for design and prediction. An empirical design method, called the critical span graph, was developed specifically for rock stability analysis in entry-type excavations, based on an extensive case-history database of cut and fill mining in Canada. This empirical span design chart plots the critical span against rock mass rating for the observed case histories and has been accepted by many mining operations for the initial span design of cut and fill stopes. Different types of analysis have been used to classify the observed cases into stable, potentially unstable and unstable groups. The main purpose of this paper is to present a new method for defining rock stability areas of the critical span graph, which applies machine learning classifiers (support vector machine and extreme learning machine). The results show a reasonable correlation with previous guidelines. These machine learning methods are good tools for developing empirical methods, since they make no assumptions about the regression function. With this software, it is easy to add new field observations to a previous database, improving prediction output with the addition of data that consider the local conditions for each mine.

  8. An immune-inspired semi-supervised algorithm for breast cancer diagnosis.

    PubMed

    Peng, Lingxi; Chen, Wenbin; Zhou, Wubai; Li, Fufang; Yang, Jin; Zhang, Jiandong

    2016-10-01

    Breast cancer is the most frequently and world widely diagnosed life-threatening cancer, which is the leading cause of cancer death among women. Early accurate diagnosis can be a big plus in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machine, artificial neural network, etc. The computer immunology is also an intelligent method inspired by biological immune system, which has been successfully applied in pattern recognition, combination optimization, machine learning, etc. However, most of these diagnosis methods belong to a supervised diagnosis method. It is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, which proves that our algorithm is a promising automatic diagnosis method for breast cancer. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. Curriculum Assessment Using Artificial Neural Network and Support Vector Machine Modeling Approaches: A Case Study. IR Applications. Volume 29

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2010-01-01

    Artificial Neural Network (ANN) and Support Vector Machine (SVM) approaches have been on the cutting edge of science and technology for pattern recognition and data classification. In the ANN model, classification accuracy can be achieved by using the feed-forward of inputs, back-propagation of errors, and the adjustment of connection weights. In…

  10. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach

    PubMed Central

    Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets–Maximum Unbiased Validation Dataset–which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6. PMID:29652912

  11. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    NASA Astrophysics Data System (ADS)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD) and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus making shape analysis a potentially more accurate biomarker. This study studies the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1 weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper based feature selection method was used as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data indicating that the feature selection is sensitive to the data used, however feature ensemble methods may overcome this.

  12. Extraction and Analysis of Mega Cities’ Impervious Surface on Pixel-based and Object-oriented Support Vector Machine Classification Technology: A case of Bombay

    NASA Astrophysics Data System (ADS)

    Yu, S. S.; Sun, Z. C.; Sun, L.; Wu, M. F.

    2017-02-01

    The object of this paper is to study the impervious surface extraction method using remote sensing imagery and monitor the spatiotemporal changing patterns of mega cities. Megacity Bombay was selected as the interesting area. Firstly, the pixel-based and object-oriented support vector machine (SVM) classification methods were used to acquire the land use/land cover (LULC) products of Bombay in 2010. Consequently, the overall accuracy (OA) and overall Kappa (OK) of the pixel-based method were 94.97% and 0.96 with a running time of 78 minutes, the OA and OK of the object-oriented method were 93.72% and 0.94 with a running time of only 17s. Additionally, OA and OK of the object-oriented method after a post-classification were improved up to 95.8% and 0.94. Then, the dynamic impervious surfaces of Bombay in the period 1973-2015 were extracted and the urbanization pattern of Bombay was analysed. Results told that both the two SVM classification methods could accomplish the impervious surface extraction, but the object-oriented method should be a better choice. Urbanization of Bombay experienced a fast extending during the past 42 years, implying a dramatically urban sprawl of mega cities in the developing countries along the One Belt and One Road (OBOR).

  13. Mining protein function from text using term-based support vector machines

    PubMed Central

    Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

    2005-01-01

    Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835

  14. Automated Assessment of Cognitive Health Using Smart Home Technologies

    PubMed Central

    Dawadi, Prafulla N.; Cook, Diane J.; Schmitter-Edgecombe, Maureen; Parsey, Carolyn

    2014-01-01

    BACKGROUND The goal of this work is to develop intelligent systems to monitor the well being of individuals in their home environments. OBJECTIVE This paper introduces a machine learning-based method to automatically predict activity quality in smart homes and automatically assess cognitive health based on activity quality. METHODS This paper describes an automated framework to extract set of features from smart home sensors data that reflects the activity performance or ability of an individual to complete an activity which can be input to machine learning algorithms. Output from learning algorithms including principal component analysis, support vector machine, and logistic regression algorithms are used to quantify activity quality for a complex set of smart home activities and predict cognitive health of participants. RESULTS Smart home activity data was gathered from volunteer participants (n=263) who performed a complex set of activities in our smart home testbed. We compare our automated activity quality prediction and cognitive health prediction with direct observation scores and health assessment obtained from neuropsychologists. With all samples included, we obtained statistically significant correlation (r=0.54) between direct observation scores and predicted activity quality. Similarly, using a support vector machine classifier, we obtained reasonable classification accuracy (area under the ROC curve = 0.80, g-mean = 0.73) in classifying participants into two different cognitive classes, dementia and cognitive healthy. CONCLUSIONS The results suggest that it is possible to automatically quantify the task quality of smart home activities and perform limited assessment of the cognitive health of individual if smart home activities are properly chosen and learning algorithms are appropriately trained. PMID:23949177

  15. (Machine-)Learning to analyze in vivo microscopy: Support vector machines.

    PubMed

    Wang, Michael F Z; Fernandez-Gonzalez, Rodrigo

    2017-11-01

    The development of new microscopy techniques for super-resolved, long-term monitoring of cellular and subcellular dynamics in living organisms is revealing new fundamental aspects of tissue development and repair. However, new microscopy approaches present several challenges. In addition to unprecedented requirements for data storage, the analysis of high resolution, time-lapse images is too complex to be done manually. Machine learning techniques are ideally suited for the (semi-)automated analysis of multidimensional image data. In particular, support vector machines (SVMs), have emerged as an efficient method to analyze microscopy images obtained from animals. Here, we discuss the use of SVMs to analyze in vivo microscopy data. We introduce the mathematical framework behind SVMs, and we describe the metrics used by SVMs and other machine learning approaches to classify image data. We discuss the influence of different SVM parameters in the context of an algorithm for cell segmentation and tracking. Finally, we describe how the application of SVMs has been critical to study protein localization in yeast screens, for lineage tracing in C. elegans, or to determine the developmental stage of Drosophila embryos to investigate gene expression dynamics. We propose that SVMs will become central tools in the analysis of the complex image data that novel microscopy modalities have made possible. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data

    PubMed Central

    Hepworth, Philip J.; Nefedov, Alexey V.; Muchnik, Ilya B.; Morgan, Kenton L.

    2012-01-01

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide. PMID:22319115

  17. Broiler chickens can benefit from machine learning: support vector machine analysis of observational epidemiological data.

    PubMed

    Hepworth, Philip J; Nefedov, Alexey V; Muchnik, Ilya B; Morgan, Kenton L

    2012-08-07

    Machine-learning algorithms pervade our daily lives. In epidemiology, supervised machine learning has the potential for classification, diagnosis and risk factor identification. Here, we report the use of support vector machine learning to identify the features associated with hock burn on commercial broiler farms, using routinely collected farm management data. These data lend themselves to analysis using machine-learning techniques. Hock burn, dermatitis of the skin over the hock, is an important indicator of broiler health and welfare. Remarkably, this classifier can predict the occurrence of high hock burn prevalence with accuracy of 0.78 on unseen data, as measured by the area under the receiver operating characteristic curve. We also compare the results with those obtained by standard multi-variable logistic regression and suggest that this technique provides new insights into the data. This novel application of a machine-learning algorithm, embedded in poultry management systems could offer significant improvements in broiler health and welfare worldwide.

  18. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    PubMed

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM) whether in pure form or in pharmaceutical formulations. Two supervised learning machines' methods; linear artificial neural networks (PC-linANN) preceded by principle component analysis and linear support vector regression (linSVR), were compared with two principle component based methods; principle component regression (PCR) as well as partial least squares (PLS) for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits behind using linear learning machines' methods and the inherent merits of their algorithms in handling overlapped noisy spectral data especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Relative mean squared error of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN; respectively. The results showed the superiority of supervised learning machines' methods over principle component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.

  19. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos

    2017-04-13

    Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence of pain perceptions. We select 52 adult subjects for the study divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance with diffusion tensor to see white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce features of the first dataset and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of migraine group. Moreover we implement a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, 93 to 94% in boosting. The features that were determined to be most useful for classification included are related with the pain, analgesics and left uncinate brain (connected with the pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.

  20. Support vector machine for the diagnosis of malignant mesothelioma

    NASA Astrophysics Data System (ADS)

    Ushasukhanya, S.; Nithyakalyani, A.; Sivakumar, V.

    2018-04-01

    Harmful mesothelioma is an illness in which threatening (malignancy) cells shape in the covering of the trunk or stomach area. Being presented to asbestos can influence the danger of threatening mesothelioma. Signs and side effects of threatening mesothelioma incorporate shortness of breath and agony under the rib confine. Tests that inspect within the trunk and belly are utilized to recognize (find) and analyse harmful mesothelioma. Certain elements influence forecast (shot of recuperation) and treatment choices. In this review, Support vector machine (SVM) classifiers were utilized for Mesothelioma sickness conclusion. SVM output is contrasted by concentrating on Mesothelioma’s sickness and findings by utilizing similar information set. The support vector machine algorithm gives 92.5% precision acquired by means of 3-overlap cross-approval. The Mesothelioma illness dataset were taken from an organization reports from Turkey.

  1. Prediction and analysis of beta-turns in proteins by support vector machine.

    PubMed

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2003-01-01

    Tight turn has long been recognized as one of the three important features of proteins after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% tight turns are beta-turns. Analysis and prediction of beta-turns in particular and tight turns in general are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to prediction and analysis of beta-turns. We have investigated two aspects of applying SVM to the prediction and analysis of beta-turns. First, we developed a new SVM method, called BTSVM, which predicts beta-turns of a protein from its sequence. The prediction results on the dataset of 426 non-homologous protein chains by sevenfold cross-validation technique showed that our method is superior to the other previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than the other ones of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.

  2. Prediction of pH of cola beverage using Vis/NIR spectroscopy and least squares-support vector machine

    NASA Astrophysics Data System (ADS)

    Liu, Fei; He, Yong

    2008-02-01

    Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared and 225 samples (45 samples for each variety) were selected for the calibration set, while 75 samples (15 samples for each variety) for the validation set. The smoothing way of Savitzky-Golay and standard normal variate (SNV) followed by first-derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs) which were used as the inputs of least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with radial basis function (RBF) kernel function and a two-step grid search technique were applied to build the regression model with a comparison of PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, while 0.975, 0.031 and 4.697x10 -3 for LS-SVM, respectively. Both methods obtained a satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way for the prediction of pH of cola beverages.

  3. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to the protein transportation, organelle localization, and functions, therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is in urgent need. In this study, a support vector machine (SVM)-based method was proposed through integrating PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.

  4. A Fault Alarm and Diagnosis Method Based on Sensitive Parameters and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Zhang, Jinjie; Yao, Ziyun; Lv, Zhiquan; Zhu, Qunxiong; Xu, Fengtian; Jiang, Zhinong

    2015-08-01

    Study on the extraction of fault feature and the diagnostic technique of reciprocating compressor is one of the hot research topics in the field of reciprocating machinery fault diagnosis at present. A large number of feature extraction and classification methods have been widely applied in the related research, but the practical fault alarm and the accuracy of diagnosis have not been effectively improved. Developing feature extraction and classification methods to meet the requirements of typical fault alarm and automatic diagnosis in practical engineering is urgent task. The typical mechanical faults of reciprocating compressor are presented in the paper, and the existing data of online monitoring system is used to extract fault feature parameters within 15 types in total; the inner sensitive connection between faults and the feature parameters has been made clear by using the distance evaluation technique, also sensitive characteristic parameters of different faults have been obtained. On this basis, a method based on fault feature parameters and support vector machine (SVM) is developed, which will be applied to practical fault diagnosis. A better ability of early fault warning has been proved by the experiment and the practical fault cases. Automatic classification by using the SVM to the data of fault alarm has obtained better diagnostic accuracy.

  5. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    PubMed

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with a very good spatial resolution and low temporal resolution. However, they always suffer from high dimensionality in the face of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) for having a feature-based classification by using SVM. Then, a linear kernel SVM is used for detecting the active areas. Here, we use SOM for feature extracting and labeling the datasets. SOM has two major advances: (i) it reduces dimension of data sets for having less computational complexity and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets and block design inputs in this paper and consider the contrast to noise ratio (CNR) value equal to 0.6 for simulated datasets. fMRI simulated dataset has contrast 1-4% in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.

  6. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.

  7. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  8. Automated annotation of functional imaging experiments via multi-label classification

    PubMed Central

    Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.

    2013-01-01

    Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112

  9. Adiabatic Quantum Anomaly Detection and Machine Learning

    NASA Astrophysics Data System (ADS)

    Pudenz, Kristen; Lidar, Daniel

    2012-02-01

    We present methods of anomaly detection and machine learning using adiabatic quantum computing. The machine learning algorithm is a boosting approach which seeks to optimally combine somewhat accurate classification functions to create a unified classifier which is much more accurate than its components. This algorithm then becomes the first part of the larger anomaly detection algorithm. In the anomaly detection routine, we first use adiabatic quantum computing to train two classifiers which detect two sets, the overlap of which forms the anomaly class. We call this the learning phase. Then, in the testing phase, the two learned classification functions are combined to form the final Hamiltonian for an adiabatic quantum computation, the low energy states of which represent the anomalies in a binary vector space.

  10. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  11. Automatic Recognition of Fetal Facial Standard Plane in Ultrasound Image via Fisher Vector.

    PubMed

    Lei, Baiying; Tan, Ee-Leng; Chen, Siping; Zhuo, Liu; Li, Shengli; Ni, Dong; Wang, Tianfu

    2015-01-01

    Acquisition of the standard plane is the prerequisite of biometric measurement and diagnosis during the ultrasound (US) examination. In this paper, a new algorithm is developed for the automatic recognition of the fetal facial standard planes (FFSPs) such as the axial, coronal, and sagittal planes. Specifically, densely sampled root scale invariant feature transform (RootSIFT) features are extracted and then encoded by Fisher vector (FV). The Fisher network with multi-layer design is also developed to extract spatial information to boost the classification performance. Finally, automatic recognition of the FFSPs is implemented by support vector machine (SVM) classifier based on the stochastic dual coordinate ascent (SDCA) algorithm. Experimental results using our dataset demonstrate that the proposed method achieves an accuracy of 93.27% and a mean average precision (mAP) of 99.19% in recognizing different FFSPs. Furthermore, the comparative analyses reveal the superiority of the proposed method based on FV over the traditional methods.

  12. Landslide susceptibility mapping & prediction using Support Vector Machine for Mandakini River Basin, Garhwal Himalaya, India

    NASA Astrophysics Data System (ADS)

    Kumar, Deepak; Thakur, Manoj; Dubey, Chandra S.; Shukla, Dericks P.

    2017-10-01

    In recent years, various machine learning techniques have been applied for landslide susceptibility mapping. In this study, three different variants of support vector machine viz., SVM, Proximal Support Vector Machine (PSVM) and L2-Support Vector Machine - Modified Finite Newton (L2-SVM-MFN) have been applied on the Mandakini River Basin in Uttarakhand, India to carry out the landslide susceptibility mapping. Eight thematic layers such as elevation, slope, aspect, drainages, geology/lithology, buffer of thrusts/faults, buffer of streams and soil along with the past landslide data were mapped in GIS environment and used for landslide susceptibility mapping in MATLAB. The study area covering 1625 km2 has merely 0.11% of area under landslides. There are 2009 pixels for past landslides out of which 50% (1000) landslides were considered as training set while remaining 50% as testing set. The performance of these techniques has been evaluated and the computational results show that L2-SVM-MFN obtains higher prediction values (0.829) of receiver operating characteristic curve (AUC-area under the curve) as compared to 0.807 for PSVM model and 0.79 for SVM. The results obtained from L2-SVM-MFN model are found to be superior than other SVM prediction models and suggest the usefulness of this technique to problem of landslide susceptibility mapping where training data is very less. However, these techniques can be used for satisfactory determination of susceptible zones with these inputs.

  13. Progressive Classification Using Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri; Kocurek, Michael

    2009-01-01

    An algorithm for progressive classification of data, analogous to progressive rendering of images, makes it possible to compromise between speed and accuracy. This algorithm uses support vector machines (SVMs) to classify data. An SVM is a machine learning algorithm that builds a mathematical model of the desired classification concept by identifying the critical data points, called support vectors. Coarse approximations to the concept require only a few support vectors, while precise, highly accurate models require far more support vectors. Once the model has been constructed, the SVM can be applied to new observations. The cost of classifying a new observation is proportional to the number of support vectors in the model. When computational resources are limited, an SVM of the appropriate complexity can be produced. However, if the constraints are not known when the model is constructed, or if they can change over time, a method for adaptively responding to the current resource constraints is required. This capability is particularly relevant for spacecraft (or any other real-time systems) that perform onboard data analysis. The new algorithm enables the fast, interactive application of an SVM classifier to a new set of data. The classification process achieved by this algorithm is characterized as progressive because a coarse approximation to the true classification is generated rapidly and thereafter iteratively refined. The algorithm uses two SVMs: (1) a fast, approximate one and (2) slow, highly accurate one. New data are initially classified by the fast SVM, producing a baseline approximate classification. For each classified data point, the algorithm calculates a confidence index that indicates the likelihood that it was classified correctly in the first pass. Next, the data points are sorted by their confidence indices and progressively reclassified by the slower, more accurate SVM, starting with the items most likely to be incorrectly classified. The user can halt this reclassification process at any point, thereby obtaining the best possible result for a given amount of computation time. Alternatively, the results can be displayed as they are generated, providing the user with real-time feedback about the current accuracy of classification.

  14. Semisupervised Support Vector Machines With Tangent Space Intrinsic Manifold Regularization.

    PubMed

    Sun, Shiliang; Xie, Xijiong

    2016-09-01

    Semisupervised learning has been an active research topic in machine learning and data mining. One main reason is that labeling examples is expensive and time-consuming, while there are large numbers of unlabeled examples available in many practical problems. So far, Laplacian regularization has been widely used in semisupervised learning. In this paper, we propose a new regularization method called tangent space intrinsic manifold regularization. It is intrinsic to data manifold and favors linear functions on the manifold. Fundamental elements involved in the formulation of the regularization are local tangent space representations, which are estimated by local principal component analysis, and the connections that relate adjacent tangent spaces. Simultaneously, we explore its application to semisupervised classification and propose two new learning algorithms called tangent space intrinsic manifold regularized support vector machines (TiSVMs) and tangent space intrinsic manifold regularized twin SVMs (TiTSVMs). They effectively integrate the tangent space intrinsic manifold regularization consideration. The optimization of TiSVMs can be solved by a standard quadratic programming, while the optimization of TiTSVMs can be solved by a pair of standard quadratic programmings. The experimental results of semisupervised classification problems show the effectiveness of the proposed semisupervised learning algorithms.

  15. Fault diagnosis of automobile hydraulic brake system using statistical features and support vector machines

    NASA Astrophysics Data System (ADS)

    Jegadeeshwaran, R.; Sugumaran, V.

    2015-02-01

    Hydraulic brakes in automobiles are important components for the safety of passengers; therefore, the brakes are a good subject for condition monitoring. The condition of the brake components can be monitored by using the vibration characteristics. On-line condition monitoring by using machine learning approach is proposed in this paper as a possible solution to such problems. The vibration signals for both good as well as faulty conditions of brakes were acquired from a hydraulic brake test setup with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals and the feature selection was carried out using the C4.5 decision tree algorithm. There is no specific method to find the right number of features required for classification for a given problem. Hence an extensive study is needed to find the optimum number of features. The effect of the number of features was also studied, by using the decision tree as well as Support Vector Machines (SVM). The selected features were classified using the C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusion of the study is presented.

  16. A Support Vector Machine Approach for Truncated Fingerprint Image Detection from Sweeping Fingerprint Sensors

    PubMed Central

    Chen, Chi-Jim; Pai, Tun-Wen; Cheng, Mox

    2015-01-01

    A sweeping fingerprint sensor converts fingerprints on a row by row basis through image reconstruction techniques. However, a built fingerprint image might appear to be truncated and distorted when the finger was swept across a fingerprint sensor at a non-linear speed. If the truncated fingerprint images were enrolled as reference targets and collected by any automated fingerprint identification system (AFIS), successful prediction rates for fingerprint matching applications would be decreased significantly. In this paper, a novel and effective methodology with low time computational complexity was developed for detecting truncated fingerprints in a real time manner. Several filtering rules were implemented to validate existences of truncated fingerprints. In addition, a machine learning method of supported vector machine (SVM), based on the principle of structural risk minimization, was applied to reject pseudo truncated fingerprints containing similar characteristics of truncated ones. The experimental result has shown that an accuracy rate of 90.7% was achieved by successfully identifying truncated fingerprint images from testing images before AFIS enrollment procedures. The proposed effective and efficient methodology can be extensively applied to all existing fingerprint matching systems as a preliminary quality control prior to construction of fingerprint templates. PMID:25835186

  17. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression.

    PubMed

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-08-01

    Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  18. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  19. Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2011-09-20

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information, these features measure the interactions between residues a certain distant apart in the protein sequences from different aspects. To achieve better performance, the principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobater pylori datasets show that our method is very promising to predict PPIs and may at least be a useful supplement tool to existing methods.

  20. Spatial-spectral blood cell classification with microscopic hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Ran, Qiong; Chang, Lan; Li, Wei; Xu, Xiaofeng

    2017-10-01

    Microscopic hyperspectral images provide a new way for blood cell examination. The hyperspectral imagery can greatly facilitate the classification of different blood cells. In this paper, the microscopic hyperspectral images are acquired by connecting the microscope and the hyperspectral imager, and then tested for blood cell classification. For combined use of the spectral and spatial information provided by hyperspectral images, a spatial-spectral classification method is improved from the classical extreme learning machine (ELM) by integrating spatial context into the image classification task with Markov random field (MRF) model. Comparisons are done among ELM, ELM-MRF, support vector machines(SVM) and SVMMRF methods. Results show the spatial-spectral classification methods(ELM-MRF, SVM-MRF) perform better than pixel-based methods(ELM, SVM), and the proposed ELM-MRF has higher precision and show more accurate location of cells.

  1. A new discriminative kernel from probabilistic models.

    PubMed

    Tsuda, Koji; Kawanabe, Motoaki; Rätsch, Gunnar; Sonnenburg, Sören; Müller, Klaus-Robert

    2002-10-01

    Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP kernel derived; from tangent vectors of posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.

  2. Study of the modifications needed for efficient operation of NASTRAN on the Control Data Corporation STAR-100 computer

    NASA Technical Reports Server (NTRS)

    1975-01-01

    NASA structural analysis (NASTRAN) computer program is operational on three series of third generation computers. The problem and difficulties involved in adapting NASTRAN to a fourth generation computer, namely, the Control Data STAR-100, are discussed. The salient features which distinguish Control Data STAR-100 from third generation computers are hardware vector processing capability and virtual memory. A feasible method is presented for transferring NASTRAN to Control Data STAR-100 system while retaining much of the machine-independent code. Basic matrix operations are noted for optimization for vector processing.

  3. Figure of merit for macrouniformity based on image quality ruler evaluation and machine learning framework

    NASA Astrophysics Data System (ADS)

    Wang, Weibao; Overall, Gary; Riggs, Travis; Silveston-Keith, Rebecca; Whitney, Julie; Chiu, George; Allebach, Jan P.

    2013-01-01

    Assessment of macro-uniformity is a capability that is important for the development and manufacture of printer products. Our goal is to develop a metric that will predict macro-uniformity, as judged by human subjects, by scanning and analyzing printed pages. We consider two different machine learning frameworks for the metric: linear regression and the support vector machine. We have implemented the image quality ruler, based on the recommendations of the INCITS W1.1 macro-uniformity team. Using 12 subjects at Purdue University and 20 subjects at Lexmark, evenly balanced with respect to gender, we conducted subjective evaluations with a set of 35 uniform b/w prints from seven different printers with five levels of tint coverage. Our results suggest that the image quality ruler method provides a reliable means to assess macro-uniformity. We then defined and implemented separate features to measure graininess, mottle, large area variation, jitter, and large-scale non-uniformity. The algorithms that we used are largely based on ISO image quality standards. Finally, we used these features computed for a set of test pages and the subjects' image quality ruler assessments of these pages to train the two different predictors - one based on linear regression and the other based on the support vector machine (SVM). Using five-fold cross-validation, we confirmed the efficacy of our predictor.

  4. The classification of the patients with pulmonary diseases using breath air samples spectral analysis

    NASA Astrophysics Data System (ADS)

    Kistenev, Yury V.; Borisov, Alexey V.; Kuzmin, Dmitry A.; Bulanova, Anna A.

    2016-08-01

    Technique of exhaled breath sampling is discussed. The procedure of wavelength auto-calibration is proposed and tested. Comparison of the experimental data with the model absorption spectra of 5% CO2 is conducted. The classification results of three study groups obtained by using support vector machine and principal component analysis methods are presented.

  5. Discrimination of crop and weeds on visible and visible/near-infrared spectrums using support vector machines, artificial neural network and decision tree

    USDA-ARS?s Scientific Manuscript database

    Weeds are regarded as farmers' natural enemy. In order to avoid excessive pesticide residues, the destruction of ecological environment, and to guarantee the quality and safety of agricultural products, it is urgent to develop highly-efficient weed management methods. Amongst, weed discrimination is...

  6. Static vs. dynamic decoding algorithms in a non-invasive body-machine interface

    PubMed Central

    Seáñez-González, Ismael; Pierella, Camilla; Farshchiansadegh, Ali; Thorp, Elias B.; Abdollahi, Farnaz; Pedersen, Jessica; Mussa-Ivaldi, Ferdinando A.

    2017-01-01

    In this study, we consider a non-invasive body-machine interface that captures body motions still available to people with spinal cord injury (SCI) and maps them into a set of signals for controlling a computer user interface while engaging in a sustained level of mobility and exercise. We compare the effectiveness of two decoding algorithms that transform a high-dimensional body-signal vector into a lower dimensional control vector on 6 subjects with high-level SCI and 8 controls. One algorithm is based on a static map from current body signals to the current value of the control vector set through principal component analysis (PCA), the other on dynamic mapping a segment of body signals to the value and the temporal derivatives of the control vector set through a Kalman filter. SCI and control participants performed straighter and smoother cursor movements with the Kalman algorithm during center-out reaching, but their movements were faster and more precise when using PCA. All participants were able to use the BMI’s continuous, two-dimensional control to type on a virtual keyboard and play pong, and performance with both algorithms was comparable. However, seven of eight control participants preferred PCA as their method of virtual wheelchair control. The unsupervised PCA algorithm was easier to train and seemed sufficient to achieve a higher degree of learnability and perceived ease of use. PMID:28092564

  7. Extracting laboratory test information from biomedical text

    PubMed Central

    Kang, Yanna Shen; Kayaalp, Mehmet

    2013-01-01

    Background: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. Methods: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. Results: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Conclusions: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. PMID:24083058

  8. Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data.

    PubMed

    Alakwaa, Fadhl M; Chaudhary, Kumardeep; Garmire, Lana X

    2018-01-05

    Metabolomics holds the promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown if deep neural network, a class of increasingly popular machine learning methods, is suitable to classify metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+), and 67 negative estrogen receptor (ER-) to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by other machine learning methods. Among them, protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward networks based deep learning method in the metabolomics research community for classification.

  9. Perspectives on Machine Learning for Classification of Schizotypy Using fMRI Data.

    PubMed

    Madsen, Kristoffer H; Krohne, Laerke G; Cai, Xin-Lu; Wang, Yi; Chan, Raymond C K

    2018-03-15

    Functional magnetic resonance imaging is capable of estimating functional activation and connectivity in the human brain, and lately there has been increased interest in the use of these functional modalities combined with machine learning for identification of psychiatric traits. While these methods bear great potential for early diagnosis and better understanding of disease processes, there are wide ranges of processing choices and pitfalls that may severely hamper interpretation and generalization performance unless carefully considered. In this perspective article, we aim to motivate the use of machine learning schizotypy research. To this end, we describe common data processing steps while commenting on best practices and procedures. First, we introduce the important role of schizotypy to motivate the importance of reliable classification, and summarize existing machine learning literature on schizotypy. Then, we describe procedures for extraction of features based on fMRI data, including statistical parametric mapping, parcellation, complex network analysis, and decomposition methods, as well as classification with a special focus on support vector classification and deep learning. We provide more detailed descriptions and software as supplementary material. Finally, we present current challenges in machine learning for classification of schizotypy and comment on future trends and perspectives.

  10. A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning

    PubMed Central

    Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2012-01-01

    A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414

  11. Prototype Vector Machine for Large Scale Semi-Supervised Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Kai; Kwok, James T.; Parvin, Bahram

    2009-04-29

    Practicaldataminingrarelyfalls exactlyinto the supervisedlearning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computationalintensivenessofgraph-based SSLarises largely from the manifold or graph regularization, which in turn lead to large models that are dificult to handle. To alleviate this, we proposed the prototype vector machine (PVM), a highlyscalable,graph-based algorithm for large-scale SSL. Our key innovation is the use of"prototypes vectors" for effcient approximation on both the graph-based regularizer and model representation. The choice of prototypes are grounded upon two important criteria: they not only perform effective low-rank approximation of themore » kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.« less

  12. A novel representation for apoptosis protein subcellular localization prediction using support vector machine.

    PubMed

    Zhang, Li; Liao, Bo; Li, Dachao; Zhu, Wen

    2009-07-21

    Apoptosis, or programmed cell death, plays an important role in development of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful to understand the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related with the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.

  13. Evaluation and recognition of skin images with aging by support vector machine

    NASA Astrophysics Data System (ADS)

    Hu, Liangjun; Wu, Shulian; Li, Hui

    2016-10-01

    Aging is a very important issue not only in dermatology, but also cosmetic science. Cutaneous aging involves both chronological and photoaging aging process. The evaluation and classification of aging is an important issue with the medical cosmetology workers nowadays. The purpose of this study is to assess chronological-age-related and photo-age-related of human skin. The texture features of skin surface skin, such as coarseness, contrast were analyzed by Fourier transform and Tamura. And the aim of it is to detect the object hidden in the skin texture in difference aging skin. Then, Support vector machine was applied to train the texture feature. The different age's states were distinguished by the support vector machine (SVM) classifier. The results help us to further understand the mechanism of different aging skin from texture feature and help us to distinguish the different aging states.

  14. A support vector machine approach for classification of welding defects from ultrasonic signals

    NASA Astrophysics Data System (ADS)

    Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming

    2014-07-01

    Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.

  15. Support vector machine based decision for mechanical fault condition monitoring in induction motor using an advanced Hilbert-Park transform.

    PubMed

    Ben Salem, Samira; Bacha, Khmais; Chaari, Abdelkader

    2012-09-01

    In this work we suggest an original fault signature based on an improved combination of Hilbert and Park transforms. Starting from this combination we can create two fault signatures: Hilbert modulus current space vector (HMCSV) and Hilbert phase current space vector (HPCSV). These two fault signatures are subsequently analysed using the classical fast Fourier transform (FFT). The effects of mechanical faults on the HMCSV and HPCSV spectrums are described, and the related frequencies are determined. The magnitudes of spectral components, relative to the studied faults (air-gap eccentricity and outer raceway ball bearing defect), are extracted in order to develop the input vector necessary for learning and testing the support vector machine with an aim of classifying automatically the various states of the induction motor. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  16. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.

  17. A new local-global approach for classification.

    PubMed

    Peres, R T; Pedreira, C E

    2010-09-01

    In this paper, we propose a new local-global pattern classification scheme that combines supervised and unsupervised approaches, taking advantage of both, local and global environments. We understand as global methods the ones concerned with the aim of constructing a model for the whole problem space using the totality of the available observations. Local methods focus into sub regions of the space, possibly using an appropriately selected subset of the sample. In the proposed method, the sample is first divided in local cells by using a Vector Quantization unsupervised algorithm, the LBG (Linde-Buzo-Gray). In a second stage, the generated assemblage of much easier problems is locally solved with a scheme inspired by Bayes' rule. Four classification methods were implemented for comparison purposes with the proposed scheme: Learning Vector Quantization (LVQ); Feedforward Neural Networks; Support Vector Machine (SVM) and k-Nearest Neighbors. These four methods and the proposed scheme were implemented in eleven datasets, two controlled experiments, plus nine public available datasets from the UCI repository. The proposed method has shown a quite competitive performance when compared to these classical and largely used classifiers. Our method is simple concerning understanding and implementation and is based on very intuitive concepts. Copyright 2010 Elsevier Ltd. All rights reserved.

  18. Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression.

    PubMed

    Nouretdinov, Ilia; Costafreda, Sergi G; Gammerman, Alexander; Chervonenkis, Alexey; Vovk, Vladimir; Vapnik, Vladimir; Fu, Cynthia H Y

    2011-05-15

    There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of the error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction. Copyright © 2010 Elsevier Inc. All rights reserved.

  19. The Definition of Insulin Resistance Using HOMA-IR for Americans of Mexican Descent Using Machine Learning

    PubMed Central

    Qu, Hui-Qi; Li, Quan; Rentfro, Anne R.; Fisher-Hoch, Susan P.; McCormick, Joseph B.

    2011-01-01

    Objective The lack of standardized reference range for the homeostasis model assessment-estimated insulin resistance (HOMA-IR) index has limited its clinical application. This study defines the reference range of HOMA-IR index in an adult Hispanic population based with machine learning methods. Methods This study investigated a Hispanic population of 1854 adults, randomly selected on the basis of 2000 Census tract data in the city of Brownsville, Cameron County. Machine learning methods, support vector machine (SVM) and Bayesian Logistic Regression (BLR), were used to automatically identify measureable variables using standardized values that correlate with HOMA-IR; K-means clustering was then used to classify the individuals by insulin resistance. Results Our study showed that the best cutoff of HOMA-IR for identifying those with insulin resistance is 3.80. There are 39.1% individuals in this Hispanic population with HOMA-IR>3.80. Conclusions Our results are dramatically different using the popular clinical cutoff of 2.60. The high sensitivity and specificity of HOMA-IR>3.80 for insulin resistance provide a critical fundamental for our further efforts to improve the public health of this Hispanic population. PMID:21695082

  20. A Comparison of Machine Learning Approaches for Corn Yield Estimation

    NASA Astrophysics Data System (ADS)

    Kim, N.; Lee, Y. W.

    2017-12-01

    Machine learning is an efficient empirical method for classification and prediction, and it is another approach to crop yield estimation. The objective of this study is to estimate corn yield in the Midwestern United States by employing the machine learning approaches such as the support vector machine (SVM), random forest (RF), and deep neural networks (DNN), and to perform the comprehensive comparison for their results. We constructed the database using satellite images from MODIS, the climate data of PRISM climate group, and GLDAS soil moisture data. In addition, to examine the seasonal sensitivities of corn yields, two period groups were set up: May to September (MJJAS) and July and August (JA). In overall, the DNN showed the highest accuracies in term of the correlation coefficient for the two period groups. The differences between our predictions and USDA yield statistics were about 10-11 %.

Top