Sample records for sampling classifying multiclass

  1. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background: Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results: A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions: A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  2. EEG classification for motor imagery and resting state in BCI applications using multi-class Adaboost extreme learning machine

    NASA Astrophysics Data System (ADS)

    Gao, Lin; Cheng, Wei; Zhang, Jinhua; Wang, Jue

    2016-08-01

    Brain-computer interface (BCI) systems provide an alternative communication and control approach for people with limited motor function. Therefore, the feature extraction and classification approach should differentiate the relatively unusual state of motion intention from a common resting state. In this paper, we sought a novel approach for multi-class classification in BCI applications. We collected electroencephalographic (EEG) signals registered by electrodes placed over the scalp during left hand motor imagery, right hand motor imagery, and resting state for ten healthy human subjects. We proposed using the Kolmogorov complexity (Kc) for feature extraction and a multi-class Adaboost classifier with an extreme learning machine as the base classifier, in order to classify the three-class EEG samples. An average classification accuracy of 79.5% was obtained for ten subjects, which greatly outperformed commonly used approaches. Thus, it is concluded that the proposed method could improve the performance for classification of motor imagery tasks for multi-class samples. It could be applied in further studies to generate the control commands to initiate the movement of a robotic exoskeleton or orthosis, which finally facilitates the rehabilitation of disabled people.

  3. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solving this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning on a microarray gene expression dataset mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, which is improved to solve multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is used on each subset. Then, the selected features of each subset are put on separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. Twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per class results show that this method could classify data of the normal and MD classes with 100% accuracy.

  4. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class support vector machine (SVM) classification, based on a binary tree, is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The method builds the binary tree hierarchy bottom-up; according to the resulting hierarchy, a sub-classifier learns from the samples corresponding to each node. During learning, several class clusters are generated by a first clustering of the training samples. Central points are first extracted from those class clusters that contain only one type of sample. For clusters that contain two types of samples, the numbers of clusters for their positive and negative samples are set according to their degree of mixture, a secondary clustering is then performed, and central points are extracted from the resulting sub-class clusters. The sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce the number of samples, and effectively improve learning efficiency.

  5. Multivariate detrending of fMRI signal drifts for real-time multiclass pattern classification.

    PubMed

    Lee, Dongha; Jang, Changwon; Park, Hae-Jeong

    2015-03-01

    Signal drift in functional magnetic resonance imaging (fMRI) is an unavoidable artifact that limits classification performance in multi-voxel pattern analysis of fMRI. As conventional methods to reduce signal drift, global demeaning or proportional scaling disregards regional variations of drift, whereas voxel-wise univariate detrending is too sensitive to noisy fluctuations. To overcome these drawbacks, we propose a multivariate real-time detrending method for multiclass classification that involves spatial demeaning at each scan and the recursive detrending of drifts in the classifier outputs driven by a multiclass linear support vector machine. Experiments using binary and multiclass data showed that the linear trend estimation of the classifier output drift for each class (a weighted sum of drifts in the class-specific voxels) was more robust against voxel-wise artifacts that lead to inconsistent spatial patterns and the effect of online processing than voxel-wise detrending. The classification performance of the proposed method was significantly better, especially for multiclass data, than that of voxel-wise linear detrending, global demeaning, and classifier output detrending without demeaning. We concluded that the multivariate approach using classifier output detrending of fMRI signals with spatial demeaning preserves spatial patterns, is less sensitive than conventional methods to sample size, and increases classification performance, which is a useful feature for real-time fMRI classification. Copyright © 2014 Elsevier Inc. All rights reserved.
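
    The two ingredients named in this abstract, spatial demeaning at each scan and linear trend removal on the classifier output, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single linear output (the abstract describes one output per class), and the plain least-squares refit at every scan (standing in for the recursive update) are all assumptions made for illustration.

```python
from statistics import mean


def spatial_demean(scan):
    """Subtract the across-voxel mean from one scan (a list of voxel values)."""
    m = mean(scan)
    return [v - m for v in scan]


def detrended_latest(history):
    """Fit a least-squares line to the outputs seen so far and return the
    most recent output minus the fitted trend value at that time point."""
    n = len(history)
    if n < 2:
        return 0.0  # a single point carries no trend information
    t_bar = (n - 1) / 2
    y_bar = mean(history)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(history))
    slope = sxy / sxx
    trend_now = y_bar + slope * ((n - 1) - t_bar)
    return history[-1] - trend_now


def online_outputs(scans, weights, bias=0.0):
    """Process scans one at a time: demean spatially, compute a linear
    classifier output (hypothetical weights), then detrend that output."""
    history, cleaned = [], []
    for scan in scans:
        x = spatial_demean(scan)
        history.append(sum(w * v for w, v in zip(weights, x)) + bias)
        cleaned.append(detrended_latest(history))
    return cleaned
```

    In this sketch a drift that is shared by all voxels is removed by the demeaning step, while a drift that survives into the classifier output is removed by the line fit on that one output series rather than on each voxel separately.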

  6. Support vector machines-based fault diagnosis for turbo-pump rotor

    NASA Astrophysics Data System (ADS)

    Yuan, Sheng-Fa; Chu, Fu-Lei

    2006-05-01

    Most artificial intelligence methods used in fault diagnosis are based on the empirical risk minimisation principle and have poor generalisation when fault samples are few. The support vector machine (SVM) is a new general machine-learning tool based on the structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since the basic SVM is originally designed for two-class classification, while most fault diagnosis problems are multi-class cases, a new multi-class SVM classification algorithm named 'one to others' is presented to solve multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, requires little repeated training, and speeds up both training and recognition. The effectiveness of the method is verified by its application to the fault diagnosis of a turbo-pump rotor.
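
    A binary tree of two-class classifiers ordered by priority can be read as a chain in which each node separates one class from all remaining ones. The Python sketch below illustrates that control flow only; the fault labels, the threshold rules standing in for trained SVMs, and the function name are hypothetical and not taken from the paper.

```python
def one_to_others(sample, ordered_classifiers, default_label):
    """Walk a priority-ordered chain of binary classifiers.

    ordered_classifiers is a list of (label, is_member) pairs, highest
    priority first. Each is_member(sample) answers "this class versus
    everything that is left". The first positive answer wins; if no
    classifier fires, the last remaining class is returned.
    """
    for label, is_member in ordered_classifiers:
        if is_member(sample):
            return label
    return default_label


# Hypothetical stand-ins for trained two-class SVMs: simple threshold rules
# on a feature dictionary, ordered from highest to lowest fault priority.
chain = [
    ("rub", lambda s: s["high_freq_energy"] > 0.8),
    ("misalignment", lambda s: s["second_harmonic"] > 0.5),
    ("imbalance", lambda s: s["first_harmonic"] > 0.5),
]

sample = {"high_freq_energy": 0.1, "second_harmonic": 0.7, "first_harmonic": 0.9}
prediction = one_to_others(sample, chain, default_label="normal")
```

    With K classes this arrangement needs K - 1 binary classifiers, and a sample assigned early in the chain is never passed to the later ones.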

  7. Multiclass cancer diagnosis using tumor gene expression signatures

    DOE PAGES

    Ramaswamy, S.; Tamayo, P.; Rifkin, R.; ...

    2001-12-11

    The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.

  8. Combining multiple decisions: applications to bioinformatics

    NASA Astrophysics Data System (ADS)

    Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.

  9. Mexican Hat Wavelet Kernel ELM for Multiclass Classification.

    PubMed

    Wang, Jie; Song, Yi-Fan; Ma, Tian-Lei

    2017-01-01

    Kernel extreme learning machine (KELM) is a novel feedforward neural network, which is widely used in classification problems. To some extent, it solves the existing problems of the invalid nodes and the large computational complexity in ELM. However, the traditional KELM classifier usually has a low test accuracy when it faces multiclass classification problems. In order to solve the above problem, a new classifier, Mexican Hat wavelet KELM classifier, is proposed in this paper. The proposed classifier successfully improves the training accuracy and reduces the training time in the multiclass classification problems. Moreover, the validity of the Mexican Hat wavelet as a kernel function of ELM is rigorously proved. Experimental results on different data sets show that the performance of the proposed classifier is significantly superior to the compared classifiers.

  10. Pairwise Classifier Ensemble with Adaptive Sub-Classifiers for fMRI Pattern Analysis.

    PubMed

    Kim, Eunwoo; Park, HyunWook

    2017-02-01

    The multi-voxel pattern analysis technique is applied to fMRI data for classification of high-level brain functions using pattern information distributed over multiple voxels. In this paper, we propose a classifier ensemble for multiclass classification in fMRI analysis, exploiting the fact that specific neighboring voxels can contain spatial pattern information. The proposed method converts the multiclass classification to a pairwise classifier ensemble, and each pairwise classifier consists of multiple sub-classifiers using an adaptive feature set for each class-pair. Simulated and real fMRI data were used to verify the proposed method. Intra- and inter-subject analyses were performed to compare the proposed method with several well-known classifiers, including single and ensemble classifiers. The comparison results showed that the proposed method can be generally applied to multiclass classification in both simulations and real fMRI analyses.

  11. Optimal aggregation of binary classifiers for multiclass cancer diagnosis using gene expression profiles.

    PubMed

    Yukinawa, Naoto; Oba, Shigeyuki; Kato, Kikuya; Ishii, Shin

    2009-01-01

    Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the "optimal coding problem," has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
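
    The coding strategies mentioned in this abstract can be written as code matrices, with one row per class and one column per binary classifier, and a per-classifier weight can then enter the decoding step. The Python sketch below is an illustration under stated assumptions: the weights are set by hand, the decoding rule is a weighted correlation between code rows and classifier outputs, and the paper's own weight-tuning procedure is not reproduced.

```python
from itertools import combinations


def one_vs_rest_code(k):
    """k x k code matrix: classifier j separates class j (+1) from the rest (-1)."""
    return [[1 if i == j else -1 for j in range(k)] for i in range(k)]


def one_vs_one_code(k):
    """k x k(k-1)/2 code matrix: the classifier for pair (a, b) codes a as +1,
    b as -1, and ignores every other class (0)."""
    pairs = list(combinations(range(k), 2))
    return [[1 if i == a else -1 if i == b else 0 for a, b in pairs]
            for i in range(k)]


def decode(code, outputs, weights=None):
    """Return the class whose code row agrees best with the real-valued
    binary outputs, each classifier scaled by its weight (uniform if omitted)."""
    if weights is None:
        weights = [1.0] * len(outputs)
    scores = [sum(w * m * f for w, m, f in zip(weights, row, outputs))
              for row in code]
    return max(range(len(code)), key=scores.__getitem__)


ovo = one_vs_one_code(3)           # columns: (0 vs 1), (0 vs 2), (1 vs 2)
outputs = [0.9, -0.1, 0.2]         # hypothetical binary classifier outputs
uniform = decode(ovo, outputs)
reweighted = decode(ovo, outputs, weights=[0.1, 1.0, 1.0])
```

    Here the uniform decoding picks class 0, while down-weighting the first classifier flips the decision to class 1, which shows why the choice of weights matters.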

  12. Discriminant analysis for fast multiclass data classification through regularized kernel function approximation.

    PubMed

    Ghorai, Santanu; Mukherjee, Anirban; Dutta, Pranab K

    2010-06-01

    In this brief we have proposed the multiclass data classification by computationally inexpensive discriminant analysis through vector-valued regularized kernel function approximation (VVRKFA). VVRKFA being an extension of fast regularized kernel function approximation (FRKFA), provides the vector-valued response at single step. The VVRKFA finds a linear operator and a bias vector by using a reduced kernel that maps a pattern from feature space into the low dimensional label space. The classification of patterns is carried out in this low dimensional label subspace. A test pattern is classified depending on its proximity to class centroids. The effectiveness of the proposed method is experimentally verified and compared with multiclass support vector machine (SVM) on several benchmark data sets as well as on gene microarray data for multi-category cancer classification. The results indicate the significant improvement in both training and testing time compared to that of multiclass SVM with comparable testing accuracy principally in large data sets. Experiments in this brief also serve as comparison of performance of VVRKFA with stratified random sampling and sub-sampling.
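
    The final step described in this abstract, assigning a test pattern to the class with the closest centroid in the low dimensional label space, can be sketched in Python as follows. The sketch assumes the patterns have already been mapped into that space and uses Euclidean distance; both the mapping and the distance measure actually used by VVRKFA are outside what the abstract states.

```python
import math


def class_centroids(points, labels):
    """Average the mapped training points of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def nearest_centroid(point, centroids):
    """Return the label whose centroid is closest (Euclidean, an assumption)."""
    return min(centroids, key=lambda y: math.dist(point, centroids[y]))


# Toy points standing in for patterns already mapped into the label space.
train = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
labels = ["a", "a", "b", "b"]
centroids = class_centroids(train, labels)
```

    Classification then costs one distance computation per class, which is consistent with the short testing times the abstract reports.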

  13. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery.

    PubMed

    Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I

    2017-01-01

    A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
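
    Stability of a selected feature set is commonly quantified by comparing the sets chosen in different runs. The Python sketch below uses the set form of the Tanimoto distance, one minus the ratio of shared to combined features, and averages it over all pairs of runs. The exact variant and averaging scheme used in the study are not given in the abstract, so both are assumptions, and the gene names are placeholders.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - |A intersect B| / |A union B| for two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty selections are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of runs (lower = more stable)."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Placeholder selections from three hypothetical runs.
runs = [{"g1", "g2"}, {"g1", "g2"}, {"g3"}]
instability = mean_pairwise_distance(runs)
```

    A value of 0 means every run selected the same features, and a value of 1 means no two runs shared any feature.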

  14. Classifying Physical Morphology of Cocoa Beans Digital Images using Multiclass Ensemble Least-Squares Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Lawi, Armin; Adhitya, Yudhi

    2018-03-01

    The objective of this research is to determine the quality of cocoa beans through the morphology of their digital images. Samples of cocoa beans were scattered on bright white paper under a controlled lighting condition. A compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. The classification process begins with an analysis of the cocoa bean images based on morphological feature extraction. The extracted morphological or physical feature parameters are Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, and Feret Diameter. The cocoa beans are classified into 4 groups: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The classification model used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement of SVM using an ensemble method in which the separating hyperplanes are obtained by a least-squares approach and the multiclass procedure uses the One-Against-All method. The results show that the proposed model, with morphological features as input parameters, achieved a classification accuracy of 99.705% for the four classes.
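
    Several of the listed parameters are derived from the basic measurements. The Python sketch below computes three of them using definitions that are common in image-analysis software (for example ImageJ); the abstract does not state which formulas the authors used, so these definitions and the function name are assumptions.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis):
    """Derive dimensionless shape descriptors from basic measurements.

    Assumed definitions (common convention, not confirmed by the source):
      aspect ratio = major axis / minor axis
      circularity  = 4 * pi * area / perimeter^2
      roundness    = 4 * area / (pi * major axis^2)
    """
    return {
        "aspect_ratio": major_axis / minor_axis,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "roundness": 4.0 * area / (math.pi * major_axis ** 2),
    }


# Sanity check on a perfect circle of radius 1: all three descriptors are 1.
circle = shape_descriptors(area=math.pi, perimeter=2.0 * math.pi,
                           major_axis=2.0, minor_axis=2.0)
```

    Under these definitions an elongated or broken outline lowers circularity and roundness below 1, which is the kind of contrast a classifier can use to separate bean groups.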

  15. Building Multiclass Classifiers for Remote Homology Detection and Fold Recognition

    DTIC Science & Technology

    2006-04-05

    classes. In this study we evaluate the effectiveness of one of these formulations that was developed by Crammer and Singer [9], which leads to...significantly more complex model can be learned by directly applying the Crammer-Singer multiclass formulation on the outputs of the binary classifiers...will refer to this as the Crammer-Singer (CS) model. Comparing the scaling approach to the Crammer-Singer approach we can see that the Crammer-Singer

  16. Multi-class texture analysis in colorectal cancer histology

    NASA Astrophysics Data System (ADS)

    Kather, Jakob Nikolas; Weis, Cleo-Aron; Bianconi, Francesco; Melchers, Susanne M.; Schad, Lothar R.; Gaiser, Timo; Marx, Alexander; Zöllner, Frank Gerrit

    2016-06-01

    Automatic recognition of different tissue types in histological images is an essential part in the digital pathology toolbox. Texture analysis is commonly used to address this problem; mainly in the context of estimating the tumour/stroma ratio on histological samples. However, although histological images typically contain more than two tissue types, only few studies have addressed the multi-class problem. For colorectal cancer, one of the most prevalent tumour types, there are in fact no published results on multiclass texture separation. In this paper we present a new dataset of 5,000 histological images of human colorectal cancer including eight different types of tissue. We used this set to assess the classification performance of a wide range of texture descriptors and classifiers. As a result, we found an optimal classification strategy that markedly outperformed traditional methods, improving the state of the art for tumour-stroma separation from 96.9% to 98.6% accuracy and setting a new standard for multiclass tissue separation (87.4% accuracy for eight classes). We make our dataset of histological images publicly available under a Creative Commons license and encourage other researchers to use it as a benchmark for their studies.

  17. Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.

    PubMed

    Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E

    2010-09-17

    Toxicity prediction is essential for drug design and development of effective therapeutics. In this paper we present an in silico strategy, to identify the mode of action of toxic compounds, that is based on the use of a novel logic based kernel method. The technique uses support vector machines in conjunction with the kernels constructed from first order rules induced by an Inductive Logic Programming system. It constructs multi-class models by using a divide and conquer reduction strategy that splits multi-classes into binary groups and solves each individual problem recursively hence generating an underlying decision list structure. In order to evaluate the effectiveness of the approach for chemoinformatics problems like predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to the mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the performance of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds of various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. The construction of support vector machine classifier using the firefly algorithm.

    PubMed

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). The tool does not perform feature selection, because the SVM combined with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify patterns for maximum accuracy.

  19. The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

    PubMed Central

    Chao, Chih-Feng; Horng, Ming-Huwi

    2015-01-01

    The setting of parameters in support vector machines (SVMs) is very important with regard to their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). The tool does not perform feature selection, because the SVM combined with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify patterns for maximum accuracy. PMID:25802511

  20. Detection of circuit-board components with an adaptive multiclass correlation filter

    NASA Astrophysics Data System (ADS)

    Diaz-Ramirez, Victor H.; Kober, Vitaly

    2008-08-01

    A new method for reliable detection of circuit-board components is proposed. The method is based on an adaptive multiclass composite correlation filter. The filter is designed with the help of an iterative algorithm using complex synthetic discriminant functions. The impulse response of the filter contains information needed to localize and classify geometrically distorted circuit-board components belonging to different classes. Computer simulation results obtained with the proposed method are provided and compared with those of known multiclass correlation based techniques in terms of performance criteria for recognition and classification of objects.

  1. Metal Oxide Gas Sensor Drift Compensation Using a Two-Dimensional Classifier Ensemble

    PubMed Central

    Liu, Hang; Chu, Renzhi; Tang, Zhenan

    2015-01-01

    Sensor drift is the most challenging problem in gas sensing at present. We propose a novel two-dimensional classifier ensemble strategy to solve the gas discrimination problem, regardless of the gas concentration, with high accuracy over extended periods of time. This strategy is appropriate for multi-class classifiers that consist of combinations of pairwise classifiers, such as support vector machines. We compare the performance of the strategy with those of competing methods in an experiment based on a public dataset that was compiled over a period of three years. The experimental results demonstrate that the two-dimensional ensemble outperforms the other methods considered. Furthermore, we propose a pre-aging process inspired by that applied to the sensors to improve the stability of the classifier ensemble. The experimental results demonstrate that the weight of each multi-class classifier model in the ensemble remains fairly static before and after the addition of new classifier models to the ensemble, when a pre-aging procedure is applied. PMID:25942640

  2. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-01-01

    Background: Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results: We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion: By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145

  3. Classification of Automated Search Traffic

    NASA Astrophysics Data System (ADS)

    Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

    As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information and those generated by automated processes. We categorize these features into two classes: interpretations of the physical model of human interactions, and behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. A performance analysis is then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.

  4. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which reduces the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since tuning the parameters involved in a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.

  5. [Research on the methods for multi-class kernel CSP-based feature extraction].

    PubMed

    Wang, Jinjia; Zhang, Lingzhi; Hu, Bei

    2012-04-01

    To relax the presumption of strictly linear patterns in the common spatial patterns (CSP), we studied the kernel CSP (KCSP). A new multi-class KCSP (MKCSP) approach was proposed in this paper, which combines the kernel approach with the multi-class CSP technique. In this approach, we used kernel spatial patterns for each class against all others, and extracted signal components specific to one condition from EEG data sets of multiple conditions. Then we performed classification using the Logistic linear classifier. Data set 3a of brain-computer interface (BCI) competition III was used in the experiment. The experiment showed that this approach could decompose the raw EEG signals into spatial patterns extracted from multi-class single-trial EEG, and could obtain good classification results.

  6. Multiclass Continuous Correspondence Learning

    NASA Technical Reports Server (NTRS)

    Bue, Brian D.; Thompson, David R.

    2011-01-01

    We extend the Structural Correspondence Learning (SCL) domain adaptation algorithm of Blitzer et al. to the realm of continuous signals. Given a set of labeled examples belonging to a 'source' domain, we select a set of unlabeled examples in a related 'target' domain that play similar roles in both domains. Using these 'pivot' samples, we map both domains into a common feature space, allowing us to adapt a classifier trained on source examples to classify target examples. We show that when between-class distances are relatively preserved across domains, we can automatically select target pivots to bring the domains into correspondence.

  7. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and so increase accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.

  8. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and so increase accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  9. Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections

    PubMed Central

    Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre

    2009-01-01

    Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis, based on selected gene profiles, is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure a higher reliability while reducing time and expense costs. Moreover, this classifier takes into account asymmetric penalties dependent on each class and on each wrong or partially correct decision. It is based on ν-1-SVM coupled with its regularization path and minimizes a general loss function defined in the class-selective rejection scheme. The state-of-the-art multiclass algorithms can be considered as a particular case of the proposed algorithm where the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out in the Bayesian and the class-selective rejection frameworks. Five gene-selected datasets are used to assess the performance of the proposed method. Results are discussed and accuracies are compared with those computed by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers. PMID:19584932

  10. A comparative study of surface EMG classification by fuzzy relevance vector machine and fuzzy support vector machine.

    PubMed

    Xie, Hong-Bo; Huang, Hu; Wu, Jianhua; Liu, Lei

    2015-02-01

    We present a multiclass fuzzy relevance vector machine (FRVM) learning mechanism and evaluate its performance in classifying multiple hand motions using surface electromyographic (sEMG) signals. The relevance vector machine (RVM) is a sparse Bayesian kernel method which avoids some limitations of the support vector machine (SVM). However, RVM still suffers from possible unclassifiable regions in multiclass problems. We propose two fuzzy membership function-based FRVM algorithms to solve such problems, based on experiments conducted on seven healthy subjects and two amputees with six hand motions. Two feature sets, namely, AR model coefficients and root mean square value (AR-RMS), and wavelet transform (WT) features, are extracted from the recorded sEMG signals. Fuzzy support vector machine (FSVM) analysis was also conducted for a broad comparison in terms of accuracy, sparsity, training and testing time, as well as the effect of training sample sizes. FRVM yielded comparable classification accuracy with dramatically fewer support vectors in comparison with FSVM. Furthermore, the processing delay of FRVM was much less than that of FSVM, whilst the training time of FSVM was much shorter than that of FRVM. The results indicate that an FRVM classifier trained using sufficient samples can achieve generalization capability comparable to FSVM with significant sparsity in multi-channel sEMG classification, which makes it more suitable for sEMG-based real-time control applications.

  11. A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data.

    PubMed

    Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R

    2018-01-31

    Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold-moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori, i.e., a plug-in method. Surprisingly, little attention has been paid to this combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. In contrast to resampling methods, we preserve the natural class distribution of the data, resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validated our method on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
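
    A minimal sketch of the plug-in idea described in this abstract, under stated assumptions: the ensemble members' positive-class probabilities are simply averaged, and the decision threshold is chosen afterwards to maximize F1 on labeled data. The function names, the toy probabilities and the candidate thresholds are all hypothetical, and F1 is only one example of a measure that could be plugged in; this is not the authors' implementation.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def bagged_probability(member_probs):
    """Average the positive-class probabilities of the ensemble members."""
    n_members = len(member_probs)
    return [sum(column) / n_members for column in zip(*member_probs)]


def best_threshold(probs, y_true, candidates):
    """Pick the candidate threshold with the highest F1 (first one wins ties)."""
    return max(
        candidates,
        key=lambda t: f1_score(y_true, [int(p >= t) for p in probs]),
    )


# Hypothetical outputs of two bagged models on five validation examples.
member_probs = [
    [0.10, 0.30, 0.35, 0.60, 0.20],
    [0.20, 0.40, 0.45, 0.70, 0.10],
]
y_true = [0, 0, 1, 1, 0]

probs = bagged_probability(member_probs)
threshold = best_threshold(probs, y_true, [0.5, 0.38, 0.25])
```

    The models are never retrained and the data are never resampled here; only the cut-off applied to the averaged probability moves, which is why the performance measure can be chosen after learning.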

  12. Fisher classifier and its probability of error estimation

    NASA Technical Reports Server (NTRS)

    Chittineni, C. B.

    1979-01-01

    Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
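
    For orientation, the standard textbook two-class Fisher criterion is reproduced below. It is not taken from the report, whose leave-one-out expressions and optimal threshold are not given in the abstract. The symbols are assumptions of this note: m_1 and m_2 are the class means, S_W is the within-class scatter matrix, and t is an unspecified decision threshold.

```latex
J(\mathbf{w}) = \frac{\left(\mathbf{w}^{\top}(\mathbf{m}_1 - \mathbf{m}_2)\right)^{2}}{\mathbf{w}^{\top} S_W \mathbf{w}},
\qquad
\mathbf{w}^{*} \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2),
\qquad
\text{decide class 1 if } \mathbf{w}^{*\top}\mathbf{x} > t .
```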

  13. Action Recognition Using 3D Histograms of Texture and A Multi-Class Boosting Classifier.

    PubMed

    Zhang, Baochang; Yang, Yun; Chen, Chen; Yang, Linlin; Han, Jungong; Shao, Ling

    2017-10-01

    Human action recognition is an important yet challenging task. This paper presents a low-cost descriptor called 3D histograms of texture (3DHoTs) to extract discriminant features from a sequence of depth maps. 3DHoTs are derived from projecting depth frames onto three orthogonal Cartesian planes, i.e., the frontal, side, and top planes, and thus compactly characterize the salient information of a specific action, on which texture features are calculated to represent the action. Besides this fast feature descriptor, a new multi-class boosting classifier (MBC) is also proposed to efficiently exploit different kinds of features in a unified framework for action classification. Compared with the existing boosting frameworks, we add a new multi-class constraint into the objective function, which helps to maintain a better margin distribution by maximizing the mean of margin, whereas still minimizing the variance of margin. Experiments on the MSRAction3D, MSRGesture3D, MSRActivity3D, and UTD-MHAD data sets demonstrate that the proposed system combining 3DHoTs and MBC is superior to the state of the art.

  14. Circular blurred shape model for multiclass symbol recognition.

    PubMed

    Escalera, Sergio; Fornés, Alicia; Pujol, Oriol; Lladós, Josep; Radeva, Petia

    2011-04-01

    In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations.
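
    The error-correcting output code (ECOC) step mentioned in this abstract can be sketched as nearest-codeword decoding. The four-class, seven-bit codebook below is a hypothetical example chosen so that every pair of codewords differs in four positions; the paper's actual code design and binary classifiers are not given in the abstract.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits, codebook):
    """Return the class whose codeword is closest to `bits` in Hamming distance."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


# Hypothetical codebook: one row per class, one column per binary classifier.
codebook = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}

# Outputs of the seven binary classifiers for a class-C symbol,
# with the first classifier making a mistake.
observed = (1, 0, 1, 1, 0, 0, 1)
decoded = ecoc_decode(observed, codebook)
```

    Because the codewords are at least four bits apart, any single wrong binary decision still leaves the correct class as the unique nearest codeword.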

  15. Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis.

    PubMed

    Duong, Bach Phi; Kim, Jong-Myon

    2018-04-07

    The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.

  16. Automatic classification and detection of clinically relevant images for diabetic retinopathy

    NASA Astrophysics Data System (ADS)

    Xu, Xinyu; Li, Baoxin

    2008-03-01

    We propose a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally, a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.

  17. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2012-09-30

    floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in

  18. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2013-09-30

    N0001411WX21394 Steve W. Martin SPAWAR Systems Center Pacific 53366 Front St. San Diego, CA 92152-6551 phone: (619) 553-9882 email: Steve.W.Martin...multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et al. 2008). This classifier both detects and classifies echolocation...whales. Here Moretti’s group, particularly S. Jarvis , will improve the SVM classifier by resolving confusion between species whose clicks overlap in

  19. Embedded feature ranking for ensemble MLP classifiers.

    PubMed

    Windeatt, Terry; Duangsoithong, Rakkrit; Smith, Raymond

    2011-06-01

    A feature ranking scheme for multilayer perceptron (MLP) ensembles is proposed, along with a stopping criterion based upon the out-of-bootstrap estimate. To solve multi-class problems feature ranking is combined with modified error-correcting output coding. Experimental results on benchmark data demonstrate the versatility of the MLP base classifier in removing irrelevant features.

  20. Automatic threshold selection for multi-class open set recognition

    NASA Astrophysics Data System (ADS)

    Scherreik, Matthew; Rigling, Brian

    2017-05-01

    Multi-class open set recognition is the problem of supervised classification with additional unknown classes encountered after a model has been trained. An open set classifier often has two core components. The first component is a base classifier which estimates the most likely class of a given example. The second component consists of open set logic which estimates if the example is truly a member of the candidate class. Such a system is operated in a feed-forward fashion. That is, a candidate label is first estimated by the base classifier, and the true membership of the example to the candidate class is estimated afterward. Previous works have developed an iterative threshold selection algorithm for rejecting examples from classes which were not present at training time. In those studies, a Platt-calibrated SVM was used as the base classifier, and the thresholds were applied to class posterior probabilities for rejection. In this work, we investigate the effectiveness of other base classifiers when paired with the threshold selection algorithm and compare their performance with the original SVM solution.
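
    The feed-forward arrangement described in this abstract (candidate label first, membership check second) can be sketched as follows. The per-class thresholds are hypothetical fixed values; the iterative threshold selection algorithm itself is not described in the abstract and is not reproduced here, and the names `open_set_predict` and `UNKNOWN` are invented for illustration.

```python
UNKNOWN = -1  # label returned when the example is rejected


def open_set_predict(posteriors, thresholds):
    """Pick the most likely class, then reject it if its posterior is below that class's threshold."""
    candidate = max(range(len(posteriors)), key=posteriors.__getitem__)
    if posteriors[candidate] >= thresholds[candidate]:
        return candidate
    return UNKNOWN


# Hypothetical per-class rejection thresholds on calibrated posteriors.
thresholds = [0.7, 0.6, 0.8]

confident = open_set_predict([0.85, 0.10, 0.05], thresholds)
ambiguous = open_set_predict([0.40, 0.35, 0.25], thresholds)
```

    The first example clears the threshold of its candidate class and keeps that label; the second does not and is reported as unknown.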

  1. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they merely focus on binary-class problem. In this paper, we dealt with multiclass imbalanced classification problem, as encountered in cancer DNA microarray, by using ensemble learning. We utilized one-against-all coding strategy to transform multiclass to multiple binary classes, each of them carrying out feature subspace, which is an evolving version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely, decision threshold adjustment or random undersampling, into each training subset to alleviate the damage of class imbalance. Specifically, support vector machine was used as base classifier, and a novel voting rule called counter voting was presented for making a final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that unlike many traditional classification approaches, our methods are insensitive to class imbalance.

  2. Single classifier, OvO, OvA and RCC multiclass classification method in handheld based smartphone gait identification

    NASA Astrophysics Data System (ADS)

    Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Gait recognition is widely used in many applications. In gait identification of people, the number of classes (people) is large and may exceed 20. With so many classes, a single classification mapping (direct classification) may not be suitable, because most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset may produce a high degree of class boundary overlap. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results show that multiclass classification mapping partially improved the overall accuracy, especially for OvO and for RCC with a width factor greater than 4. For OvA, the accuracy is worse than that of a single J48 due to the high number of classes.
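
    To make the cost of the mappings concrete, the sketch below counts the binary classifiers that OvA and OvO need and shows OvO majority voting. The 20-class figure is only an illustration taken from the "more than 20" mentioned in the abstract, and the pairwise "duel" function is a toy stand-in for trained binary classifiers; RCC is not covered.

```python
from collections import Counter
from itertools import combinations


def n_binary_classifiers(n_classes):
    """Number of binary models needed by each mapping."""
    return {"OvA": n_classes, "OvO": n_classes * (n_classes - 1) // 2}


def ovo_vote(pairwise_winner, classes):
    """Run every pairwise duel and return the class that collects the most votes."""
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


counts = n_binary_classifiers(20)

# Toy stand-in for trained pairwise classifiers: the larger label wins every duel.
winner = ovo_vote(lambda a, b: max(a, b), range(5))
```

    For 20 people, OvA trains 20 models while OvO trains 190, but each OvO model only has to separate two people at a time.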

  3. Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis

    PubMed Central

    Kim, Jong-Myon

    2018-01-01

    The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance. PMID:29642466

  4. Multiclass Bayes error estimation by a feature space sampling technique

    NASA Technical Reports Server (NTRS)

    Mobasseri, B. G.; Mcgillem, C. D.

    1979-01-01

    A general Gaussian M-class N-feature classification problem is defined. An algorithm is developed that requires the class statistics as its only input and computes the minimum probability of error through use of a combined analytical and numerical integration over a sequence of simplifying transformations of the feature space. The results are compared with those obtained by conventional techniques applied to a 2-class 4-feature discrimination problem with results previously reported and 4-class 4-feature multispectral scanner Landsat data classified by training and testing of the available data.

  5. Diagnosis of oral lichen planus from analysis of saliva samples using terahertz time-domain spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Kistenev, Yury V.; Borisov, Alexey V.; Titarenko, Maria A.; Baydik, Olga D.; Shapovalov, Alexander V.

    2018-04-01

    The ability to diagnose oral lichen planus (OLP) based on saliva analysis using THz time-domain spectroscopy and chemometrics is discussed. The study involved 30 patients (2 male and 28 female) with OLP. This group consisted of two subgroups with the erosive form of OLP (n = 15) and with the reticular and papular forms of OLP (n = 15). The control group consisted of six healthy volunteers (one male and five females) without inflammation in the mucous membrane in the oral cavity and without periodontitis. Principal component analysis was used to reveal informative features in the experimental data. The one-versus-one multiclass classifier using support vector machine binary classifiers was used. The two-stage classification approach using several absorption spectra scans for an individual saliva sample provided 100% accuracy of differential classification between OLP subgroups and control group.

  6. Seismic Data Analysis throught Multi-Class Classification.

    NASA Astrophysics Data System (ADS)

    Anderson, P.; Kappedal, R. D.; Magana-Zook, S. A.

    2017-12-01

    In this research, we conducted twenty experiments of varying time and frequency bands on 5000 seismic signals with the intent of finding a method to classify signals as either an explosion or an earthquake in an automated fashion. We used a multi-class approach by clustering of the data through various techniques. Dimensional reduction was examined through the use of wavelet transforms with the use of the coiflet mother wavelet and various coefficients to explore possible computational time vs accuracy dependencies. Three and four classes were generated from the clustering techniques and examined, with the three class approach producing the most accurate and realistic results.

  7. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

    NASA Astrophysics Data System (ADS)

    Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr

    2017-08-01

    This paper mainly focuses on developing a Hadoop-based framework for feature selection and classification models to classify high-dimensional data in heterogeneous biomedical databases. Extensive research has been carried out in the fields of machine learning, big data and data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. Given the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and implemented on the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on complex biomedical datasets show that our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.

  8. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study

    PubMed Central

    Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom

    2016-01-01

    The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex. PMID:27500640

  9. Multiclass Classification for the Differential Diagnosis on the ADHD Subtypes Using Recursive Feature Elimination and Hierarchical Extreme Learning Machine: Structural MRI Study.

    PubMed

    Qureshi, Muhammad Naveed Iqbal; Min, Beomjun; Jo, Hang Joon; Lee, Boreom

    2016-01-01

    The classification of neuroimaging data for the diagnosis of certain brain diseases is one of the main research goals of the neuroscience and clinical communities. In this study, we performed multiclass classification using a hierarchical extreme learning machine (H-ELM) classifier. We compared the performance of this classifier with that of a support vector machine (SVM) and basic extreme learning machine (ELM) for cortical MRI data from attention deficit/hyperactivity disorder (ADHD) patients. We used 159 structural MRI images of children from the publicly available ADHD-200 MRI dataset. The data consisted of three types, namely, typically developing (TDC), ADHD-inattentive (ADHD-I), and ADHD-combined (ADHD-C). We carried out feature selection by using standard SVM-based recursive feature elimination (RFE-SVM) that enabled us to achieve good classification accuracy (60.78%). In this study, we found the RFE-SVM feature selection approach in combination with H-ELM to effectively enable the acquisition of high multiclass classification accuracy rates for structural neuroimaging data. In addition, we found that the most important features for classification were the surface area of the superior frontal lobe, and the cortical thickness, volume, and mean surface area of the whole cortex.

  10. Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.

    PubMed

    Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad

    2018-01-01

    Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.
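
    As a rough illustration of the decomposition schemes named in the abstract above, the following sketch counts the binary classifiers that one-against-one (OAO) and one-against-all (OAA) would need for sixteen classes, and shows how pairwise votes can be tallied. It is a minimal sketch and not the authors' implementation: the function names are hypothetical, and `toy_rule` is an assumed stand-in for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def count_binary_classifiers(n_classes):
    """Return (one-against-one, one-against-all) classifier counts."""
    oao = n_classes * (n_classes - 1) // 2  # one classifier per class pair
    oaa = n_classes                         # one classifier per class
    return oao, oaa


def oao_predict(classes, pairwise_winner):
    """Tally one vote per class pair and return the most-voted class.

    pairwise_winner(a, b) is a hypothetical stand-in for a trained binary
    SVM that returns either a or b for the sample being classified.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Sixteen classes, as in the arrhythmia study described above.
print(count_binary_classifiers(16))

# Toy rule (assumption): class 7 wins every pair it takes part in,
# otherwise the smaller label wins.
toy_rule = lambda a, b: 7 if 7 in (a, b) else min(a, b)
print(oao_predict(range(16), toy_rule))
```

    The pair count grows quadratically with the number of classes, which is one reason OAO training can become costly when many classes are involved.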

  11. An ordinal classification approach for CTG categorization.

    PubMed

    Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George

    2017-07-01

    Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into consideration the ordering of CTG categories, based on a binary decomposition method, is investigated. The results achieved using the C4.5 decision tree as the base classifier show that, for several performance criteria, the ordinal classification approach is marginally better than the traditional multiclass classification approach, which uses the standard C4.5 algorithm.
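
    The abstract above does not say which binary decomposition is used. One common choice for ordered categories, assumed here purely for illustration, trains K-1 binary classifiers that each estimate the probability that the label lies above a given category, and then recovers the K class probabilities by differencing. The function name and the probability values below are hypothetical.

```python
def ordinal_class_probabilities(p_greater):
    """Turn K-1 estimates of P(y > class_k) into K class probabilities.

    p_greater[k] is the (hypothetical) output of the k-th binary classifier,
    which estimates the probability that the label lies above the k-th
    category in the ordering.
    """
    probs = [1.0 - p_greater[0]]                     # lowest category
    for lower, upper in zip(p_greater, p_greater[1:]):
        probs.append(lower - upper)                  # middle categories
    probs.append(p_greater[-1])                      # highest category
    return probs


categories = ["Normal", "Suspicious", "Pathological"]
# Assumed classifier outputs: P(y > Normal) = 0.7, P(y > Suspicious) = 0.2
probs = ordinal_class_probabilities([0.7, 0.2])
predicted = categories[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)
```

    If the binary classifiers are trained independently, their outputs need not be monotone, so a middle-category difference can come out negative; a practical implementation would have to handle that case.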

  12. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.

    PubMed

    Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui

    2012-11-07

    RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using the Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases, an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) for analyzing and classifying RNA-binding protein domains based on a comprehensive set of sequence and structural features. We compared the prediction accuracy of these three state-of-the-art predictor methods. In our results, TMCSVM outperforms the other methods, which suggests the potential of TMCSVM as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. On the other hand, MCRLR, by elucidating the importance of features for their contribution to the predictive accuracy of RNA-binding protein domain subclasses, helps us to provide some biological insights into the roles of sequences and structures in protein-RNA interactions.

  13. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    PubMed

    Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.

  14. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  15. Improved Sparse Multi-Class SVM and Its Application for Gene Selection in Cancer Classification

    PubMed Central

    Huang, Lingkang; Zhang, Hao Helen; Zeng, Zhao-Bang; Bushel, Pierre R.

    2013-01-01

    Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability: The source MATLAB code are available from http://math.arizona.edu/~hzhang/software.html. PMID:23966761

  16. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    PubMed

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.

  17. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2011-09-30

    Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR Systems Center Pacific...APPROACH Odontocete click detection and classification. A multiclass support vector machine (SVM) classifier was previously developed ( Jarvis et...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, especially S. Jarvis , will improve the SVM classifier

  18. A method of neighbor classes based SVM classification for optical printed Chinese character recognition.

    PubMed

    Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

    2013-01-01

    In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR.

  19. Genetic programming based ensemble system for microarray data classification.

    PubMed

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
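
    The abstract above names three combination operators (Min, Max, and Average) but does not describe how they are applied inside the evolved ensembles. The sketch below assumes, for illustration only, that every base classifier outputs one score per class and that an operator combines those scores element-wise before the highest combined score is chosen. The function names and scores are hypothetical.

```python
def combine(scores_per_classifier, operator):
    """Combine per-class scores from several base classifiers element-wise."""
    columns = list(zip(*scores_per_classifier))  # one tuple per class
    if operator == "min":
        return [min(col) for col in columns]
    if operator == "max":
        return [max(col) for col in columns]
    if operator == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown operator: {operator}")


def predict(scores_per_classifier, operator):
    """Return the index of the class with the highest combined score."""
    combined = combine(scores_per_classifier, operator)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical scores from three decision trees over three cancer classes.
tree_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
for op in ("min", "max", "average"):
    print(op, predict(tree_scores, op))
```

    With these toy scores the Min operator picks a different class than Max and Average, which hints at why mixing operators can add diversity to an ensemble.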

  20. Genetic Programming Based Ensemble System for Microarray Data Classification

    PubMed Central

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748

  1. Multi-class geospatial object detection and geographic image classification based on collection of part detectors

    NASA Astrophysics Data System (ADS)

    Cheng, Gong; Han, Junwei; Zhou, Peicheng; Guo, Lei

    2014-12-01

    The rapid development of remote sensing technology has facilitated the acquisition of remote sensing images with higher and higher spatial resolution, but how to automatically understand the image contents is still a big challenge. In this paper, we develop a practical and rotation-invariant framework for multi-class geospatial object detection and geographic image classification based on a collection of part detectors (COPD). The COPD is composed of a set of representative and discriminative part detectors, where each part detector is a linear support vector machine (SVM) classifier used for the detection of objects or recurring spatial patterns within a certain range of orientation. Specifically, when performing multi-class geospatial object detection, we learn a set of seed-based part detectors where each part detector corresponds to a particular viewpoint of an object class, so the collection of them provides a solution for rotation-invariant detection of multi-class objects. When performing geographic image classification, we utilize a large number of pre-trained part detectors to discover distinctive visual parts from images and use them as attributes to represent the images. Comprehensive evaluations on two remote sensing image databases and comparisons with some state-of-the-art approaches demonstrate the effectiveness and superiority of the developed framework.

  2. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

    Supervised classification algorithms are commonly used in the design of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having an unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset: it has multiple classes with small sample sizes and is therefore suitable for testing our resampling based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset; (ii) the RF machine learning algorithm is used to evaluate the performance of the selected features with and without simple random sampling, to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0%, which is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of the experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm.
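
    The two ideas in the abstract above, taking the mode of the trees' outputs and drawing a simple random sample for training, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the study's code: the function names are hypothetical, and sampling with replacement is an assumption, since the abstract does not specify it.

```python
import random
from collections import Counter


def forest_vote(tree_predictions):
    """Return the most frequent label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]


def resample(samples, n, seed=0):
    """Draw n samples uniformly at random, with replacement (assumption)."""
    rng = random.Random(seed)
    return rng.choices(samples, k=n)


# Hypothetical labels predicted by five trees for one ECG record.
print(forest_vote(["normal", "arrhythmia", "arrhythmia", "normal", "arrhythmia"]))

# Hypothetical minority class with 4 records, resampled up to 15 records.
minority = ["rec_a", "rec_b", "rec_c", "rec_d"]
print(len(resample(minority, 15)))
```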

  3. Semantic classification of business images

    NASA Astrophysics Data System (ADS)

    Erol, Berna; Hull, Jonathan J.

    2006-01-01

    Digital cameras are becoming increasingly common for capturing information in business settings. In this paper, we describe a novel method for classifying images into the following semantic classes: document, whiteboard, business card, slide, and regular images. Our method is based on combining low-level image features, such as text color, layout, and handwriting features with high-level OCR output analysis. Several Support Vector Machine Classifiers are combined for multi-class classification of input images. The system yields 95% accuracy in classification.

  4. Application of a Hidden Bayes Naive Multiclass Classifier in Network Intrusion Detection

    ERIC Educational Resources Information Center

    Koc, Levent

    2013-01-01

    With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify…

  5. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.

    PubMed

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The best combination of feature variables can be decided automatically by using the multi-class F-score (MF-Score) feature selection. To handle the nonlinear mapping in this real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier in coal-rock recognition, which exhibits inherently complex data distributions. The proposed method is examined on UCI data sets and the caving dataset, and compared with some recent, well-performing SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which makes the approach promising for feature selection and multi-class recognition in coal-rock recognition.

  6. Combination of minimum enclosing balls classifier with SVM in coal-rock recognition

    PubMed Central

    Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

    2017-01-01

    Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The best combination of feature variables can be decided automatically by using the multi-class F-score (MF-Score) feature selection. To handle the nonlinear mapping in this real-world optimization problem, an effective minimum enclosing ball (MEB) algorithm plus support vector machine (SVM) is proposed for rapid detection of coal-rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier in coal-rock recognition, which exhibits inherently complex data distributions. The proposed method is examined on UCI data sets and the caving dataset, and compared with some recent, well-performing SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers on the UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which makes the approach promising for feature selection and multi-class recognition in coal-rock recognition. PMID:28937987

  7. Vision based nutrient deficiency classification in maize plants using multi class support vector machines

    NASA Astrophysics Data System (ADS)

    Leena, N.; Saju, K. K.

    2018-04-01

    Nutritional deficiencies in plants are a major concern for farmers as they affect productivity and thus profit. The work aims to classify nutritional deficiencies in maize plants in a non-destructive manner using image processing and machine learning techniques. The colored images of the leaves are analyzed and classified with a multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies of nitrogen, phosphorus and potassium (NPK) are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.

  8. A Method of Neighbor Classes Based SVM Classification for Optical Printed Chinese Character Recognition

    PubMed Central

    Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

    2013-01-01

    In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed for the recognition. Among the classifiers, support vector machine (SVM) might be the best classifier. However, SVM is a classifier for two classes. When it is used for multi-classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computation consumption of SVM. Experiments of NC-SVM classification for OPCCR have been done. The results of the experiments have shown that the NC-SVM we proposed can effectively reduce the computation time in OPCCR. PMID:23536777

  9. An ensemble of SVM classifiers based on gene pairs.

    PubMed

    Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin

    2013-07-01

    In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets.

  10. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    NASA Astrophysics Data System (ADS)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem in dermatology, and the disease is difficult to detect. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use a data mining technique to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as the filter method to remove redundant features and GA as the wrapper method to select the ideal feature subset, with SVM used as the classifier. Experiments were performed with 10 fold cross validation on the erythemato-squamous diseases dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed model based on multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.
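
    The filter stage mentioned in the abstract above can be illustrated with the Pearson chi-square statistic between a categorical feature and the class labels. This is a minimal sketch of a generic chi-square filter, not the ChiGA implementation: the function name and the toy data are hypothetical, and the GA wrapper stage is not shown.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and labels."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts
    feature_totals = Counter(feature_values)         # row totals
    label_totals = Counter(labels)                   # column totals
    score = 0.0
    for f, f_total in feature_totals.items():
        for c, c_total in label_totals.items():
            expected = f_total * c_total / n         # count under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


# Toy data (assumption): one feature tracks the label, the other does not.
labels = ["a", "a", "b", "b"]
features = {"informative": [1, 1, 0, 0], "noise": [1, 0, 1, 0]}
ranking = sorted(features, key=lambda name: chi_square_score(features[name], labels), reverse=True)
print(ranking)
```

    A filter of this kind would keep the highest-scoring features and pass them to the wrapper stage, which then searches for a subset using the classifier's accuracy as feedback.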

  11. Prediction of cancer class with majority voting genetic programming classifier using gene expression data.

    PubMed

    Paul, Topon Kumar; Iba, Hitoshi

    2009-01-01

    In order to get a better understanding of different types of cancers and to find the possible biomarkers for diseases, recently, many researchers are analyzing the gene expression data using various machine learning techniques. However, due to a very small number of training samples compared to the huge number of genes and class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels with majority voting technique. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancers being studied in this paper.

  12. Enhancing atlas based segmentation with multiclass linear classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sdika, Michaël, E-mail: michael.sdika@creatis.insa-lyon.fr

    Purpose: To present a method to enrich atlases for atlas based segmentation. Such enriched atlases can then be used as a single atlas or within a multiatlas framework. Methods: In this paper, machine learning techniques have been used to enhance the atlas based segmentation approach. The enhanced atlas defined in this work is a pair composed of a gray level image alongside an image of multiclass classifiers with one classifier per voxel. Each classifier embeds local information from the whole training dataset that allows for the correction of some systematic errors in the segmentation and accounts for the possible local registration errors. The authors also propose to use these images of classifiers within a multiatlas framework: results produced by a set of such local classifier atlases can be combined using a label fusion method. Results: Experiments have been made on the in vivo images of the IBSR dataset and a comparison has been made with several state-of-the-art methods such as FreeSurfer and the multiatlas nonlocal patch based method of Coupé or Rousseau. These experiments show that their method is competitive with state-of-the-art methods while having a low computational cost. Further enhancement has also been obtained with a multiatlas version of their method. It is also shown that, in this case, nonlocal fusion is unnecessary. The multiatlas fusion can therefore be done efficiently. Conclusions: The single atlas version has similar quality to state-of-the-art multiatlas methods but with the computational cost of a naive single atlas segmentation. The multiatlas version offers an improvement in quality and can be done efficiently without a nonlocal strategy.

  13. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

    PubMed

    Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

    2016-10-24

    To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. 
One-tailed t test was used to determine if differences in F-scores were statistically significant. In multiclass source classification, the use of expanded URLs did not contribute to significant improvement in classifier performance (0.7972 vs 0.8102 for SVM, P=.19). In binary classification, the identification of all source categories improved significantly when unshortened URLs were used, with personal communication tweets benefiting the most (0.8736 vs 0.8200, P<.001). In multiclass sentiment classification Approach 1, SVM (0.6723) performed similarly to NB (0.6683) and LR (0.6703). In Approach 2, SVM (0.7062) did not differ from NB (0.6980, P=.13) or LR (F=0.6931, P=.05), but it was over 40% more accurate than VADER (F=0.5030, P<.001). In multiclass task, improvements in sentiment classification (Approach 2 vs Approach 1) did not reach statistical significance (eg, SVM: 0.7062 vs 0.6723, P=.052). In binary sentiment classification (positive vs negative), Approach 2 (focus on personal communication tweets only) improved classification results, compared with Approach 1, for LR (0.8752 vs 0.8516, P=.04) and SVM (0.8800 vs 0.8557, P=.045). The study provides an example of the use of supervised machine learning methods to categorize cannabis- and synthetic cannabinoid-related tweets with fairly high accuracy. Use of these content analysis tools along with geographic identification capabilities developed by the eDrugTrends platform will provide powerful methods for tracking regional changes in user opinions related to cannabis and synthetic cannabinoids use over time and across different regions.
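
    The "over 40%" comparison with VADER can be checked as a relative gain between the two reported F-scores. The sketch below assumes that is how the figure was derived; the function name is illustrative.

```python
def relative_gain(score, baseline):
    """Relative improvement of score over baseline, as a fraction."""
    return (score - baseline) / baseline

# F-scores reported in the abstract: SVM (Approach 2) vs VADER
gain = relative_gain(0.7062, 0.5030)
print(f"{gain:.1%}")  # 40.4%
```

    The absolute difference is about 0.20 F-score points, so the 40% figure is relative to the VADER baseline rather than an absolute gain.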

  14. Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate

    ERIC Educational Resources Information Center

    Albaqshi, Amani Mohammed H.

    2017-01-01

    Functional Data Analysis (FDA) has attracted substantial attention for the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenged in that most classification tools have been limited to binary response applications. The functional…

  15. Online Feature Transformation Learning for Cross-Domain Object Category Recognition.

    PubMed

    Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold

    2017-06-09

    In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale application. The classifier is trained with k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition application.

  16. Feasibility of Active Machine Learning for Multiclass Compound Classification.

    PubMed

    Lang, Tobias; Flachsenberg, Florian; von Luxburg, Ulrike; Rarey, Matthias

    2016-01-25

    A common task in the hit-to-lead process is classifying sets of compounds into multiple, usually structural classes, which build the groundwork for subsequent SAR studies. Machine learning techniques can be used to automate this process by learning classification models from training compounds of each class. Gathering class information for compounds can be cost-intensive as the required data needs to be provided by human experts or experiments. This paper studies whether active machine learning can be used to reduce the required number of training compounds. Active learning is a machine learning method which processes class label data in an iterative fashion. It has gained much attention in a broad range of application areas. In this paper, an active learning method for multiclass compound classification is proposed. This method selects informative training compounds so as to optimally support the learning progress. The combination with human feedback leads to a semiautomated interactive multiclass classification procedure. This method was investigated empirically on 15 compound classification tasks containing 86-2870 compounds in 3-38 classes. The empirical results show that active learning can solve these classification tasks using 10-80% of the data which would be necessary for standard learning techniques.
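
    The abstract does not state how informative compounds are selected. As one common possibility, the sketch below uses margin-based uncertainty sampling: the compound whose two most probable classes are closest is sent to the expert for labeling. The probability values are made up for illustration.

```python
def margin(probs):
    """Gap between the two highest class probabilities (small = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def pick_query(pool_probs):
    """Return the index of the unlabeled item with the smallest margin."""
    return min(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))

# Hypothetical class-probability estimates for three unlabeled compounds
pool_probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # ambiguous between the first two classes
    [0.60, 0.30, 0.10],
]
print(pick_query(pool_probs))  # 1
```

    In an active learning loop this selection would alternate with retraining on the newly labeled compound until the label budget is used up.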

  17. Comparative study of classification algorithms for damage classification in smart composite laminates

    NASA Astrophysics Data System (ADS)

    Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo

    2017-04-01

    This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.

  18. A Ternary Brain-Computer Interface Based on Single-Trial Readiness Potentials of Self-initiated Fine Movements: A Diversified Classification Scheme

    PubMed Central

    Abou Zeid, Elias; Rezazadeh Sereshkeh, Alborz; Schultz, Benjamin; Chau, Tom

    2017-01-01

    In recent years, the readiness potential (RP), a type of pre-movement neural activity, has been investigated for asynchronous electroencephalogram (EEG)-based brain-computer interfaces (BCIs). Since the RP is attenuated for involuntary movements, a BCI driven by RP alone could facilitate intentional control amid a plethora of unintentional movements. Previous studies have mainly attempted binary single-trial classification of RP. An RP-based BCI with three or more states would expand the options for functional control. Here, we propose a ternary BCI based on single-trial RPs. This BCI classifies amongst an idle state, a left hand and a right hand self-initiated fine movement. A pipeline of spatio-temporal filtering with per participant parameter optimization was used for feature extraction. The ternary classification was decomposed into binary classifications using a decision-directed acyclic graph (DDAG). For each class pair in the DDAG structure, an ordered diversified classifier system (ODCS-DDAG) was used to select the best among various classification algorithms or to combine the results of different classification algorithms. Using EEG data from 14 participants performing self-initiated left or right key presses, punctuated with rest periods, we compared the performance of ODCS-DDAG to a ternary classifier and four popular multiclass decomposition methods using only a single classification algorithm. ODCS-DDAG had the highest performance (0.769 Cohen's Kappa score) and was significantly better than the ternary classifier and two of the four multiclass decomposition methods. Our work supports further study of RP-based BCI for intuitive asynchronous environmental control or augmentative communication. PMID:28596725
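
    A decision-directed acyclic graph resolves a multiclass decision by running one binary classifier per step and discarding the losing class. The sketch below shows that elimination logic for three states; the pairwise classifiers are hypothetical threshold rules on a single number, standing in for the trained per-pair classifiers of the study.

```python
def ddag_predict(classes, pairwise, x):
    """Evaluate first-vs-last binary classifiers until one class remains."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        if winner == a:
            remaining.pop()    # b is eliminated
        else:
            remaining.pop(0)   # a is eliminated
    return remaining[0]

# Hypothetical binary classifiers keyed by class pair (illustration only)
pairwise = {
    ("idle", "right"): lambda x: "right" if x > 1 else "idle",
    ("idle", "left"): lambda x: "left" if x < -1 else "idle",
    ("left", "right"): lambda x: "left" if x < 0 else "right",
}
classes = ["idle", "left", "right"]
print([ddag_predict(classes, pairwise, x) for x in (-2, 0.3, 2)])
```

    For K classes the graph needs K(K-1)/2 trained pairs, but only K-1 of them are evaluated for any single prediction.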

  19. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. For the bioinformatics applications, the role of appropriate features has not been paid adequate importance. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner when the learning goes on. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. And in the next level we have another set of classifiers, which further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. 
Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
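
    The exact indirect coding scheme is not given in the abstract. As a rough illustration of the N-gram idea only, the sketch below turns an amino-acid sequence into relative frequencies of overlapping n-grams; the sequence is a made-up example.

```python
from collections import Counter

def ngram_frequencies(sequence, n=2):
    """Relative frequency of each overlapping n-gram in the sequence."""
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}

# Hypothetical short peptide; real protein sequences are far longer
freqs = ngram_frequencies("MKVLAKVL", n=2)
print(sorted(freqs.items()))
```

    Here the 8-residue sequence yields 7 bigrams, of which "KV" and "VL" each occur twice.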

  20. Protein classification using sequential pattern mining.

    PubMed

    Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I

    2006-01-01

    Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
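
    The two figures quoted above correspond to top-1 and top-3 accuracy. A minimal sketch of that metric follows, assuming the classifier returns a score per fold category; the scores and labels are invented for illustration.

```python
def top_k_accuracy(score_rows, true_labels, k=3):
    """Fraction of samples whose true label is among the k best-scored labels."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in top_k
    return hits / len(true_labels)

# Hypothetical per-category scores for three proteins and four categories
rows = [
    {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.1},
    {"a": 0.7, "b": 0.1, "c": 0.15, "d": 0.05},
    {"a": 0.2, "b": 0.6, "c": 0.1, "d": 0.1},
]
truths = ["c", "d", "b"]
print(top_k_accuracy(rows, truths, k=1), top_k_accuracy(rows, truths, k=3))
```

    Top-k accuracy can only increase with k, which is why the three-candidate figure is higher than the single-prediction one.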

  1. Soft Computing Application in Fault Detection of Induction Motor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konar, P.; Puhan, P. S.; Chattopadhyay, P. Dr.

    2010-10-26

    The paper investigates the effectiveness of different pattern classifiers such as Feed Forward Back Propagation (FFBPN), Radial Basis Function (RBF) and Support Vector Machine (SVM) for detection of bearing faults in Induction Motor. The steady state motor current with Park's Transformation has been used for discrimination of inner race and outer race bearing defects. The RBF neural network shows very encouraging results for multi-class classification problems and is hoped to set up a base for incipient fault detection of induction motor. SVM is also found to be a very good fault classifier which is highly competitive with RBF.

  2. Detection of surface cracking in steel pipes based on vibration data using a multi-class support vector machine classifier

    NASA Astrophysics Data System (ADS)

    Mustapha, S.; Braytee, A.; Ye, L.

    2017-04-01

    In this study, we focused on the development and verification of a robust framework for surface crack detection in steel pipes using measured vibration responses, in the presence of multiple progressive damage occurring in different locations within the structure. Feature selection, dimensionality reduction, and multi-class support vector machine were established for this purpose. Nine damage cases, at different locations, orientations and lengths, were introduced into the pipe structure. The pipe was impacted 300 times using an impact hammer; after each damage case, the vibration data were collected using 3 PZT wafers which were installed on the outer surface of the pipe. At first, damage sensitive features were extracted using the frequency response function approach followed by recursive feature elimination for dimensionality reduction. Then, a multi-class support vector machine learning algorithm was employed to train the data and generate a statistical model. Once the model is established, decision values and distances from the hyper-plane were generated for the new collected data using the trained model. This process was repeated on the data collected from each sensor. Overall, using a single sensor for training and testing led to a very high accuracy reaching 98% in the assessment of the 9 damage cases used in this study.

  3. Development of "one-pot" method for multi-class compounds in porcine formula feed by multi-function impurity adsorption cleaning followed ultra-performance liquid chromatography-tandem mass spectrometry detection.

    PubMed

    Wang, Peilong; Wang, Xiao; Zhang, Wei; Su, Xiaoou

    2014-02-01

    A novel and efficient determination method for multi-class compounds including β-agonists, sedatives, nitro-imidazoles and aflatoxins in porcine formula feed based on a fast "one-pot" extraction/multifunction impurity adsorption (MFIA) clean-up procedure has been developed. 23 target analytes belonging to four different compound classes could be determined simultaneously in a single run. Conditions for "one-pot" extraction were studied in detail. Under the optimized conditions, the multi-class compounds in porcine formula feed samples were extracted and purified with methanol containing ammonia and absorbents in one step. The compounds in extracts were purified by using multiple types of absorbent based on MFIA in one pot. The multi-walled carbon nanotubes were employed to improve clean-up efficiency. Shield BEH C18 column was used to separate 23 target analytes, followed by tandem mass spectrometry (MS/MS) detection using an electro-spray ionization source in positive mode. Recovery studies were done at three fortification levels. Overall average recoveries of target compounds in porcine formula feed at each level were >51.6% based on matrix fortified calibration with coefficients of variation from 2.7% to 13.2% (n=6). The limit of determination (LOD) of these compounds in porcine formula feed sample matrix was <5.0 μg/kg. This method was successfully applied in screening and confirmation of target drugs in >30 porcine formula feed samples. It was demonstrated that the integration of the MFIA protocol with the MS/MS instrument could serve as a valuable strategy for rapid screening and reliable confirmatory analysis of multi-class compounds in real samples. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

    PubMed Central

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2014-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534

  5. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

    PubMed

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2015-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.

  6. Advanced Methods for Passive Acoustic Detection, Classification, and Localization of Marine Mammals

    DTIC Science & Technology

    2014-09-30

    floor 1176 Howell St Newport RI 02842 phone: (401) 832-5749 fax: (401) 832-4441 email: David.Moretti@navy.mil Steve W. Martin SPAWAR...APPROACH Odontocete click detection and classification. A multi-class support vector machine (SVM) classifier was previously developed ( Jarvis ...beaked whales, Risso’s dolphins, short-finned pilot whales, and sperm whales. Here Moretti’s group, particularly S. Jarvis , is improving the SVM

  7. Comparison of SVM and ANFIS for Snore Related Sounds Classification by Using the Largest Lyapunov Exponent and Entropy

    PubMed Central

    Ankışhan, Haydar; Yılmaz, Derya

    2013-01-01

    Snoring, which may be decisive for many diseases, is an important indicator especially for sleep disorders. In recent years, many studies have been performed on the snore related sounds (SRSs) due to producing useful results for detection of sleep apnea/hypopnea syndrome (SAHS). The first important step of these studies is the detection of snore from SRSs by using different time and frequency domain features. The SRSs have a complex nature that is originated from several physiological and physical conditions. The nonlinear characteristics of SRSs can be examined with chaos theory methods which are widely used to evaluate the biomedical signals and systems, recently. The aim of this study is to classify the SRSs as snore/breathing/silence by using the largest Lyapunov exponent (LLE) and entropy with multiclass support vector machines (SVMs) and adaptive network fuzzy inference system (ANFIS). Two different experiments were performed for different training and test data sets. Experimental results show that the multiclass SVMs can produce the better classification results than ANFIS with used nonlinear quantities. Additionally, these nonlinear features are carrying meaningful information for classifying SRSs and are able to be used for diagnosis of sleep disorders such as SAHS. PMID:24194786

  8. Object recognition based on Google's reverse image search and image similarity

    NASA Astrophysics Data System (ADS)

    Horváth, András.

    2015-12-01

    Image classification is one of the most challenging tasks in computer vision and a general multiclass classifier could solve many different tasks in image processing. Classification is usually done by shallow learning for predefined objects, which is a difficult task and very different from human vision, which is based on continuous learning of object classes and one requires years to learn a large taxonomy of objects which are not disjunct nor independent. In this paper I present a system based on Google image similarity algorithm and Google image database, which can classify a large set of different objects in a human like manner, identifying related classes and taxonomies.

  9. Steganalysis using logistic regression

    NASA Astrophysics Data System (ADS)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.
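
    The point that LR yields class probabilities and extends naturally to several classes can be illustrated with the softmax function, the usual form of multinomial logistic regression. Whether the paper uses exactly this formulation is an assumption; the scores below are arbitrary.

```python
import math

def softmax(scores):
    """Convert per-class linear scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores (w_k . x + b_k) for three classes
probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

    A margin-based SVM, by contrast, outputs only a signed distance per decision and needs an extra calibration step to produce comparable probabilities.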

  10. Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework.

    PubMed

    Yang, Lingjian; Ainali, Chrysanthi; Tsoka, Sophia; Papageorgiou, Lazaros G

    2014-12-05

    Applying machine learning methods on microarray gene expression profiles for disease classification problems is a popular method to derive biomarkers, i.e. sets of genes that can predict disease state or outcome. Traditional approaches where expression of genes were treated independently suffer from low prediction accuracy and difficulty of biological interpretation. Current research efforts focus on integrating information on protein interactions through biochemical pathway datasets with expression profiles to propose pathway-based classifiers that can enhance disease diagnosis and prognosis. As most of the pathway activity inference methods in literature are either unsupervised or applied on two-class datasets, there is good scope to address such limitations by proposing novel methodologies. A supervised multiclass pathway activity inference method using optimisation techniques is reported. For each pathway expression dataset, patterns of its constituent genes are summarised into one composite feature, termed pathway activity, and a novel mathematical programming model is proposed to infer this feature as a weighted linear summation of expression of its constituent genes. Gene weights are determined by the optimisation model, in a way that the resulting pathway activity has the optimal discriminative power with regards to disease phenotypes. Classification is then performed on the resulting low-dimensional pathway activity profile. The model was evaluated through a variety of published gene expression profiles that cover different types of disease. We show that not only does it improve classification accuracy, but it can also perform well in multiclass disease datasets, a limitation of other approaches from the literature. Desirable features of the model include the ability to control the maximum number of genes that may participate in determining pathway activity, which may be pre-specified by the user. 
Overall, this work highlights the potential of building pathway-based multi-phenotype classifiers for accurate disease diagnosis and prognosis problems.
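
    The composite feature described above is a weighted linear summation of the expression values of a pathway's constituent genes. The sketch below shows only that summation; in the paper the weights come from the optimisation model, whereas here the weights, gene names and expression values are placeholders.

```python
def pathway_activity(expression, weights):
    """Weighted sum of expression over the genes that carry a weight."""
    return sum(w * expression[gene] for gene, w in weights.items())

# Placeholder values: g3 has no weight, so it does not contribute
expression = {"g1": 2.0, "g2": 4.0, "g3": 9.0}
weights = {"g1": 0.5, "g2": 0.25}
print(pathway_activity(expression, weights))  # 2.0
```

    Limiting the number of genes with non-zero weight corresponds to the user-specified cap on participating genes mentioned in the abstract.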

  11. Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset

    DTIC Science & Technology

    2010-10-28

    rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of...analyze toxicological mechanisms for two military-unique explosive compounds 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane...also known as Royal Demolition eXplosive or RDX) [7,8]. These two compounds exhibit distinctive toxicological properties that are accompanied by

  12. Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

    PubMed

    Sørensen, Lauge; Nielsen, Mads

    2018-05-15

    The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Simple-random-sampling-based multiclass text classification algorithm.

    PubMed

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

    Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of the algorithms must be taken into account in the era of big data. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token level memory to store labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements.

  14. Multiclass methods for the analysis of antibiotic residues in milk by liquid chromatography coupled to mass spectrometry: A review.

    PubMed

    Rossi, Rosanna; Saluti, Giorgio; Moretti, Simone; Diamanti, Irene; Giusepponi, Danilo; Galarini, Roberta

    2018-02-01

    Milk is an important and beneficial food from a nutritional point of view, being an indispensable source of high quality proteins. Furthermore, it is a raw material for many dairy products, such as yoghurt, cheese, cream etc. Before reaching consumers, milk goes through production, processing and circulation. Each step involves potentially unsafe factors, such as chemical contamination that can affect milk quality. Antibiotics are widely used in veterinary medicine for dry cow therapy and mastitis treatment in lactating cows, which can cause the presence of antimicrobial residues in milk. In order to ensure consumers' safety, milk is analyzed to make sure that the fixed Maximum Residue Limits (MRLs) for antibiotics are not exceeded. Multiclass methods can monitor more drug classes through a single analysis, so they are faster, less time-consuming and cheaper than traditional methods (single-class); this aspect is particularly important for milk, which is a highly perishable food. Nevertheless, multiclass methods for veterinary drug residues in foodstuffs are real analytical challenges. This article reviews the major multiclass methods published for the determination of antibiotic residues in milk by liquid chromatography coupled to mass spectrometry, with a special focus on sample preparation approaches.

  15. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

    PubMed

    Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

    2017-06-21

    Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k-NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k-NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend toward better performance of the SVM and Random Forest classifiers was observed.

  16. Spatiotemporal source tuning filter bank for multiclass EEG based brain computer interfaces.

    PubMed

    Acharya, Soumyadipta; Mollazadeh, Moshen; Murari, Kartikeya; Thakor, Nitish

    2006-01-01

    Non-invasive brain-computer interfaces (BCI) allow people to communicate by modulating features of their electroencephalogram (EEG). Spatiotemporal filtering has a vital role in multi-class, EEG-based BCI. In this study, we used a novel combination of principal component analysis, independent component analysis and dipole source localization to design a spatiotemporal multiple source tuning (SPAMSORT) filter bank, each channel of which was tuned to the activity of an underlying dipole source. Changes in the event-related spectral perturbation (ERSP) were measured and used to train a linear support vector machine to classify between four classes of motor imagery tasks (left hand, right hand, foot and tongue) for one subject. ERSP values were significantly (p<0.01) different across tasks and better (p<0.01) than conventional spatial filtering methods (large Laplacian and common average reference). Classification resulted in an average accuracy of 82.5%. This approach could lead to promising BCI applications such as control of a prosthesis with multiple degrees of freedom.

  17. Wearable Sensor Data Classification for Human Activity Recognition Based on an Iterative Learning Framework.

    PubMed

    Davila, Juan Carlos; Cretu, Ana-Maria; Zaremba, Marek

    2017-06-07

    The design of multiple human activity recognition applications in areas such as healthcare, sports and safety relies on wearable sensor technologies. However, when making decisions based on the data acquired by such sensors in practical situations, several factors related to sensor data alignment, data losses, and noise, among other experimental constraints, deteriorate data quality and model accuracy. To tackle these issues, this paper presents a data-driven iterative learning framework to classify human locomotion activities such as walk, stand, lie, and sit, extracted from the Opportunity dataset. Data acquired by twelve 3-axial acceleration sensors and seven inertial measurement units are initially de-noised using a two-stage consecutive filtering approach combining a band-pass Finite Impulse Response (FIR) and a wavelet filter. A series of statistical parameters are extracted from the kinematical features, including the principal components and singular value decomposition of roll, pitch, yaw and the norm of the axial components. The novel interactive learning procedure is then applied in order to minimize the number of samples required to classify human locomotion activities. Only those samples that are most distant from the centroids of data clusters, according to a measure presented in the paper, are selected as candidates for the training dataset. The newly built dataset is then used to train an SVM multi-class classifier. The latter will produce the lowest prediction error. The proposed learning framework ensures a high level of robustness to variations in the quality of input data, while only using a much lower number of training samples and therefore a much shorter training time, which is an important consideration given the large size of the dataset.
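
    The sample-selection step described above (keeping only the samples most distant from the centroids of the data clusters) can be illustrated as follows. The abstract refers to a distance measure defined in the paper; this sketch swaps in plain Euclidean distance as an assumption, and the function names and the `per_cluster` parameter are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]

def select_far_samples(X, labels, per_cluster=2):
    """Return sorted indices of the `per_cluster` samples farthest from each cluster centroid.

    Assumption: Euclidean distance stands in for the paper's own measure.
    """
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    chosen = []
    for idx in groups.values():
        c = centroid([X[i] for i in idx])
        ranked = sorted(idx, key=lambda i: math.dist(X[i], c), reverse=True)
        chosen.extend(ranked[:per_cluster])
    return sorted(chosen)
```

    The returned indices would then define the reduced training set passed to the multi-class classifier.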

  18. 77 FR 73498 - Self-Regulatory Organizations; Chicago Board Options Exchange, Incorporated; Notice of Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-10

    ... Effectiveness of a Proposed Rule Change To Amend Its Rule Related to Multi-Class Broad-Based Index Option... Rule Change The Exchange proposes to amend its rule related to multi-class broad-based index option... is to (i) clarify that the term ``Multi-Class Broad-Based Index Option Spread Order (Multi-Class...

  19. Improving computer-aided detection assistance in breast cancer screening by removal of obviously false-positive findings.

    PubMed

    Mordang, Jan-Jurre; Gubern-Mérida, Albert; Bria, Alessandro; Tortorella, Francesco; den Heeten, Gerard; Karssemeijer, Nico

    2017-04-01

    Computer-aided detection (CADe) systems for mammography screening still mark many false positives. This can cause radiologists to lose confidence in CADe, especially when many false positives are obviously not suspicious to them. In this study, we focus on obvious false positives generated by microcalcification detection algorithms. We aim at reducing the number of obvious false-positive findings by adding an additional step in the detection method. In this step, a multiclass machine learning method is implemented in which dedicated classifiers learn to recognize the patterns of obvious false-positive subtypes that occur most frequently. The method is compared to a conventional two-class approach, where all false-positive subtypes are grouped together in one class, and to the baseline CADe system without the new false-positive removal step. The methods are evaluated on an independent dataset containing 1,542 screening examinations of which 80 examinations contain malignant microcalcifications. Analysis showed that the multiclass approach yielded a significantly higher sensitivity compared to the other two methods (P < 0.0002). At one obvious false positive per 100 images, the baseline CADe system detected 61% of the malignant examinations, while the systems with the two-class and multiclass false-positive reduction step detected 73% and 83%, respectively. Our study showed that by adding the proposed method to a CADe system, the number of obvious false positives can decrease significantly (P < 0.0002). © 2017 American Association of Physicists in Medicine.

  20. Exploring diversity in ensemble classification: Applications in large area land cover mapping

    NASA Astrophysics Data System (ADS)

    Mellor, Andrew; Boukir, Samia

    2017-07-01

    Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance - whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria, Australia. A particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin - two key concepts in ensemble learning. The main novelty of our work is on boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity, and through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area remote sensing applications, for which training data is costly and resource intensive to collect.
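
    The ensemble margin mentioned in this abstract can be made concrete with a short sketch. It uses one common supervised definition (votes for the true class minus the largest vote count for any other class, divided by the number of votes); whether the authors use exactly this variant is an assumption, and the function names are hypothetical.

```python
from collections import Counter

def ensemble_margin(votes, true_label):
    """Margin in [-1, 1]: (votes for true class - most votes for another class) / total votes."""
    counts = Counter(votes)
    v_true = counts.get(true_label, 0)
    v_other = max((n for c, n in counts.items() if c != true_label), default=0)
    return (v_true - v_other) / len(votes)

def lowest_margin_indices(all_votes, true_labels, k):
    """Indices of the k training instances with the smallest margin."""
    margins = [ensemble_margin(v, t) for v, t in zip(all_votes, true_labels)]
    return sorted(range(len(margins)), key=margins.__getitem__)[:k]
```

    A margin near 1 means the trees agree on the correct class, while a margin near or below 0 marks the difficult instances that the abstract proposes to emphasise.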

  1. Fast Multiclass Segmentation using Diffuse Interface Methods on Graphs

    DTIC Science & Technology

    2013-02-01

    000 28 × 28 images of handwritten digits 0 through 9. Examples of entries can be found in Figure 6. The task is to classify each of the images into the...database of handwritten digits.” [Online]. Available: http://yann.lecun.com/exdb/mnist/ [36] J. Lellmann, J. H. Kappes, J. Yuan, F. Becker, and C...corresponding digit. The images include digits from 0 to 9; thus, this is a 10 class segmentation problem. To construct the weight matrix, we used N

  2. Using multiclass classification to automate the identification of patient safety incident reports by type and severity.

    PubMed

    Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah

    2017-06-12

    Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volumes of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3%, 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, F-score on balanced data was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk). With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High risk incidents (SAC2) were confused with medium risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
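
    A small sketch may help relate the two ensemble strategies and the F-score used in this abstract. It counts the binary classifiers each strategy needs, combines pairwise decisions by majority vote, and computes the F-score as the harmonic mean of precision and recall (assumed here to be F1). The `pairwise_winner` callback and all function names are hypothetical placeholders, not the authors' code.

```python
from collections import Counter
from itertools import combinations

def n_binary_classifiers(n_classes, strategy):
    """Number of binary classifiers: k(k-1)/2 for one-versus-one, k for one-versus-all."""
    if strategy == "ovo":
        return n_classes * (n_classes - 1) // 2
    if strategy == "ova":
        return n_classes
    raise ValueError(f"unknown strategy: {strategy!r}")

def ovo_predict(classes, pairwise_winner, x):
    """Majority vote over all class pairs; pairwise_winner(a, b, x) must return a or b."""
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    For the 10 incident types, `n_binary_classifiers(10, "ovo")` gives 45 pairwise classifiers against 10 for one-versus-all. Because the harmonic mean is dominated by the smaller value, pairing a recall near 83% with a precision near 7% (an illustrative pairing of the reported SAC1 ranges) still yields a low F-score.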

  3. Discriminative illumination: per-pixel classification of raw materials based on optimal projections of spectral BRDF.

    PubMed

    Liu, Chao; Gu, Jinwei

    2014-01-01

    Classifying raw, unpainted materials--metal, plastic, ceramic, fabric, and so on--is an important yet challenging task for computer vision. Previous works measure subsets of surface spectral reflectance as features for classification. However, acquiring the full spectral reflectance is time consuming and error-prone. In this paper, we propose to use coded illumination to directly measure discriminative features for material classification. Optimal illumination patterns--which we call "discriminative illumination"--are learned from training samples, after projecting to which the spectral reflectance of different materials are maximally separated. This projection is automatically realized by the integration of incident light for surface reflection. While a single discriminative illumination is capable of linear, two-class classification, we show that multiple discriminative illuminations can be used for nonlinear and multiclass classification. We also show theoretically that the proposed method has higher signal-to-noise ratio than previous methods due to light multiplexing. Finally, we construct an LED-based multispectral dome and use the discriminative illumination method for classifying a variety of raw materials, including metal (aluminum, alloy, steel, stainless steel, brass, and copper), plastic, ceramic, fabric, and wood. Experimental results demonstrate its effectiveness.

  4. Pedagogical Practices: The Case of Multi-Class Teaching in Fiji Primary School

    ERIC Educational Resources Information Center

    Lingam, Govinda I.

    2007-01-01

    Multi-class teaching is a common phenomenon in small schools not only in Fiji, but also in many countries. The aim of the present study was to determine the teaching styles adopted by teachers in the context of multi-class teaching. A qualitative case study research design was adopted. This included a school with multi-class teaching as the norm.…

  5. Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data.

    PubMed

    Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire

    2011-12-01

    Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (variable or taxa, which are used interchangeably) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. Contact: zliu@umm.edu. Supplementary data are available at Bioinformatics online.

  6. LBP and SIFT based facial expression recognition

    NASA Astrophysics Data System (ADS)

    Sumer, Omer; Gunes, Ece O.

    2015-02-01

    This study compares the performance of local binary patterns (LBP) and scale invariant feature transform (SIFT) with support vector machines (SVM) in automatic classification of discrete facial expressions. Facial expression recognition is a multiclass classification problem, and seven classes (happiness, anger, sadness, disgust, surprise, fear and contempt) are classified. Using SIFT feature vectors and a linear SVM, 93.1% mean accuracy is obtained on the CK+ database. On the other hand, the performance of the LBP-based classifier with a linear SVM is reported on SFEW using the strictly person independent (SPI) protocol. Seven-class mean accuracy on SFEW is 59.76%. Experiments on both databases showed that LBP features can be used in a fairly descriptive way if a good localization of facial points and partitioning strategy are followed.

  7. Decoding grating orientation from microelectrode array recordings in monkey cortical area V4.

    PubMed

    Manyakov, Nikolay V; Van Hulle, Marc M

    2010-04-01

    We propose an invasive brain-machine interface (BMI) that decodes the orientation of a visual grating from spike train recordings made with a 96-microelectrode array chronically implanted into the prelunate gyrus (area V4) of a rhesus monkey. The orientation is decoded irrespective of the grating's spatial frequency. Since pyramidal cells are less prominent in visual areas, compared to (pre)motor areas, the recordings contain spikes with smaller amplitudes, compared to the noise level. Hence, rather than performing spike decoding, feature selection algorithms are applied to extract the required information for the decoder. Two types of feature selection procedures are compared, filter and wrapper. The wrapper is combined with a linear discriminant analysis classifier, and the filter is followed by a radial-basis function support vector machine classifier. In addition, since we have a multiclass classification problem, different methods for combining pairwise classifiers are compared.

  8. Towards a ternary NIRS-BCI: single-trial classification of verbal fluency task, Stroop task and unconstrained rest

    NASA Astrophysics Data System (ADS)

    Schudlo, Larissa C.; Chau, Tom

    2015-12-01

    Objective. The majority of near-infrared spectroscopy (NIRS) brain-computer interface (BCI) studies have investigated binary classification problems. Limited work has considered differentiation of more than two mental states, or multi-class differentiation of higher-level cognitive tasks using measurements outside of the anterior prefrontal cortex. Improvements in accuracies are needed to deliver effective communication with a multi-class NIRS system. We investigated the feasibility of a ternary NIRS-BCI that supports mental states corresponding to verbal fluency task (VFT) performance, Stroop task performance, and unconstrained rest using prefrontal and parietal measurements. Approach. Prefrontal and parietal NIRS signals were acquired from 11 able-bodied adults during rest and performance of the VFT or Stroop task. Classification was performed offline using bagging with a linear discriminant base classifier trained on a 10 dimensional feature set. Main results. VFT, Stroop task and rest were classified at an average accuracy of 71.7% ± 7.9%. The ternary classification system provided a statistically significant improvement in information transfer rate relative to a binary system controlled by either mental task (0.87 ± 0.35 bits/min versus 0.73 ± 0.24 bits/min). Significance. These results suggest that effective communication can be achieved with a ternary NIRS-BCI that supports VFT, Stroop task and rest via measurements from the frontal and parietal cortices. Further development of such a system is warranted. Accurate ternary classification can enhance communication rates offered by NIRS-BCIs, improving the practicality of this technology.
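
    The information transfer rate comparison above depends on both the number of classes and the classification accuracy. The abstract does not state which definition was used; the sketch below assumes the widely used Wolpaw formula, B = log2(N) + P log2(P) + (1 - P) log2((1 - P)/(N - 1)) bits per trial, and the trial duration passed to `bits_per_minute` is a hypothetical input.

```python
import math

def wolpaw_bits_per_trial(n_classes, accuracy):
    """Bits conveyed per trial for N equiprobable classes at accuracy P (Wolpaw definition)."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    if accuracy > 0:  # the term P*log2(P) tends to 0 as P -> 0
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1:  # the error term vanishes at P = 1
        bits += (1 - accuracy) * math.log2((1 - accuracy) / (n_classes - 1))
    return bits

def bits_per_minute(n_classes, accuracy, trial_seconds):
    """Scale bits per trial by the number of trials that fit into one minute."""
    return wolpaw_bits_per_trial(n_classes, accuracy) * 60.0 / trial_seconds
```

    At perfect accuracy a three-class system conveys log2(3) bits per trial, and at chance level (P = 1/3) the formula returns zero, which is why a ternary system can outperform a binary one only if its accuracy does not fall too far.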

  9. A Novel Wearable Device for Food Intake and Physical Activity Recognition

    PubMed Central

    Farooq, Muhammad; Sazonov, Edward

    2016-01-01

    Presence of speech and motion artifacts has been shown to impact the performance of wearable sensor systems used for automatic detection of food intake. This work presents a novel wearable device which can detect food intake even when the user is physically active and/or talking. The device consists of a piezoelectric strain sensor placed on the temporalis muscle, an accelerometer, and a data acquisition module connected to the temple of eyeglasses. Data from 10 participants was collected while they performed activities including quiet sitting, talking, eating while sitting, eating while walking, and walking. Piezoelectric strain sensor and accelerometer signals were divided into non-overlapping epochs of 3 s; four features were computed for each signal. To differentiate between eating and not eating, as well as between sedentary postures and physical activity, two multiclass classification approaches are presented. The first approach used a single classifier with sensor fusion and the second approach used two-stage classification. The best results were achieved when two separate linear support vector machine (SVM) classifiers were trained for food intake and activity detection, and their results were combined using a decision tree (two-stage classification) to determine the final class. This approach resulted in an average F1-score of 99.85% and area under the curve (AUC) of 0.99 for multiclass classification. With its ability to differentiate between food intake and activity level, this device may potentially be used for tracking both energy intake and energy expenditure. PMID:27409622
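
    Two steps from this abstract lend themselves to a brief sketch: cutting a sensor signal into non-overlapping 3 s epochs, and combining the outputs of two binary classifiers into one final class. The sampling rate, the function names, and the four final class labels are assumptions made for illustration; the simple rule below stands in for the decision tree used by the authors.

```python
def split_epochs(signal, sample_rate_hz, epoch_seconds=3.0):
    """Cut a 1-D signal into non-overlapping epochs; a trailing partial epoch is dropped."""
    size = int(round(sample_rate_hz * epoch_seconds))
    if size <= 0:
        raise ValueError("epoch must contain at least one sample")
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def combine_stages(is_eating, is_active):
    """Second stage: map the two binary decisions to one (hypothetical) final label."""
    intake = "eating" if is_eating else "not eating"
    activity = "active" if is_active else "sedentary"
    return f"{intake}, {activity}"
```

    Each epoch would be turned into a feature vector, classified once for food intake and once for activity, and the two decisions merged by `combine_stages`.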

  10. A Novel Wearable Device for Food Intake and Physical Activity Recognition.

    PubMed

    Farooq, Muhammad; Sazonov, Edward

    2016-07-11

    Presence of speech and motion artifacts has been shown to impact the performance of wearable sensor systems used for automatic detection of food intake. This work presents a novel wearable device which can detect food intake even when the user is physically active and/or talking. The device consists of a piezoelectric strain sensor placed on the temporalis muscle, an accelerometer, and a data acquisition module connected to the temple of eyeglasses. Data from 10 participants was collected while they performed activities including quiet sitting, talking, eating while sitting, eating while walking, and walking. Piezoelectric strain sensor and accelerometer signals were divided into non-overlapping epochs of 3 s; four features were computed for each signal. To differentiate between eating and not eating, as well as between sedentary postures and physical activity, two multiclass classification approaches are presented. The first approach used a single classifier with sensor fusion and the second approach used two-stage classification. The best results were achieved when two separate linear support vector machine (SVM) classifiers were trained for food intake and activity detection, and their results were combined using a decision tree (two-stage classification) to determine the final class. This approach resulted in an average F1-score of 99.85% and area under the curve (AUC) of 0.99 for multiclass classification. With its ability to differentiate between food intake and activity level, this device may potentially be used for tracking both energy intake and energy expenditure.

  11. Using multi-class queuing network to solve performance models of e-business sites.

    PubMed

    Zheng, Xiao-ying; Chen, De-ren

    2004-01-01

    Due to e-business's variety of customers with different navigational patterns and demands, a multi-class queuing network is a natural performance model for it. The open multi-class queuing network (QN) models are based on the assumption that no service center is saturated as a result of the combined loads of all the classes. Several formulas are used to calculate performance measures, including throughput, residence time, queue length, response time and the average number of requests. The solution technique for closed multi-class QN models is an approximate mean value analysis (MVA) algorithm based on three key equations, because the exact algorithm has huge time and space requirements. As mixed multi-class QN models include some open and some closed classes, the open classes should be eliminated to create a closed multi-class QN so that the closed model algorithm can be applied. Some corresponding examples are given to show how to apply the algorithms mentioned in this article. These examples indicate that a multi-class QN is a reasonably accurate model of e-business and can be solved efficiently.
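
    The open multi-class model described above can be illustrated with the standard textbook formulas for load-independent queueing centers, which are assumed here because the abstract does not list its equations: utilization U_i = sum over classes of lambda_r * D_ir, residence time R_ir = D_ir / (1 - U_i), and queue length from Little's law. The class names, centers, and numbers in the usage note are hypothetical.

```python
def solve_open_qn(arrival_rates, demands):
    """Solve an open multi-class queuing network.

    arrival_rates: {class_name: lambda_r} in requests per second.
    demands: {center: {class_name: D_ir}} service demand in seconds per request.
    Returns (utilization per center, response time per class, requests in system per class).
    """
    utilization = {}
    for center, d in demands.items():
        u = sum(lam * d.get(r, 0.0) for r, lam in arrival_rates.items())
        if u >= 1.0:  # the open model is only valid when no center is saturated
            raise ValueError(f"center {center!r} is saturated (U = {u:.3f})")
        utilization[center] = u
    response, in_system = {}, {}
    for r, lam in arrival_rates.items():
        # Residence time of class r at each center, summed over all centers.
        response[r] = sum(d.get(r, 0.0) / (1.0 - utilization[c]) for c, d in demands.items())
        in_system[r] = lam * response[r]  # Little's law
    return utilization, response, in_system
```

    For example, with arrival rates `{"browse": 2.0, "buy": 0.5}` and demands `{"cpu": {"browse": 0.1, "buy": 0.4}, "disk": {"browse": 0.2, "buy": 0.2}}`, the CPU and disk utilizations are 0.4 and 0.5, so neither center is saturated and the formulas apply.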

  12. Simultaneous determination of 41 multiclass organic pollutants in environmental waters by means of polyethersulfone microextraction followed by liquid chromatography-tandem mass spectrometry.

    PubMed

    Mijangos, Leire; Ziarrusta, Haizea; Olivares, Maitane; Zuloaga, Olatz; Möder, Monika; Etxebarria, Nestor; Prieto, Ailette

    2018-01-01

    A new procedure using polyethersulfone (PES) microextraction followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis was developed in this work for the simultaneous determination of 41 multiclass priority and emerging organic pollutants including herbicides, hormones, personal care products, and pharmaceuticals, among others, in seawater, wastewater treatment plant (WWTP) effluents, and estuary samples. The optimization of the analysis included two different chromatographic columns and different variables (polarity, fragmentor voltage, collision energy, and collision cell accelerator) of the mass spectrometer. In the case of PES extraction, ion strength of the water, pH, addition of EDTA, and the amount of the polymeric material were thoroughly investigated. The developed procedure was compared with a previously validated one based on a standard solid-phase extraction (SPE). In contrast to the SPE protocol, the PES method allowed a cost-efficient extraction of complex aqueous samples with lower matrix effect from 120 mL of water sample. Satisfactory and comparable apparent recovery values (80-119 and 70-131%) and method quantification limits (MQLs, 0.4-26 and 0.2-23 ng/L) were obtained for PES and SPE procedures, respectively, regardless of the matrix. Repeatability values lower than 27% were obtained. Finally, the developed methods were applied to the analysis of real samples from the Basque Country and irbesartan, valsartan, acesulfame, and sucralose were the analytes most often detected at the highest concentrations (51-1096 ng/L). Graphical abstract Forty-one multiclass pollutant determination in environmental waters by means of PES/SPE-LC-MS/MS.

  13. Field trial of applicability of lot quality assurance sampling survey method for rapid assessment of prevalence of active trachoma.

    PubMed Central

    Myatt, Mark; Limburg, Hans; Minassian, Darwin; Katyola, Damson

    2003-01-01

    OBJECTIVE: To test the applicability of lot quality assurance sampling (LQAS) for the rapid assessment of the prevalence of active trachoma. METHODS: Prevalence of active trachoma in six communities was found by examining all children aged 2-5 years. Trial surveys were conducted in these communities. A sampling plan appropriate for classifying communities with prevalences ≤20% and ≥40% was applied to the survey data. Operating characteristic and average sample number curves were plotted, and screening test indices were calculated. The ability of LQAS to provide a three-class classification system was investigated. FINDINGS: Ninety-six trial surveys were conducted. All communities with prevalences ≤20% and ≥40% were identified correctly. The method discriminated between communities with prevalences ≤30% and >30%, with sensitivity of 98% (95% confidence interval (CI)=88.2-99.9%), specificity of 84.4% (CI=69.9-93.0%), positive predictive value of 87.7% (CI=75.7-94.5%), negative predictive value of 97.4% (CI=84.9-99.9%), and accuracy of 91.7% (CI=83.8-96.1%). Agreement between the three prevalence classes and survey classifications was 84.4% (CI=75.2-90.7%). The time needed to complete the surveys was consistent with the need to complete a survey in one day. CONCLUSION: Lot quality assurance sampling provides a method of classifying communities according to the prevalence of active trachoma. It merits serious consideration as a replacement for the assessment of the prevalence of active trachoma with the currently used trachoma rapid assessment method. It may be extended to provide a multi-class classification method. PMID:14997240
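
    The screening test indices reported above all derive from a 2 × 2 table of survey classifications against true prevalence class. The sketch below shows the arithmetic; the function name is hypothetical, and the counts in the usage note are hypothetical as well, chosen only because they are consistent with 96 surveys and the reported percentages, not taken from the paper.

```python
def screening_indices(tp, fp, fn, tn):
    """Standard screening indices from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),           # true positives among actual positives
        "specificity": tn / (tn + fp),           # true negatives among actual negatives
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

    Calling `screening_indices(tp=50, fp=7, fn=1, tn=38)` with these illustrative counts gives values that round to the sensitivity, specificity, predictive values, and accuracy quoted in the abstract.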

  14. Cascaded discrimination of normal, abnormal, and confounder classes in histopathology: Gleason grading of prostate cancer

    PubMed Central

    2012-01-01

    Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, posing major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers is used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV). Conclusions Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge. PMID:23110677
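
    The cascaded scheme described in this abstract, in which successive binary classifiers split groups of classes until one class remains, can be sketched as a small decision cascade. This is an illustrative Python sketch only: the grouping, the feature names (gland_disorder, nuclear_density) and the thresholds are all hypothetical stand-ins, not the classifiers or features used in the paper, and only three of the seven classes appear.

```python
def cascade_predict(node, x):
    """Walk the cascade from coarse to fine until a single class label remains.

    A node is either a class label (str) or a tuple (is_left, left, right),
    where is_left is any binary classifier applied to the feature dict x.
    """
    while not isinstance(node, str):
        is_left, left, right = node
        node = left if is_left(x) else right
    return node

# Hypothetical cascade; thresholds and feature names are invented for illustration.
tree = (
    lambda x: x["gland_disorder"] < 0.5,        # benign-like group vs cancer-like group
    (
        lambda x: x["nuclear_density"] < 0.3,   # split the benign-like group further
        "stroma",
        "epithelium",
    ),
    "Gleason grade 3",
)

sample = {"gland_disorder": 0.2, "nuclear_density": 0.6}
print(cascade_predict(tree, sample))
```

    Each stage only has to separate two relatively homogeneous groups, which is the property the abstract contrasts with one-shot and one-versus-all classification.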

  15. Multiclass method for the determination of 62 antibiotics in milk.

    PubMed

    Moretti, Simone; Cruciani, Gabriele; Romanelli, Sara; Rossi, Rosanna; Saluti, Giorgio; Galarini, Roberta

    2016-09-01

    A multiclass method for screening and confirmatory analysis of antimicrobial residues in milk has been developed and validated. Sixty-two antibiotics belonging to ten different drug families (amphenicols, cephalosporins, lincosamides, macrolides, penicillins, pleuromutilins, quinolones, rifamycins, sulfonamides and tetracyclines) have been included. After the addition of an aqueous solution of EDTA, the milk samples were extracted twice with acetonitrile, evaporated and dissolved in ammonium acetate. After centrifugation, 10 µl were analysed using LC-Q-Orbitrap operating in positive electrospray ionization mode. The method was validated in bovine milk in the range 2-150 µg kg⁻¹ for all antibiotics; for four compounds with maximum residue limits higher than 100 µg kg⁻¹, the validation interval was extended up to 333 µg kg⁻¹. The estimated performance characteristics were satisfactory, complying with the requirements of Commission Decision 2002/657/EC. Good accuracies were obtained, also taking advantage of the versatility of the hybrid mass analyser. Identification criteria were achieved by verifying the mass accuracy and ion ratio of two ions, including the pseudomolecular one, where possible. Finally, the developed procedure was applied to 13 real cases of suspect milk samples (microbiological assay), confirming the presence of one or more antibiotics, although frequently the maximum residue limits were not exceeded. The availability of rapid multiclass confirmatory methods can avoid the waste of suspect, but compliant, raw milk samples. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Walking pattern analysis and SVM classification based on simulated gaits.

    PubMed

    Mao, Yuxiang; Saito, Masaru; Kanno, Takehiro; Wei, Daming; Muroi, Hiroyasu

    2008-01-01

    Three classes of walking patterns (normal, caution and danger) were simulated by tying elastic bands to joints of the lower body. In order to distinguish one class from another, four local motions suggested by doctors were investigated stepwise, and differences between levels were evaluated using t-tests. The human adaptability in the tests was also evaluated. We improved average classification accuracy to 84.50% using a multiclass support vector machine classifier and concluded that human adaptability is a factor that can cause obvious bias in contiguous data collections.

  17. Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.

    PubMed

    She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng

    2015-01-01

    Motor imagery electroencephalography is widely used in the brain-computer interface systems. Due to inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. In order to solve this problem, a multiclass posterior probability solution for twin SVM is proposed by the ranking continuous output and pairwise coupling in this paper. First, two-class posterior probability model is constructed to approximate the posterior probability by the ranking continuous output techniques and Platt's estimating method. Secondly, a solution of multiclass probabilistic outputs for twin SVM is provided by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and multiclass posterior probability SVM using different coupling approaches. The efficacy on the classification accuracy and time complexity of the proposed method has been demonstrated by both the UCI benchmark datasets and real world EEG data from BCI Competition IV Dataset 2a, respectively.
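
    The two ingredients named in this abstract, a sigmoid mapping from classifier output to probability and the combination of pairwise probabilities into multiclass posteriors, can be illustrated with a short Python sketch. Two assumptions apply: the sigmoid parameters a and b are placeholders (in Platt's method they are fitted to data, which is not shown), and the coupling step uses a simple closed-form rule that is not necessarily the coupling approach used in the paper.

```python
import math

def platt_probability(score, a, b):
    """Map a raw classifier output to a probability with Platt's sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def couple_pairwise(r):
    """Combine pairwise estimates r[i][j] ~ P(class i | class i or j) into posteriors.

    Uses the closed-form rule p_i = 1 / (sum_{j != i} 1/r[i][j] - (k - 2)),
    then renormalises so the estimates sum to one.
    """
    k = len(r)
    raw = []
    for i in range(k):
        inv_sum = sum(1.0 / r[i][j] for j in range(k) if j != i)
        raw.append(1.0 / (inv_sum - (k - 2)))
    total = sum(raw)
    return [p / total for p in raw]

# Consistency check on synthetic data: build pairwise probabilities from known
# posteriors and confirm that coupling recovers them.
true_p = [0.5, 0.3, 0.2]
r = [[true_p[i] / (true_p[i] + true_p[j]) if i != j else 0.0
      for j in range(3)] for i in range(3)]
posteriors = couple_pairwise(r)
print([round(p, 3) for p in posteriors])
```

    The closed form follows from r_ij = p_i / (p_i + p_j): summing 1/r_ij over j ≠ i gives (k - 2) + 1/p_i. With noisy pairwise estimates the rule is only approximate, which is why the result is renormalised.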

  18. 75 FR 18872 - Ginnie Mae Multiclass Securities Program Documents

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-13

    ... Securities Program Documents AGENCY: Office of the Chief Information Officer, HUD. ACTION: Notice. SUMMARY... with the Multiclass Securities Program. The intent of the Multiclass Securities program is to increase... Securities Program Documents. OMB Approval Number: 2503-0030. Form Numbers: None. Description of the Need for...

  19. A multiclass multiresidue LC-MS/MS method for analysis of veterinary drugs in bovine kidney

    USDA-ARS's Scientific Manuscript database

    The increased efficiency permitted by multiclass, multiresidue methods has made such approaches very attractive to laboratories involved in monitoring veterinary drug residues in animal tissues. In this current work, evaluation of a multiclass multiresidue LC-MS/MS method in bovine kidney is describ...

  20. Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier.

    PubMed

    Steyrl, David; Scherer, Reinhold; Faller, Josef; Müller-Putz, Gernot R

    2016-02-01

    There is general agreement in the brain-computer interface (BCI) community that although non-linear classifiers can provide better results in some cases, linear classifiers are preferable, particularly because non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers were developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RF is constant over a large range of parameter values. We demonstrate - for the first time - that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.

  1. Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature.

    PubMed

    Wang, Xinglong; Rak, Rafal; Restificar, Angelo; Nobata, Chikashi; Rupp, C J; Batista-Navarro, Riza Theresa B; Nawaz, Raheel; Ananiadou, Sophia

    2011-10-03

    The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task's development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthew's Correlation Coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. 
    For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents, were among the main contributors to the performance.

  2. Enhancement of gesture recognition for contactless interface using a personalized classifier in the operating room.

    PubMed

    Cho, Yongwon; Lee, Areum; Park, Jongha; Ko, Bemseok; Kim, Namkug

    2018-07-01

    Contactless operating room (OR) interfaces are important for computer-aided surgery, and have been developed to decrease the risk of contamination during surgical procedures. In this study, we used Leap Motion™, with a personalized automated classifier, to enhance the accuracy of gesture recognition for contactless interfaces. This software was trained and tested on a personal basis, meaning that gestures were trained separately for each user. We used 30 features including finger and hand data, which were computed, selected, and fed into multiclass support vector machine (SVM) and Naïve Bayes classifiers to train and predict five types of gestures: hover, grab, click, one peak, and two peaks. Overall accuracy of the five gestures was 99.58% ± 0.06 and 98.74% ± 3.64 on a personal basis using SVM and Naïve Bayes classifiers, respectively. We compared gesture accuracy across the entire dataset and used SVM and Naïve Bayes classifiers to examine the strength of personal basis training. We developed and enhanced non-contact interfaces with gesture recognition to enhance OR control systems. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. Single-Pol Synthetic Aperture Radar Terrain Classification using Multiclass Confidence for One-Class Classifiers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, Mark William; Steinbach, Ryan Matthew; Moya, Mary M

    2015-10-01

    Except in the most extreme conditions, synthetic aperture radar (SAR) is a remote sensing technology that can operate day or night. A SAR can provide surveillance over a long time period by making multiple passes over a wide area. For object-based intelligence it is convenient to segment and classify the SAR images into objects that identify various terrains and man-made structures that we call “static features.” In this paper we introduce a novel SAR image product that captures how different regions decorrelate at different rates. Using superpixels and their first two moments we develop a series of one-class classification algorithms using a goodness-of-fit metric. P-value fusion is used to combine the results from different classes. We also show how to combine multiple one-class classifiers to get a confidence about a classification. This can be used by downstream algorithms such as a conditional random field to enforce spatial constraints.
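
    The abstract does not say which p-value fusion rule is used. As one common possibility, the sketch below implements Fisher's method in Python, which is an assumption on our part and not the report's stated technique. It combines k independent p-values into a single p-value using the chi-square distribution with 2k degrees of freedom, whose tail has a closed form for even degrees of freedom.

```python
import math

def fisher_fusion(p_values):
    """Fuse independent p-values with Fisher's method; returns the combined p-value."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2, where X = -2 * sum(ln p)
    # Survival function of chi-square with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, tail = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half_stat / i
        tail += term
    return math.exp(-half_stat) * tail

# Hypothetical goodness-of-fit p-values for one superpixel under one terrain model.
combined = fisher_fusion([0.04, 0.20, 0.11])
print(round(combined, 4))
```

    Fisher's method assumes the fused p-values are independent and strictly positive; correlated tests would need a different rule.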

  4. 78 FR 73211 - Self-Regulatory Organizations; Chicago Board Options Exchange, Incorporated; Notice of Filing of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-05

    ... Change Relating to Multi-Class Spread Orders November 29, 2013. Pursuant to Section 19(b)(1) of the... Substance of the Proposed Rule Change CBOE proposes to amend its rule related to Multi-Class Broad-Based Index Option Spread Orders (referred to herein as ``Multi-Class Spread Orders''). The text of the...

  5. A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification

    PubMed Central

    Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

    2016-01-01

    Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826
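
    The directed-acyclic-graph strategy for turning a binary classifier into a multi-class one can be sketched generically: keep a list of candidate classes, compare the first against the last with a binary classifier, and drop the loser until one class remains (k - 1 comparisons for k classes). In the Python sketch below the binary classifier is a stand-in based on hypothetical 1-D class centres, not an LDM, and the class names are invented for illustration.

```python
def dag_predict(classes, prefer, x):
    """Decide among classes with k-1 binary comparisons arranged as a DAG.

    prefer(a, b, x) is any binary classifier returning whichever of a or b it favours.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = prefer(first, last, x)
        # Drop the class that lost this comparison.
        if winner == first:
            candidates.pop()
        else:
            candidates.pop(0)
    return candidates[0]

# Stand-in binary classifier: nearest of two hypothetical 1-D class centres.
centres = {"clef": 0.0, "note": 1.0, "rest": 2.0, "accidental": 3.0}

def nearest_centre(a, b, x):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b

print(dag_predict(list(centres), nearest_centre, 1.2))
```

    Only k - 1 of the k(k - 1)/2 pairwise classifiers are evaluated per sample, which is the usual motivation for a DAG arrangement over full pairwise voting.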

  6. A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.

    PubMed

    Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

    2016-01-01

    Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).

  7. A Features Selection for Crops Classification

    NASA Astrophysics Data System (ADS)

    Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen

    2016-08-01

    The components of the polarimetric target decomposition reflect differences between targets, since they are linked to the scattering properties of the target, and can be imported into an SVM as classification features. The result of a decomposition usually concentrates on part of the components. Selecting a combination of components can reduce the number of features imported into the SVM. This feature reduction can lead to less computation and to targeted classification of one target when classifying a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification using AGRISAR data.

  8. Classification of vegetation types in military region

    NASA Astrophysics Data System (ADS)

    Gonçalves, Miguel; Silva, Jose Silvestre; Bioucas-Dias, Jose

    2015-10-01

    In the decision-making process regarding the planning and execution of military operations, the terrain is a determining factor. Aerial photographs are a source of vital information for the success of an operation in a hostile region, namely when the cartographic information behind enemy lines is scarce or non-existent. The objective of the present work is the development of a tool capable of processing aerial photos. The methodology implemented starts with feature extraction, followed by the application of an automatic selector of features. The next step, using the k-fold cross validation technique, estimates the input parameters for the following classifiers: Sparse Multinomial Logistic Regression (SMLR), K Nearest Neighbor (KNN), Linear Classifier using Principal Component Expansion on the Joint Data (PCLDC) and Multi-Class Support Vector Machine (MSVM). These classifiers were used in two different studies with distinct objectives: discrimination of vegetation density and identification of the main vegetation components. It was found that the best classifier in the first approach is Sparse Multinomial Logistic Regression (SMLR). In the second approach, the implemented methodology applied to high resolution images showed that the best performance was achieved by the KNN classifier and PCLDC. Comparing the two approaches, there is a multiscale issue: for different resolutions, the best solution to the problem requires different classifiers and the extraction of different features.
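
    Estimating a classifier's input parameter by k-fold cross validation, as mentioned in this abstract, can be sketched for the KNN case in plain Python. Everything concrete here is an assumption for illustration: the 1-D toy data, the class names, the candidate neighbour counts and the contiguous fold layout are not taken from the paper.

```python
from collections import Counter

def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(xs, ys, n_neighbors, k=5):
    """Pooled held-out accuracy of KNN over k folds."""
    correct = 0
    for train, test in kfold_indices(len(xs), k):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in test)
    return correct / len(xs)

# Toy, well-separated 1-D data; interleaved so every fold contains both classes.
xs = [0.0, 10.0, 0.1, 10.1, 0.2, 10.2, 0.3, 10.3, 0.4, 10.4]
ys = ["sparse", "dense"] * 5
scores = {n: cv_accuracy(xs, ys, n) for n in (1, 3, 5)}
best = max(scores, key=scores.get)
print(scores, best)
```

    On real imagery the folds would normally be shuffled or stratified first; contiguous folds are used here only to keep the sketch deterministic.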

  9. Supervised Gamma Process Poisson Factorization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Dylan Zachary

    This thesis develops the supervised gamma process Poisson factorization (S-GPPF) framework, a novel supervised topic model for joint modeling of count matrices and document labels. S-GPPF is fully generative and nonparametric: document labels and count matrices are modeled under a unified probabilistic framework and the number of latent topics is controlled automatically via a gamma process prior. The framework provides for multi-class classification of documents using a generative max-margin classifier. Several recent data augmentation techniques are leveraged to provide for exact inference using a Gibbs sampling scheme. The first portion of this thesis reviews supervised topic modeling and several key mathematical devices used in the formulation of S-GPPF. The thesis then introduces the S-GPPF generative model and derives the conditional posterior distributions of the latent variables for posterior inference via Gibbs sampling. The S-GPPF is shown to exhibit state-of-the-art performance for joint topic modeling and document classification on a dataset of conference abstracts, beating out competing supervised topic models. The unique properties of S-GPPF along with its competitive performance make it a novel contribution to supervised topic modeling.

  10. Mass Spectrometry Parameters Optimization for the 46 Multiclass Pesticides Determination in Strawberries with Gas Chromatography Ion-Trap Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Fernandes, Virgínia C.; Vera, Jose L.; Domingues, Valentina F.; Silva, Luís M. S.; Mateus, Nuno; Delerue-Matos, Cristina

    2012-12-01

    A multiclass analysis method was optimized in order to analyze pesticide traces by gas chromatography with ion-trap and tandem mass spectrometry (GC-MS/MS). The influence of some analytical parameters on pesticide signal response was explored. Five ion trap mass spectrometry (IT-MS) operating parameters, including isolation time (IT), excitation voltage (EV), excitation time (ET), maximum excitation energy or "q" value (q), and isolation mass window (IMW), were numerically tested in order to maximize the instrument analytical signal response. For this, multiple linear regression was used in data analysis to evaluate the influence of the five parameters on the analytical response in the ion trap mass spectrometer and to predict its response. The assessment of the five parameters based on the regression equations substantially increased the sensitivity of IT-MS/MS in the MS/MS mode. The results obtained show that for most of the pesticides, these parameters have a strong influence on both signal response and detection limit. Using the optimized method, a multiclass pesticide analysis was performed for 46 pesticides in a strawberry matrix. Levels higher than the limit established for strawberries by the European Union were found in some samples.

  11. Eco-friendly LC-MS/MS method for analysis of multi-class micropollutants in tap, fountain, and well water from northern Portugal.

    PubMed

    Barbosa, Marta O; Ribeiro, Ana R; Pereira, Manuel F R; Silva, Adrián M T

    2016-11-01

    Organic micropollutants present in drinking water (DW) may cause adverse effects for public health, and so reliable analytical methods are required to detect these pollutants at trace levels in DW. This work describes the first green analytical methodology for multi-class determination of 21 pollutants in DW: seven pesticides, an industrial compound, 12 pharmaceuticals, and a metabolite (some included in Directive 2013/39/EU or Decision 2015/495/EU). A solid-phase extraction procedure followed by ultra-high-performance liquid chromatography coupled to tandem mass spectrometry (offline SPE-UHPLC-MS/MS) method was optimized using eco-friendly solvents, achieving detection limits below 0.20 ng L⁻¹. The validated analytical method was successfully applied to DW samples from different sources (tap, fountain, and well waters) from different locations in the north of Portugal, as well as before and after bench-scale UV and ozonation experiments in spiked tap water samples. Thirteen compounds were detected, many of them not regulated yet, in the following order of frequency: diclofenac > norfluoxetine > atrazine > simazine > warfarin > metoprolol > alachlor > chlorfenvinphos > trimethoprim > clarithromycin ≈ carbamazepine ≈ PFOS > citalopram. Hazard quotients were also estimated for the quantified substances and suggested no adverse effects to humans. Graphical Abstract: Occurrence and removal of multi-class micropollutants in drinking water, analyzed by an eco-friendly LC-MS/MS method.

  12. Optimization of sample preparation by central composite design for multi-class determination of veterinary drugs in bovine muscle, kidney and liver by ultra-high-performance liquid chromatographic-tandem mass spectrometry.

    PubMed

    Rizzetti, Tiele M; de Souza, Maiara P; Prestes, Osmar D; Adaime, Martha B; Zanella, Renato

    2018-04-25

    In this study, a simple and fast multi-class method for the determination of veterinary drugs in bovine liver, kidney and muscle was developed. The method employed acetonitrile for extraction followed by clean-up with EMR-Lipid® sorbent and trichloroacetic acid (TCA). Tests indicated that the use of TCA was most effective when added in the final step of the clean-up procedure instead of during extraction. Different sorbents were tested and optimized using central composite design, and the analytes were determined by ultra-high-performance liquid chromatographic-tandem mass spectrometry (UHPLC-MS/MS). The method was validated according to European Commission Decision 2002/657, presenting satisfactory results for 69 veterinary drugs in bovine liver and 68 compounds in bovine muscle and kidney. The method was applied to real samples and in proficiency tests and proved to be adequate for routine analysis. Residues of abamectin, doramectin, eprinomectin and ivermectin were found in samples of bovine muscle and only ivermectin in bovine liver. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Comparison of two microextraction methods based on solidification of floating organic droplet for the determination of multiclass analytes in river water samples by liquid chromatography tandem mass spectrometry using Central Composite Design.

    PubMed

    Asati, Ankita; Satyanarayana, G N V; Patel, Devendra K

    2017-09-01

    Two liquid-liquid microextraction methods based on low-density organic solvents, namely vortex-assisted liquid-liquid microextraction based on solidification of floating organic droplet (VALLME-SFO) and dispersive liquid-liquid microextraction based on solidification of floating organic droplet (DLLME-SFO), have been compared for the determination of multiclass analytes (pesticides, plasticizers, pharmaceuticals and personal care products) in river water samples by using liquid chromatography tandem mass spectrometry (LC-MS/MS). The effect of various experimental parameters on the efficiency of the two methods and their optimum values were studied with the aid of Central Composite Design (CCD) and Response Surface Methodology (RSM). Under optimal conditions, VALLME-SFO was validated in terms of limit of detection (0.011-0.219 ng mL⁻¹), limit of quantification (0.035-0.723 ng mL⁻¹), dynamic linearity range (0.050-0.500 ng mL⁻¹), coefficient of determination (R² = 0.992-0.999), enrichment factor (40-56) and extraction recovery (80-106%). When the DLLME-SFO method was validated under optimal conditions, the corresponding ranges were: limit of detection 0.025-0.377 ng mL⁻¹, limit of quantification 0.083-1.256 ng mL⁻¹, dynamic linearity range 0.100-1.000 ng mL⁻¹, coefficient of determination R² = 0.990-0.999, enrichment factor 35-49 and extraction recovery 69-98%. Interday and intraday precisions were calculated as percent relative standard deviation (%RSD) and the values were ≤15% for both the VALLME-SFO and DLLME-SFO methods. Both methods were successfully applied for determining multiclass analytes in river water samples. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. A workflow for multiclass determination of 256 pesticides in essential oils by liquid chromatography tandem mass spectrometry using evaporation and dilution approaches: Application to lavandin, lemon and cypress essential oils.

    PubMed

    Fillatre, Yoann; Rondeau, David; Daguin, Antoine; Communal, Pierre-Yves

    2016-01-01

    This paper describes the determination of 256 multiclass pesticides in cypress and lemon essential oils (EOs) by liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI/MS/MS) analysis using the scheduled selected reaction monitoring mode (sSRM) available on a hybrid quadrupole linear ion trap (QLIT) mass spectrometer. The performance of a sample preparation of lemon and cypress EOs based on dilution or on evaporation under nitrogen assisted by controlled heating was assessed. The best limits of quantification (LOQs) were achieved with the evaporation under nitrogen method, giving LOQs ≤ 10 µg L⁻¹ for 91% of the pesticides. In addition, the very satisfactory results obtained for recovery, repeatability and linearity showed that for EOs of relatively low evaporation temperature, a sample preparation based on evaporation under nitrogen is well adapted and preferable to dilution. By compiling these results with those previously published by some of us on lavandin EO, we proposed a workflow dedicated to multiresidue determination of pesticides in various EOs by LC-ESI/sSRM. Among the steps involved in this workflow, the protocol related to mass spectrometry proposes an alternative confirmation method to the classical SRM ratio criteria, based on a sSRM survey scan followed by an information-dependent acquisition using the sensitive enhanced product ion (EPI) scan to generate MS/MS spectra that are then compared to a reference. The submitted workflow was applied to the case of lemon EO samples, highlighting for the first time the simultaneous detection of 20 multiclass pesticides in one EO. Some pesticides showed very high concentration levels, with amounts greatly exceeding the mg L⁻¹ level. Copyright © 2015 Elsevier B.V. All rights reserved.

  15. An improved dispersive solid-phase extraction clean-up method for the gas chromatography-negative chemical ionisation tandem mass spectrometric determination of multiclass pesticide residues in edible oils.

    PubMed

    Deme, Pragney; Azmeera, Tirupathi; Prabhavathi Devi, B L A; Jonnalagadda, Padmaja R; Prasad, R B N; Vijaya Sarathi, U V R

    2014-01-01

    An improved sample preparation using dispersive solid-phase extraction clean-up was proposed for the trace level determination of 35 multiclass pesticide residues (organochlorine, organophosphorus and synthetic pyrethroids) in edible oils. Quantification of the analytes was carried out by gas chromatography-mass spectrometry in negative chemical ionisation mode (GC-NCI-MS/MS). The limit of detection and limit of quantification of residues were in the range of 0.01-1 ng/g and 0.05-2 ng/g, respectively. The analytes showed recoveries between 62% and 110%, and the matrix effect was observed to be less than 25% for most of the pesticides. Crude edible oil samples showed endosulfan isomers, p,p'-DDD, α-cypermethrin, chlorpyrifos, and diazinon residues in the range of 0.56-2.14 ng/g. However, no pesticide residues in the detection range of the method were observed in refined oils. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. A support vector machine approach for classification of welding defects from ultrasonic signals

    NASA Astrophysics Data System (ADS)

    Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming

    2014-07-01

    Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energy of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
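
    The feature construction described in this abstract, decomposing an echo signal with a wavelet packet transform and using the energy in each frequency channel as a feature, can be sketched in plain Python. The Haar wavelet, the two-level depth and the synthetic eight-sample signal are assumptions made for illustration; the abstract does not specify the wavelet or depth used.

```python
import math

def haar_step(signal):
    """One orthonormal Haar split into (approximation, detail) halves."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_packet_energies(signal, levels):
    """Energy in each of the 2**levels terminal nodes of a Haar wavelet packet tree."""
    nodes = [list(signal)]
    for _ in range(levels):
        # Unlike the plain wavelet transform, every node is split again.
        nodes = [half for node in nodes for half in haar_step(node)]
    return [sum(c * c for c in node) for node in nodes]

# Synthetic 8-sample "echo"; the length must be divisible by 2**levels.
echo = [0.0, 1.0, 0.0, -1.0, 0.5, 0.5, -0.5, -0.5]
features = wavelet_packet_energies(echo, levels=2)
print(features)
```

    Because the Haar split is orthonormal, the channel energies sum to the energy of the original signal. The channels come out in natural tree order, not sorted by frequency.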

  17. Realistic Subsurface Anomaly Discrimination Using Electromagnetic Induction and an SVM Classifier

    NASA Astrophysics Data System (ADS)

    Pablo Fernández, Juan; Shubitidze, Fridon; Shamatava, Irma; Barrowes, Benjamin E.; O'Neill, Kevin

    2010-12-01

    The environmental research program of the United States military has set up blind tests for detection and discrimination of unexploded ordnance. One such test consists of measurements taken with the EM-63 sensor at Camp Sibert, AL. We review the performance on the test of a procedure that combines a field-potential (HAP) method to locate targets, the normalized surface magnetic source (NSMS) model to characterize them, and a support vector machine (SVM) to classify them. The HAP method infers location from the scattered magnetic field and its associated scalar potential, the latter reconstructed using equivalent sources. NSMS replaces the target with an enclosing spheroid of equivalent radial magnetization whose integral it uses as a discriminator. SVM generalizes from empirical evidence and can be adapted for multiclass discrimination using a voting system. Our method identifies all potentially dangerous targets correctly and has a false-alarm rate of about 5%.
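
    The idea of adapting a binary classifier to multiclass discrimination through voting can be sketched as follows. This is a generic one-against-one vote, not necessarily the scheme used by the authors; the class labels, the centroids and the nearest-centroid rule standing in for a trained binary SVM are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def vote_multiclass(sample, classes, pairwise_decide):
    """Predict a class by majority vote over all pairwise binary decisions.

    pairwise_decide(sample, a, b) must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(sample, a, b)] += 1
    # max() keeps the first maximum, so ties fall back to the order of `classes`.
    return max(classes, key=lambda c: votes[c])


# Hypothetical 1-D class centroids; a real system would use trained SVMs.
CENTROIDS = {"A": 1.0, "B": 3.0, "C": 5.0}


def nearest_centroid_decide(x, a, b):
    """Stand-in binary classifier: pick whichever class centroid is closer."""
    return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b


predicted = vote_multiclass(4.6, ["A", "B", "C"], nearest_centroid_decide)
```

    With K classes this scheme needs K(K-1)/2 binary decisions per sample, which is the usual price of one-against-one voting.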

  18. Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach.

    PubMed

    Mohan, Dhanya Menoth; Kumar, Parmod; Mahmood, Faisal; Wong, Kian Foong; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Ang, Natania; Ching, April; Dauwels, Justin; Chan, Alice H D

    2016-01-01

    The purpose of the study is to examine the effect of subliminal priming in terms of the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs). Participants were instructed to rate how much they like the stimuli images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words that exhibit positive, negative, and neutral connotations with respect to the images. Simultaneously, the EEGs were recorded. Statistical tests such as repeated measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likeness judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three different conditions. The consistent results obtained confirmed the overall priming effect on participants' explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs), and AdaBoost classifiers were applied to infer the prime affect type from the ERPs. The highest classification rates of 95.0% and 70.0% obtained respectively for average-trial binary classifier and average-trial multi-class further emphasize that the ERPs encode information about the different kinds of primes.

  19. Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach

    PubMed Central

    Mahmood, Faisal; Wong, Kian Foong; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Ang, Natania; Ching, April; Dauwels, Justin; Chan, Alice H. D.

    2016-01-01

    The purpose of the study is to examine the effect of subliminal priming in terms of the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs). Participants were instructed to rate how much they like the stimuli images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words that exhibit positive, negative, and neutral connotations with respect to the images. Simultaneously, the EEGs were recorded. Statistical tests such as repeated measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likeness judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three different conditions. The consistent results obtained confirmed the overall priming effect on participants’ explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs), and AdaBoost classifiers were applied to infer the prime affect type from the ERPs. The highest classification rates of 95.0% and 70.0% obtained respectively for average-trial binary classifier and average-trial multi-class further emphasize that the ERPs encode information about the different kinds of primes. PMID:26866807

  20. Multi-class biological tissue classification based on a multi-classifier: Preliminary study of an automatic output power control for ultrasonic surgical units.

    PubMed

    Youn, Su Hyun; Sim, Taeyong; Choi, Ahnryul; Song, Jinsung; Shin, Ki Young; Lee, Il Kwon; Heo, Hyun Mu; Lee, Daeweon; Mun, Joung Hwan

    2015-06-01

    Ultrasonic surgical units (USUs) have the advantage of minimizing tissue damage during surgeries that require tissue dissection by reducing problems such as coagulation and unwanted carbonization, but the disadvantage of requiring manual adjustment of power output according to the target tissue. In order to overcome this limitation, it is necessary to determine the properties of in vivo tissues automatically. We propose a multi-classifier that can accurately classify tissues based on the unique impedance of each tissue. For this purpose, a multi-classifier was built based on single classifiers with high classification rates, and the classification accuracy of the proposed model was compared with that of single classifiers for various electrode types (Type-I: 6 mm invasive; Type-II: 3 mm invasive; Type-III: surface). The sensitivity and positive predictive value (PPV) of the multi-classifier by cross checks were determined. According to the 10-fold cross validation results, the classification accuracy of the proposed model was significantly higher (p<0.05 or <0.01) than that of existing single classifiers for all electrode types. In particular, the classification accuracy of the proposed model was highest when the 3 mm invasive electrode (Type-II) was used (sensitivity=97.33-100.00%; PPV=96.71-100.00%). The results of this study are an important contribution to achieving automatic optimal output power adjustment of USUs according to the properties of individual tissues. Copyright © 2015 Elsevier Ltd. All rights reserved.
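
    The per-class sensitivity and positive predictive value reported here follow directly from a multiclass confusion matrix. The sketch below shows the standard definitions; the function name and the row/column convention are assumptions made for illustration, and no data from the study are used.

```python
def sensitivity_and_ppv(confusion, labels):
    """Compute (sensitivity, PPV) for each class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp   # predicted k, true class differs
        sens = tp / (tp + fn) if tp + fn else float("nan")
        ppv = tp / (tp + fp) if tp + fp else float("nan")
        results[label] = (sens, ppv)
    return results
```

    Sensitivity answers "how many samples of this class were found", whereas PPV answers "how many predictions of this class were right", so the two can differ noticeably when classes are confused asymmetrically.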

  1. Simultaneous determination of multiclass pesticide residues in human plasma using a mini QuEChERS method.

    PubMed

    Srivastava, Anshuman; Rai, Satyajeet; Kumar Sonker, Ashish; Karsauliya, Kajal; Pandey, Chandra Prabha; Singh, Sheelendra Pratap

    2017-06-01

    Blood is one of the most assessable matrices for the determination of pesticide residue exposure in humans. Effective sample preparation/cleanup of biological samples is very important in the development of a sensitive, reproducible, and robust method. In the present study, a simple, cost-effective, and rapid gas chromatography-tandem mass spectrometry method has been developed and validated for simultaneous analysis of 31 multiclass (organophosphates, organochlorines, and synthetic pyrethroids) pesticide residues in human plasma by means of a mini QuEChERS (quick, easy, cheap, effective, rugged, and safe) method. We have adopted a modified version of the QuEChERS method, which is primarily used for pesticide residue analysis in food commodities. The QuEChERS method was optimized by use of different extraction solvents and different amounts and combinations of salts and sorbents (primary-secondary amines and C18) for the dispersive solid-phase extraction step. The results show that a combination of ethyl acetate with 2% acetic acid, magnesium sulfate (0.4 g), and solid-phase extraction for sample cleanup with primary-secondary amines (50 mg) per 1-mL volume of plasma is the most suitable for generating acceptable results with high recoveries for all multiclass pesticides from human plasma. The mean recovery ranged from 74% to 109% for all the analytes. The limit of quantification and limit of detection of the method ranged from 0.12 to 13.53 ng mL⁻¹ and from 0.04 to 4.10 ng mL⁻¹, respectively. The intraday precision and the interday precision of the method were 6% or less and 11% or less, respectively. This method would be useful for the analysis of a wide range of pesticides of interest in a small volume of clinical and/or forensic samples to support biomonitoring and toxicological applications. Graphical abstract: Pesticide residue analysis in human plasma using the mini QuEChERS method.

  2. Multi-class SVM model for fMRI-based classification and grading of liver fibrosis

    NASA Astrophysics Data System (ADS)

    Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.

    2010-03-01

    We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi-class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one-out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
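
    The leave-one-out technique mentioned in this abstract can be sketched generically: each sample is held out once, the model is refitted on the remainder, and the held-out prediction is scored. The nearest-mean classifier below is only a stand-in for the SVM, the function names are hypothetical, and how the authors grouped slices or animals when holding data out is not stated in the abstract.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)          # refit without sample i
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_nearest_mean(xs, ys):
    """Stand-in model: the mean feature value of each class (1-D features)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_nearest_mean(means, x):
    """Assign the class whose mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))
```

    With small datasets this estimate uses almost all data for training in every round, at the cost of refitting the model once per sample.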

  3. Multiclass Classification by Adaptive Network of Dendritic Neurons with Binary Synapses Using Structural Plasticity

    PubMed Central

    Hussain, Shaista; Basu, Arindam

    2016-01-01

    The development of power-efficient neuromorphic devices presents the challenge of designing spike pattern classification algorithms which can be implemented on low-precision hardware and can also achieve state-of-the-art performance. In our pursuit of meeting this challenge, we present a pattern classification model which uses a sparse connection matrix and exploits the mechanism of nonlinear dendritic processing to achieve high classification accuracy. A rate-based structural learning rule for multiclass classification is proposed which modifies a connectivity matrix of binary synaptic connections by choosing the best “k” out of “d” inputs to make connections on every dendritic branch (k << d). Because learning only modifies connectivity, the model is well suited for implementation in neuromorphic systems using address-event representation (AER). We develop an ensemble method which combines several dendritic classifiers to achieve enhanced generalization over individual classifiers. We have two major findings: (1) Our results demonstrate that an ensemble created with classifiers comprising a moderate number of dendrites performs better than both ensembles of perceptrons and of complex dendritic trees. (2) In order to determine the moderate number of dendrites required for a specific classification problem, a two-step solution is proposed. First, an adaptive approach is proposed which scales the relative size of the dendritic trees of neurons for each class. It works by progressively adding dendrites with a fixed number of synapses to the network, thereby allocating synaptic resources as per the complexity of the given problem. As a second step, theoretical capacity calculations are used to convert each neuronal dendritic tree to its optimal topology where dendrites of each class are assigned different numbers of synapses. The performance of the model is evaluated on classification of handwritten digits from the benchmark MNIST dataset and compared with other spike classifiers. We show that our system can achieve classification accuracy within 1-2% of other reported spike-based classifiers while using far fewer synaptic resources (only 7%) compared to that used by other methods. Further, an ensemble classifier created with adaptively learned sizes can attain accuracy of 96.4% which is on par with the best reported performance of spike-based classifiers. Moreover, the proposed method achieves this by using about 20% of the synapses used by other spike algorithms. We also present results of applying our algorithm to classify the MNIST-DVS dataset collected from a real spike-based image sensor and show results comparable to the best reported ones (88.1% accuracy). For VLSI implementations, we show that the reduced synaptic memory can save up to 4X area compared to conventional crossbar topologies. Finally, we also present a biologically realistic spike-based version for calculating the correlations required by the structural learning rule and demonstrate the correspondence between the rate-based and spike-based methods of learning. PMID:27065782

  4. Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature

    PubMed Central

    2011-01-01

    Background The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Article Classification Task (ACT) and Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles. Results We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results with an F1 score of 64.49%, as tested on the task’s development dataset. We also explored ways to combine this new approach and more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews correlation coefficient and AUC iP/R; whereas for ACT, our best classifier was ranked second as measured by AUC iP/R, and also competitive according to other metrics. Conclusions Our novel approach that converts the multi-class, multi-label classification problem to a binary classification problem showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents were among the main contributors to the performance. PMID:22151769
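
    The remark that taking the union of two systems' outputs helps recall can be made concrete with set-based precision, recall and F1 for multi-label predictions. The label names and the two system outputs below are invented purely for illustration and do not come from the BioCreative data.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one document's predicted labels."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold labels and outputs of two complementary systems.
gold = {"method_1", "method_2", "method_3"}
system_a = {"method_1"}                 # precise but misses labels
system_b = {"method_2", "method_4"}     # finds a different label, one error
combined = system_a | system_b          # union of both outputs

scores_a = precision_recall_f1(system_a, gold)
scores_b = precision_recall_f1(system_b, gold)
scores_union = precision_recall_f1(combined, gold)
```

    In this toy case the union recovers labels that each system misses on its own, so recall rises, while precision depends on how many errors the union also inherits.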

  5. Health condition identification of multi-stage planetary gearboxes using a mRVM-based method

    NASA Astrophysics Data System (ADS)

    Lei, Yaguo; Liu, Zongyao; Wu, Xionghui; Li, Naipeng; Chen, Wu; Lin, Jing

    2015-08-01

    Multi-stage planetary gearboxes are widely applied in aerospace, automotive and heavy industries. Their key components, such as gears and bearings, can easily suffer from damage due to their tough working environment. Health condition identification of planetary gearboxes aims to prevent accidents and save costs. This paper proposes a method based on a multiclass relevance vector machine (mRVM) to identify the health condition of multi-stage planetary gearboxes. In this method, an mRVM algorithm is adopted as a classifier, and two features, i.e. accumulative amplitudes of carrier orders (AACO) and energy ratio based on difference spectra (ERDS), are used as the input of the classifier to classify different health conditions of multi-stage planetary gearboxes. To test the proposed method, seven health conditions of a two-stage planetary gearbox are considered and vibration data are acquired from the planetary gearbox under different motor speeds and loading conditions. The results of three tests based on different data show that the proposed method obtains an improved identification performance and robustness compared with the existing method.

  6. A Realistic Seizure Prediction Study Based on Multiclass SVM.

    PubMed

    Direito, Bruno; Teixeira, César A; Sales, Francisco; Castelo-Branco, Miguel; Dourado, António

    2017-05-01

    A patient-specific algorithm for epileptic seizure prediction, based on multiclass support-vector machines (SVM) and using multi-channel high-dimensional feature sets, is presented. The feature sets, combined with multiclass classification and post-processing schemes, aim at the generation of alarms and reduced influence of false positives. This study considers 216 patients from the European Epilepsy Database, and includes 185 patients with scalp EEG recordings and 31 with intracranial data. The strategy was tested over a total of 16,729.80 h of inter-ictal data, including 1206 seizures. We found an overall sensitivity of 38.47% and a false positive rate per hour of 0.20. The performance of the method achieved statistical significance in 24 patients (11% of the patients). Despite the encouraging results previously reported in specific datasets, the prospective demonstration on long-term EEG recording has been limited. Our study presents a prospective analysis of a large heterogeneous, multicentric dataset. The statistical framework, based on conservative assumptions, reflects a realistic approach compared to constrained datasets and/or in-sample evaluations. The improvement of these results, with the definition of an appropriate set of features able to improve the distinction between the pre-ictal and non-pre-ictal states, hence minimizing the effect of confounding variables, remains a key aspect.

  7. On the role of cost-sensitive learning in multi-class brain-computer interfaces.

    PubMed

    Devlaminck, Dieter; Waegeman, Willem; Wyns, Bart; Otte, Georges; Santens, Patrick

    2010-06-01

    Brain-computer interfaces (BCIs) present an alternative way of communication for people with severe disabilities. One of the shortcomings in current BCI systems, recently put forward in the fourth BCI competition, is the asynchronous detection of motor imagery versus resting state. We investigated this extension to the three-class case, in which the resting state is considered virtually lying between two motor classes, resulting in a large penalty when one motor task is misclassified into the other motor class. We particularly focus on the behavior of different machine-learning techniques and on the role of multi-class cost-sensitive learning in such a context. To this end, four different kernel methods are empirically compared, namely pairwise multi-class support vector machines (SVMs), two cost-sensitive multi-class SVMs and kernel-based ordinal regression. The experimental results illustrate that ordinal regression performs better than the other three approaches when a cost-sensitive performance measure such as the mean-squared error is considered. By contrast, multi-class cost-sensitive learning enables us to control the number of large errors made between two motor tasks.
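
    The cost structure described in this abstract, with the resting state lying between the two motor classes, can be sketched as a squared-distance cost matrix combined with a minimum-expected-cost decision. The class ordering (0 = first motor class, 1 = rest, 2 = second motor class), the function names and the probabilities are assumptions for illustration, and this generic decision rule is not claimed to be any of the four kernel methods compared in the paper.

```python
def squared_error_costs(n_classes):
    """costs[t][p] = (t - p)**2: confusing distant classes is penalised most."""
    return [[(t - p) ** 2 for p in range(n_classes)] for t in range(n_classes)]


def min_expected_cost_class(probabilities, costs):
    """Pick the prediction with the lowest expected cost under the class probabilities."""
    n = len(probabilities)
    expected = [
        sum(probabilities[t] * costs[t][p] for t in range(n))
        for p in range(len(costs[0]))
    ]
    return min(range(len(expected)), key=lambda p: expected[p])


costs = squared_error_costs(3)
# Hypothetical posterior: the two motor classes are almost equally likely.
posterior = [0.45, 0.15, 0.40]
most_probable = max(range(3), key=lambda c: posterior[c])
cost_sensitive = min_expected_cost_class(posterior, costs)
```

    Here the most probable class is a motor class, yet the cost-sensitive rule chooses rest, because guessing the wrong motor class would incur the largest penalty.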

  8. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
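
    The link between an L1 penalty and shrunken centroids rests on soft thresholding: minimising 0.5*(d - b)**2 + penalty*abs(b) over b gives b = sign(d) * max(abs(d) - penalty, 0). The sketch below shows only this elementary operator; the function names and the example values are illustrative and are not taken from the paper.

```python
def soft_threshold(value, penalty):
    """Minimiser of 0.5*(value - b)**2 + penalty*abs(b) with respect to b."""
    magnitude = abs(value) - penalty
    if magnitude <= 0:
        return 0.0                      # small effects are shrunk exactly to zero
    return magnitude if value > 0 else -magnitude


def shrink_differences(differences, penalty):
    """Apply soft thresholding to per-gene standardised centroid differences."""
    return [soft_threshold(d, penalty) for d in differences]


# Hypothetical standardised differences for four genes.
shrunk = shrink_differences([2.5, -0.4, -3.0, 0.9], 1.0)
```

    Genes whose shrunken difference is exactly zero no longer contribute to classification, which is how the penalty performs gene selection and shrinkage at the same time.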

  9. Complex extreme learning machine applications in terahertz pulsed signals feature sets.

    PubMed

    Yin, X-X; Hadjiloucas, S; Zhang, Y

    2014-11-01

    This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude as well as the phase of the recorded spectra. Classification speed and accuracy are contrasted with that achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  10. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics

    PubMed Central

    Best, Myron G.; Sol, Nik; Kooi, Irsan; Tannous, Jihane; Westerman, Bart A.; Rustenburg, François; Schellen, Pepijn; Verschueren, Heleen; Post, Edward; Koster, Jan; Ylstra, Bauke; Ameziane, Najim; Dorsman, Josephine; Smit, Egbert F.; Verheul, Henk M.; Noske, David P.; Reijneveld, Jaap C.; Nilsson, R. Jonas A.; Tannous, Bakhos A.; Wesseling, Pieter; Wurdinger, Thomas

    2015-01-01

    Summary Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based “liquid biopsies”. PMID:26525104

  11. Multi-class methodology to determine pesticides and mycotoxins in green tea and royal jelly supplements by liquid chromatography coupled to Orbitrap high resolution mass spectrometry.

    PubMed

    Martínez-Domínguez, Gerardo; Romero-González, Roberto; Garrido Frenich, Antonia

    2016-04-15

    A multi-class methodology was developed to determine pesticides and mycotoxins in food supplements. The extraction was performed using acetonitrile acidified with formic acid (1%, v/v). Different clean-up sorbents were tested, and the best results were obtained using C18 and zirconium oxide for green tea and royal jelly, respectively. The compounds were determined using ultra high performance liquid chromatography (UHPLC) coupled to Exactive-Orbitrap high resolution mass spectrometry (HRMS). The recovery rates obtained were between 70% and 120% for most of the compounds studied with a relative standard deviation <25%, at three different concentration levels. The calculated limits of quantification (LOQ) were <10 μg/kg. The method was applied to green tea (10) and royal jelly (8) samples. Nine (eight of green tea and one of royal jelly) samples were found to be positive for pesticides at concentrations ranging from 10.6 (cinosulfuron) to 47.9 μg/kg (paclobutrazol). The aflatoxin B1 (5.4 μg/kg) was also found in one of the green tea samples. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Decoding Multiple Sound Categories in the Human Temporal Cortex Using High Resolution fMRI

    PubMed Central

    Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C. M.

    2015-01-01

    Perception of sound categories is an important aspect of auditory perception. The extent to which the brain’s representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. 
    Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases. PMID:25692885

  13. Decoding multiple sound categories in the human temporal cortex using high resolution fMRI.

    PubMed

    Zhang, Fengqing; Wang, Ji-Ping; Kim, Jieun; Parrish, Todd; Wong, Patrick C M

    2015-01-01

    Perception of sound categories is an important aspect of auditory perception. The extent to which the brain's representation of sound categories is encoded in specialized subregions or distributed across the auditory cortex remains unclear. Recent studies using multivariate pattern analysis (MVPA) of brain activations have provided important insights into how the brain decodes perceptual information. In the large existing literature on brain decoding using MVPA methods, relatively few studies have been conducted on multi-class categorization in the auditory domain. Here, we investigated the representation and processing of auditory categories within the human temporal cortex using high resolution fMRI and MVPA methods. More importantly, we considered decoding multiple sound categories simultaneously through multi-class support vector machine-recursive feature elimination (MSVM-RFE) as our MVPA tool. Results show that for all classifications the model MSVM-RFE was able to learn the functional relation between the multiple sound categories and the corresponding evoked spatial patterns and classify the unlabeled sound-evoked patterns significantly above chance. This indicates the feasibility of decoding multiple sound categories not only within but across subjects. However, the across-subject variation affects classification performance more than the within-subject variation, as the across-subject analysis has significantly lower classification accuracies. Sound category-selective brain maps were identified based on multi-class classification and revealed distributed patterns of brain activity in the superior temporal gyrus and the middle temporal gyrus. This is in accordance with previous studies, indicating that information in the spatially distributed patterns may reflect a more abstract perceptual level of representation of sound categories. 
Further, we show that the across-subject classification performance can be significantly improved by averaging the fMRI images over items, because the irrelevant variations between different items of the same sound category are reduced and in turn the proportion of signals relevant to sound categorization increases.

  14. Kernel Wiener filter and its application to pattern recognition.

    PubMed

    Yoshino, Hirokazu; Dong, Chen; Washizawa, Yoshikazu; Yamashita, Yukihiko

    2010-11-01

    The Wiener filter (WF) is widely used for inverse problems. From an observed signal, it provides the best estimated signal with respect to the squared error averaged over the original and the observed signals among linear operators. The kernel WF (KWF), extended directly from WF, has a problem that an additive noise has to be handled by samples. Since the computational complexity of kernel methods depends on the number of samples, a huge computational cost is necessary for the case. By using the first-order approximation of kernel functions, we realize KWF that can handle such a noise not by samples but as a random variable. We also propose the error estimation method for kernel filters by using the approximations. In order to show the advantages of the proposed methods, we conducted the experiments to denoise images and estimate errors. We also apply KWF to classification since KWF can provide an approximated result of the maximum a posteriori classifier that provides the best recognition accuracy. The noise term in the criterion can be used for the classification in the presence of noise or a new regularization to suppress changes in the input space, whereas the ordinary regularization for the kernel method suppresses changes in the feature space. In order to show the advantages of the proposed methods, we conducted experiments of binary and multiclass classifications and classification in the presence of noise.

  15. Segmentation, feature extraction, and multiclass brain tumor classification.

    PubMed

    Sachdeva, Jainy; Kumar, Vinod; Gupta, Indra; Khandelwal, Niranjan; Ahuja, Chirag Kamal

    2013-12-01

    Multiclass brain tumor classification is performed by using a diversified dataset of 428 post-contrast T1-weighted MR images from 55 patients. These images are of primary brain tumors, namely astrocytoma (AS), glioblastoma multiforme (GBM), childhood tumor-medulloblastoma (MED), meningioma (MEN), secondary tumor-metastatic (MET), and normal regions (NR). Eight hundred fifty-six regions of interest (SROIs) are extracted by a content-based active contour model. Two hundred eighteen intensity and texture features are extracted from these SROIs. In this study, principal component analysis (PCA) is used for reduction of dimensionality of the feature space. These six classes are then classified by an artificial neural network (ANN). Hence, this approach is named the PCA-ANN approach. Three sets of experiments have been performed. In the first experiment, classification is performed with the ANN approach. In the second experiment, the PCA-ANN approach with random sub-sampling has been used, in which the SROIs from the same patient may be repeated during testing. It is observed that the classification accuracy has increased from 77 to 91 %. PCA-ANN has delivered high accuracy for each class: AS-90.74 %, GBM-88.46 %, MED-85 %, MEN-90.70 %, MET-96.67 %, and NR-93.78 %. In the third experiment, to remove bias and to test the robustness of the proposed system, data are partitioned such that the SROIs from the same patient are not common to the training and testing sets. In this case also, the proposed system has performed well by delivering an overall accuracy of 85.23 %. The individual class accuracy for each class is: AS-86.15 %, GBM-65.1 %, MED-63.36 %, MEN-91.5 %, MET-65.21 %, and NR-93.3 %. A computer-aided diagnostic system comprising the developed methods for segmentation, feature extraction, and classification of brain tumors can be beneficial to radiologists for precise localization, diagnosis, and interpretation of brain tumors on MR images.
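
    The patient-wise partition used in the third experiment can be sketched as follows. This is only a minimal illustration, assuming each region of interest is stored as a (patient_id, item) pair; the function name `split_by_patient`, the test fraction, and the toy records are hypothetical and are not taken from the paper.

```python
import random

def split_by_patient(records, test_fraction=0.3, seed=0):
    """Split (patient_id, item) records so that no patient is in both sets."""
    # Shuffle the unique patient ids, not the individual records.
    patients = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    # Every record follows its patient into exactly one of the two sets.
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test

# Toy data (hypothetical): 5 patients with 3 regions of interest each.
records = [(f"p{p}", f"sroi_{p}_{k}") for p in range(5) for k in range(3)]
train, test = split_by_patient(records)
```

    Splitting on patient identifiers rather than on individual regions is what keeps regions from one patient out of both sets, which is the source of bias the third experiment is designed to remove.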

  16. Direct immersion single drop micro-extraction method for multi-class pesticides analysis in mango using GC-MS.

    PubMed

    Pano-Farias, Norma S; Ceballos-Magaña, Silvia G; Muñiz-Valencia, Roberto; Jurado, Jose M; Alcázar, Ángela; Aguayo-Villarreal, Ismael A

    2017-12-15

    Due to the negative effects of pesticides on the environment and human health, more efficient and environmentally friendly methods are needed. In this sense, a simple, fast and economical direct-immersion single drop micro-extraction (SDME) method, free from memory effects and combined with GC-MS, was developed for multi-class pesticide determination in mango samples. Sample pre-treatment using ultrasound-assisted solvent extraction and factors affecting the SDME procedure (extractant solvent, drop volume, stirring rate, ionic strength, time, pH and temperature) were optimized using factorial experimental design. This method presented high sensitivity (LOD: 0.14-169.20 μg kg⁻¹), acceptable precision (RSD: 0.7-19.1%), satisfactory recovery (69-119%) and high enrichment factors (20-722). Several obtained LOQs are below the MRLs established by the European Commission; therefore, the method could be applied for pesticide determination in routine analysis and custom laboratories. Moreover, this method has been shown to be suitable for determination of some of the studied pesticides in lime, melon, papaya, banana, tomato, and lettuce. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Simultaneous determination of multi-residue and multi-class antibiotics in aquaculture shrimps by UPLC-MS/MS.

    PubMed

    Saxena, Sushil Kumar; Rangasamy, Rajesh; Krishnan, Anoop A; Singh, Dhirendra P; Uke, Sumedh P; Malekadi, Praveen Kumar; Sengar, Anoop S; Mohamed, D Peer; Gupta, Ananda

    2018-09-15

    An accurate, reliable and fast multi-residue, multi-class method using ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) was developed and validated for simultaneous determination and quantification of 24 pharmacologically active substances of three different classes (Quinolones including fluoroquinolones, sulphonamides and tetracyclines) in aquaculture shrimps. Sample preparation involves extraction with acetonitrile containing 0.1% formic acid and followed by clean up with n-hexane and 0.1% methanol in water by UPLC-MS/MS within 8 min. The method was validated according to European Commission Decision 2002/657. Acceptable values were obtained for linearity (5-200 μg kg⁻¹), specificity, Limit of Quantification (5-10 μg kg⁻¹), recovery (between 83 and 100%), repeatability (RSD < 9%), within lab reproducibility (RSD < 15%), reproducibility (RSD ≤ 22%), decision limit (105-116 μg kg⁻¹) and detection capability (110-132 μg kg⁻¹). The validated method was applied to aquaculture shrimp samples from India. Copyright © 2018 Elsevier Ltd. All rights reserved.

  18. Multiclass determination and confirmation of antibiotic residues in honey using LC-MS/MS.

    PubMed

    Lopez, Mayda I; Pettis, Jeffery S; Smith, I Barton; Chu, Pak-Sin

    2008-03-12

    A multiclass method has been developed for the determination and confirmation in honey of tetracyclines (chlortetracycline, doxycycline, oxytetracycline, and tetracycline), fluoroquinolones (ciprofloxacin, danofloxacin, difloxacin, enrofloxacin, and sarafloxacin), macrolides (tylosin), lincosamides (lincomycin), aminoglycosides (streptomycin), sulfonamides (sulfathiazole), phenicols (chloramphenicol), and fumagillin residues using liquid chromatography tandem mass spectrometry (LC-MS/MS). Erythromycin (a macrolide) and monensin (an ionophore) can be detected and confirmed but not quantitated. Honey samples (approximately 2 g) are dissolved in 10 mL of water and centrifuged. An aliquot of the supernatant is used to determine streptomycin. The remaining supernatant is filtered through a fine-mesh nylon fabric and cleaned up by solid phase extraction. After solvent evaporation and sample reconstitution, 15 antibiotics are assayed by LC-MS/MS using electrospray ionization (ESI) in positive ion mode. Afterward, chloramphenicol is assayed using ESI in negative ion mode. The method has been validated at the low part per billion levels for most of the drugs with accuracies between 65 and 104% and coefficients of variation less than 17%. The evaluation of matrix effects caused by honey of different floral origin is presented.

  19. Development of an Analytical Procedure for the Determination of Multiclass Compounds for Forensic Veterinary Toxicology.

    PubMed

    Sell, Bartosz; Sniegocki, Tomasz; Zmudzki, Jan; Posyniak, Andrzej

    2018-04-01

    Reported here is a new analytical multiclass method based on the QuEChERS technique, which has proven to be effective in diagnosing fatal poisoning cases in animals. This method has been developed for the determination of analytes in liver samples comprising rodenticides, carbamate and organophosphorus pesticides, coccidiostats and mycotoxins. The procedure entails addition of acetonitrile and sodium acetate to 2 g of homogenized liver sample. The mixture was shaken intensively and centrifuged for phase separation, which was followed by an organic phase transfer into a tube containing sorbents (PSA and C18) and magnesium sulfate; then it was centrifuged, and the supernatant was filtered and analyzed by liquid chromatography tandem mass spectrometry. A validation of the procedure was performed. Repeatability variation coefficients <15% have been achieved for most of the analyzed substances. Analytical conditions allowed for a successful separation of a variety of poisons with the typical screening detection limit at ≤10 μg/kg levels. The method was used to investigate more than 100 animal poisoning incidents and proved useful in animal forensic toxicology cases.

  20. System and method for memory allocation in a multiclass memory system

    DOEpatents

    Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark

    2016-06-28

    A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.

  1. Scoliosis curve type classification using kernel machine from 3D trunk image

    NASA Astrophysics Data System (ADS)

    Adankon, Mathias M.; Dansereau, Jean; Parent, Stefan; Labelle, Hubert; Cheriet, Farida

    2012-03-01

    Adolescent idiopathic scoliosis (AIS) is a deformity of the spine manifested by asymmetry and deformities of the external surface of the trunk. Classification of scoliosis deformities according to curve type is used to plan management of scoliosis patients. Currently, scoliosis curve type is determined based on X-ray exam. However, cumulative exposure to X-ray radiation significantly increases the risk of certain cancers. In this paper, we propose a robust system that can classify the scoliosis curve type from non-invasive acquisition of the 3D trunk surface of the patients. The 3D image of the trunk is divided into patches, and local geometric descriptors characterizing the surface of the back are computed from each patch to form the features. We reduce the dimensionality using Principal Component Analysis, retaining 53 components. In this work a multi-class classifier is built with a least-squares support vector machine (LS-SVM), which is a kernel classifier. For this study, a new kernel was designed in order to achieve a robust classifier in comparison with polynomial and Gaussian kernels. The proposed system was validated using data of 103 patients with different scoliosis curve types diagnosed and classified by an orthopedic surgeon from the X-ray images. The average rate of successful classification was 93.3% with a better rate of prediction for the major thoracic and lumbar/thoracolumbar types.

  2. On multi-site damage identification using single-site training data

    NASA Astrophysics Data System (ADS)

    Barthorpe, R. J.; Manson, G.; Worden, K.

    2017-11-01

    This paper proposes a methodology for developing multi-site damage location systems for engineering structures that can be trained using single-site damaged state data only. The methodology involves training a sequence of binary classifiers based upon single-site damage data and combining the developed classifiers into a robust multi-class damage locator. In this way, the multi-site damage identification problem may be decomposed into a sequence of binary decisions. In this paper Support Vector Classifiers are adopted as the means of making these binary decisions. The proposed methodology represents an advancement on the state of the art in the field of multi-site damage identification which require either: (1) full damaged state data from single- and multi-site damage cases or (2) the development of a physics-based model to make multi-site model predictions. The potential benefit of the proposed methodology is that a significantly reduced number of recorded damage states may be required in order to train a multi-site damage locator without recourse to physics-based model predictions. In this paper it is first demonstrated that Support Vector Classification represents an appropriate approach to the multi-site damage location problem, with methods for combining binary classifiers discussed. Next, the proposed methodology is demonstrated and evaluated through application to a real engineering structure - a Piper Tomahawk trainer aircraft wing - with its performance compared to classifiers trained using the full damaged-state dataset.

  3. An improved PSO-SVM model for online recognition defects in eddy current testing

    NASA Astrophysics Data System (ADS)

    Liu, Baoling; Hou, Dibo; Huang, Pingjie; Liu, Banteng; Tang, Huayi; Zhang, Wubo; Chen, Peihua; Zhang, Guangxin

    2013-12-01

    Accurate and rapid recognition of defects is essential for structural integrity and health monitoring of in-service device using eddy current (EC) non-destructive testing. This paper introduces a novel model-free method that includes three main modules: a signal pre-processing module, a classifier module and an optimisation module. In the signal pre-processing module, a kind of two-stage differential structure is proposed to suppress the lift-off fluctuation that could contaminate the EC signal. In the classifier module, multi-class support vector machine (SVM) based on one-against-one strategy is utilised for its good accuracy. In the optimisation module, the optimal parameters of classifier are obtained by an improved particle swarm optimisation (IPSO) algorithm. The proposed IPSO technique can improve convergence performance of the primary PSO through the following strategies: nonlinear processing of inertia weight, introductions of the black hole and simulated annealing model with extremum disturbance. The good generalisation ability of the IPSO-SVM model has been validated through adding additional specimen into the testing set. Experiments show that the proposed algorithm can achieve higher recognition accuracy and efficiency than other well-known classifiers and the superiorities are more obvious with less training set, which contributes to online application.
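
    One of the strategies named above, a nonlinearly varying inertia weight, can be illustrated with a plain particle swarm optimiser. The sketch below is not the authors' IPSO: it omits the black hole and simulated annealing components, and the quadratic decay schedule, the coefficient values, and the sphere test objective are all assumptions chosen for illustration. In the setting of the abstract, the objective would instead be something like the validation error of the SVM as a function of its parameters.

```python
import random

def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Basic PSO with a nonlinearly decaying inertia weight (assumed schedule)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(n_iters):
        # Quadratic decay from w_start to w_end: exploration early, refinement late.
        frac = t / max(1, n_iters - 1)
        w = w_end + (w_start - w_end) * (1 - frac) ** 2
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum at the origin (stand-in for a tuning loss)."""
    return sum(v * v for v in x)

best, best_val = pso_minimize(sphere, dim=2)
```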

  4. Immune Centroids Over-Sampling Method for Multi-Class Classification

    DTIC Science & Technology

    2015-05-22

    recognize to specific antigens. The response of a receptor to an antigen can activate its hosting B-cell. Activated B-cell then proliferates and...modifying N.K. Jerne’s theory. The theory states that in a pre-existing group of lymphocytes (specifically B cells), a specific antigen only...the clusters of each small class, which have high data density, called global immune centroids over-sampling (denoted as Global-IC). Specifically

  5. Development and application of operational techniques for the inventory and monitoring of resources and uses for the Texas coastal zone. [Galveston Bay and San Antonio test sites

    NASA Technical Reports Server (NTRS)

    Jones, R. (Principal Investigator); Harwood, P.; Finley, R.; Clements, G.; Lodwick, L.; Mcculloch, S.; Marphy, D.

    1976-01-01

    The author has identified the following significant results. The most significant ADP result was the modification of the DAM package to produce classified printouts, scaled and registered to U.S.G.S. 7 1/2 minute topographic maps from LARSYS-type classification files. With this modification, all the powerful scaling and registration capabilities of DAM become available for multiclass classification files. The most significant results with respect to image interpretation were the application of mapping techniques to a new, more complex area, and the refinement of an image interpretation procedure which should yield the best results.

  6. Boosting bonsai trees for handwritten/printed text discrimination

    NASA Astrophysics Data System (ADS)

    Ricquebourg, Yann; Raymond, Christian; Poirriez, Baptiste; Lemaitre, Aurélie; Coüasnon, Bertrand

    2013-12-01

    Boosting over decision stumps proved its efficiency in Natural Language Processing, essentially with symbolic features, and its good properties (fast, few and non-critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification processing, for the discrimination of handwritten/printed text. Then, we conducted experiments to compare it to the usual SVM-based classification, revealing convincing results with very close performance, but with faster predictions and behaving far less like a black box. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.

  7. RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics.

    PubMed

    Best, Myron G; Sol, Nik; Kooi, Irsan; Tannous, Jihane; Westerman, Bart A; Rustenburg, François; Schellen, Pepijn; Verschueren, Heleen; Post, Edward; Koster, Jan; Ylstra, Bauke; Ameziane, Najim; Dorsman, Josephine; Smit, Egbert F; Verheul, Henk M; Noske, David P; Reijneveld, Jaap C; Nilsson, R Jonas A; Tannous, Bakhos A; Wesseling, Pieter; Wurdinger, Thomas

    2015-11-09

    Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies". Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Ultrahigh-Dimensional Multiclass Linear Discriminant Analysis by Pairwise Sure Independence Screening

    PubMed Central

    Pan, Rui; Wang, Hansheng; Li, Runze

    2016-01-01

    This paper is concerned with the problem of feature screening for multi-class linear discriminant analysis under ultrahigh dimensional setting. We allow the number of classes to be relatively large. As a result, the total number of relevant features is larger than usual. This makes the related classification problem much more challenging than the conventional one, where the number of classes is small (very often two). To solve the problem, we propose a novel pairwise sure independence screening method for linear discriminant analysis with an ultrahigh dimensional predictor. The proposed procedure is directly applicable to the situation with many classes. We further prove that the proposed method is screening consistent. Simulation studies are conducted to assess the finite sample performance of the new procedure. We also demonstrate the proposed methodology via an empirical analysis of a real life example on handwritten Chinese character recognition. PMID:28127109

  9. Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles.

    PubMed

    Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho

    2017-01-01

    Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs, known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
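
    The combination rule mentioned above, averaging the posterior probabilities of the base classifiers, can be sketched in a few lines. This is an illustration only, not the authors' Java implementation; the function names, class labels, and probability values are hypothetical.

```python
def average_posteriors(posteriors):
    """Average the class-probability vectors produced by several base classifiers."""
    n_models = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]

def predict(posteriors, labels):
    """Return the label with the highest averaged posterior, plus the averages."""
    avg = average_posteriors(posteriors)
    best = max(range(len(avg)), key=lambda c: avg[c])
    return labels[best], avg

# Hypothetical outputs of three base classifiers, each trained on a
# different feature subset, for a single sample.
labels = ["lung", "breast", "colon"]
base_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
label, avg = predict(base_outputs, labels)
```

    Here the second base classifier prefers "breast", but the averaged posterior still favours "lung", which shows how averaging lets the ensemble overrule a single dissenting model.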

  10. Brain-Computer Interface Based on Generation of Visual Images

    PubMed Central

    Bobrov, Pavel; Frolov, Alexander; Cantor, Charles; Fedulova, Irina; Bakhnyan, Mikhail; Zhavoronkov, Alexander

    2011-01-01

    This paper examines the task of recognizing EEG patterns that correspond to performing three mental tasks: relaxation and imagining of two types of pictures: faces and houses. The experiments were performed using two EEG headsets: BrainProducts ActiCap and Emotiv EPOC. The Emotiv headset is becoming widely used in consumer BCI applications, which may allow large-scale EEG experiments to be conducted in the future. Since classification accuracy significantly exceeded the level of random classification during the first three days of the experiment with the EPOC headset, a control experiment was performed on the fourth day using ActiCap. The control experiment has shown that utilization of high-quality research equipment can enhance classification accuracy (up to 68% in some subjects) and that the accuracy is independent of the presence of EEG artifacts related to blinking and eye movement. This study also shows that a computationally inexpensive Bayesian classifier based on covariance matrix analysis yields similar classification accuracy in this problem to a more sophisticated Multi-class Common Spatial Patterns (MCSP) classifier. PMID:21695206

  11. QuEChERS, a sample preparation technique that is “catching on”: an up-to-date interview with its inventors

    USDA-ARS's Scientific Manuscript database

    The technique of QuEChERS (Quick, Easy, Cheap, Effective, Rugged and Safe) is only 7 years old, yet it is revolutionizing the manner in which multiresidue, multiclass pesticide analysis (and perhaps beyond) is performed. Columnist Ron Majors sits down with inventors Steve Lehotay and Michelangelo An...

  12. Negative Correlates of Part-Time Employment during Adolescence: Replication and Elaboration.

    ERIC Educational Resources Information Center

    Steinberg, Laurence; Dornbusch, Sanford M.

    This study examined the relation between part-time employment and adolescent behavior and development in a multi-ethnic, multi-class sample of approximately 4,000 15- through 18-year-olds. The results indicated that long work hours during the school year were associated with diminished investment in schooling and lowered school performance,…

  13. Structural analysis of online handwritten mathematical symbols based on support vector machines

    NASA Astrophysics Data System (ADS)

    Simistira, Foteini; Papavassiliou, Vassilis; Katsouros, Vassilis; Carayannis, George

    2013-01-01

    Mathematical expression recognition is still a very challenging task for the research community mainly because of the two-dimensional (2d) structure of mathematical expressions (MEs). In this paper, we present a novel approach for the structural analysis between two on-line handwritten mathematical symbols of a ME, based on spatial features of the symbols. We introduce six features to represent the spatial affinity of the symbols and compare two multi-class classification methods that employ support vector machines (SVMs): one based on the "one-against-one" technique and one based on the "one-against-all", in identifying the relation between a pair of symbols (i.e. subscript, numerator, etc). A dataset containing 1906 spatial relations derived from the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) 2012 training dataset is constructed to evaluate the classifiers and compare them with the rule-based classifier of the ILSP-1 system participated in the contest. The experimental results give an overall mean error rate of 2.61% for the "one-against-one" SVM approach, 6.57% for the "one-against-all" SVM technique and 12.31% error rate for the ILSP-1 classifier.
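
    The two decomposition schemes compared above differ in how binary decisions are combined, which the following sketch makes concrete. It assumes the binary decision functions are already trained; the one-dimensional toy scorers below are hypothetical stand-ins for SVM decision functions, and the relation names and the single "vertical offset" feature are illustrative, not the six features of the paper.

```python
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)](x) > 0 votes for a, otherwise for b; most votes wins."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])

def one_vs_all_predict(x, classes, scorers):
    """scorers[c](x) scores 'c versus the rest'; the largest score wins."""
    return max(classes, key=lambda c: scorers[c](x))

# Hypothetical stand-ins for trained decision functions on one feature.
classes = ["subscript", "inline", "superscript"]
centers = {"subscript": -1.0, "inline": 0.0, "superscript": 1.0}
scorers = {c: (lambda x, m=centers[c]: -abs(x - m)) for c in classes}
pairwise = {
    (a, b): (lambda x, a=a, b=b: abs(x - centers[b]) - abs(x - centers[a]))
    for a, b in combinations(classes, 2)
}

ovo = one_vs_one_predict(0.9, classes, pairwise)
ova = one_vs_all_predict(0.9, classes, scorers)
```

    For K classes, one-against-one needs K(K-1)/2 binary classifiers and one-against-all needs K, so the two schemes trade off the number of models against the difficulty of each binary problem.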

  14. Understanding user intents in online health forums.

    PubMed

    Zhang, Thomas; Cho, Jason H D; Zhai, Chengxiang

    2015-07-01

    Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that a SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that a SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.

  15. Inter-class sparsity based discriminative least square regression.

    PubMed

    Wen, Jie; Xu, Yong; Li, Zuoyong; Ma, Zhongli; Xu, Yuanrong

    2018-06-01

    Least square regression is a very popular supervised classification method. However, two main issues greatly limit its performance. The first one is that it only focuses on fitting the input features to the corresponding output labels while ignoring the correlations among samples. The second one is that the used label matrix, i.e., zero-one label matrix is inappropriate for classification. To solve these problems and improve the performance, this paper presents a novel method, i.e., inter-class sparsity based discriminative least square regression (ICS_DLSR), for multi-class classification. Different from other methods, the proposed method pursues that the transformed samples have a common sparsity structure in each class. For this goal, an inter-class sparsity constraint is introduced to the least square regression model such that the margins of samples from the same class can be greatly reduced while those of samples from different classes can be enlarged. In addition, an error term with row-sparsity constraint is introduced to relax the strict zero-one label matrix, which allows the method to be more flexible in learning the discriminative transformation matrix. These factors encourage the method to learn a more compact and discriminative transformation for regression and thus has the potential to perform better than other methods. Extensive experimental results show that the proposed method achieves the best performance in comparison with other methods for multi-class classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Classifying brain metastases by their primary site of origin using a radiomics approach based on texture analysis: a feasibility study.

    PubMed

    Ortiz-Ramón, Rafael; Larroza, Andrés; Ruiz-España, Silvia; Arana, Estanislao; Moratal, David

    2018-05-14

    To examine the capability of MRI texture analysis to differentiate the primary site of origin of brain metastases following a radiomics approach. Sixty-seven untreated brain metastases (BM) were found in 3D T1-weighted MRI of 38 patients with cancer: 27 from lung cancer, 23 from melanoma and 17 from breast cancer. These lesions were segmented in 2D and 3D to compare the discriminative power of 2D and 3D texture features. The images were quantized using different number of gray-levels to test the influence of quantization. Forty-three rotation-invariant texture features were examined. Feature selection and random forest classification were implemented within a nested cross-validation structure. Classification was evaluated with the area under receiver operating characteristic curve (AUC) considering two strategies: multiclass and one-versus-one. In the multiclass approach, 3D texture features were more discriminative than 2D features. The best results were achieved for images quantized with 32 gray-levels (AUC = 0.873 ± 0.064) using the top four features provided by the feature selection method based on the p-value. In the one-versus-one approach, high accuracy was obtained when differentiating lung cancer BM from breast cancer BM (four features, AUC = 0.963 ± 0.054) and melanoma BM (eight features, AUC = 0.936 ± 0.070) using the optimal dataset (3D features, 32 gray-levels). Classification of breast cancer and melanoma BM was unsatisfactory (AUC = 0.607 ± 0.180). Volumetric MRI texture features can be useful to differentiate brain metastases from different primary cancers after quantizing the images with the proper number of gray-levels.
    • Texture analysis is a promising source of biomarkers for classifying brain neoplasms.
    • MRI texture features of brain metastases could help identifying the primary cancer.
    • Volumetric texture features are more discriminative than traditional 2D texture features.
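
    A one-versus-one AUC evaluation of the kind described above can be sketched as follows. This is one plausible reading of that strategy, not the authors' pipeline: AUC is computed as the probability that a sample of one class outscores a sample of the other, and the function names, scores, and labels are hypothetical toy values.

```python
from itertools import combinations

def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def one_vs_one_aucs(labels, scores):
    """scores[i][c] is the classifier score of sample i for class c."""
    classes = sorted(set(labels))
    result = {}
    for a, b in combinations(classes, 2):
        # Keep only samples of the two classes and rank them by the score for a.
        pos = [s[a] for y, s in zip(labels, scores) if y == a]
        neg = [s[a] for y, s in zip(labels, scores) if y == b]
        result[(a, b)] = auc(pos, neg)
    return result

# Hypothetical per-class scores for six lesions.
labels = ["lung", "lung", "melanoma", "melanoma", "breast", "breast"]
scores = [
    {"lung": 0.8, "melanoma": 0.1, "breast": 0.1},
    {"lung": 0.6, "melanoma": 0.3, "breast": 0.1},
    {"lung": 0.2, "melanoma": 0.5, "breast": 0.3},
    {"lung": 0.1, "melanoma": 0.4, "breast": 0.5},
    {"lung": 0.1, "melanoma": 0.5, "breast": 0.4},
    {"lung": 0.3, "melanoma": 0.2, "breast": 0.5},
]
pair_aucs = one_vs_one_aucs(labels, scores)
```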

  17. Describing three-class task performance: three-class linear discriminant analysis and three-class ROC analysis

    NASA Astrophysics Data System (ADS)

    He, Xin; Frey, Eric C.

    2007-03-01

    Binary ROC analysis has solid decision-theoretic foundations and a close relationship to linear discriminant analysis (LDA). In particular, for the case of Gaussian equal covariance input data, the area under the ROC curve (AUC) value has a direct relationship to the Hotelling trace. Many attempts have been made to extend binary classification methods to multi-class. For example, Fukunaga extended binary LDA to obtain multi-class LDA, which uses the multi-class Hotelling trace as a figure-of-merit, and we have previously developed a three-class ROC analysis method. This work explores the relationship between conventional multi-class LDA and three-class ROC analysis. First, we developed a linear observer, the three-class Hotelling observer (3-HO). For Gaussian equal covariance data, the 3-HO provides equivalent performance to the three-class ideal observer and, under less strict conditions, maximizes the signal-to-noise ratio for classification of all pairs of the three classes simultaneously. The 3-HO templates are not the eigenvectors obtained from multi-class LDA. Second, we show that the three-class Hotelling trace, which is the figure-of-merit in the conventional three-class extension of LDA, has significant limitations. Third, we demonstrate that, under certain conditions, there is a linear relationship between the eigenvectors obtained from multi-class LDA and 3-HO templates. We conclude that the 3-HO based on decision theory has advantages both in its decision theoretic background and in the usefulness of its figure-of-merit. Additionally, there exists the possibility of interpreting the two linear features extracted by the conventional extension of LDA from a decision theoretic point of view.

  18. Multiclass Reduced-Set Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Tang, Benyang; Mazzoni, Dominic

    2006-01-01

    There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.

  19. Discriminative least squares regression for multiclass classification and feature selection.

    PubMed

    Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui

    2012-11-01

    This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move in opposite directions, so that the distances between classes are enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved by means of the L2,1 norm of a matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
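    The two ingredients named in this abstract, ε-dragging and the L2,1 norm, can be sketched in a few lines. The sketch assumes one-hot regression targets and, for simplicity, a single fixed `eps` for every entry, whereas the abstract suggests the draggings are part of the learned model. The function names `dragged_targets` and `l21_norm` are hypothetical. The L2,1 norm itself is the standard one: the sum of the Euclidean norms of the rows.

```python
import math


def dragged_targets(labels, n_classes, eps):
    """Shift one-hot targets so classes move apart.

    The true-class entry moves up by eps; every other entry moves down by eps.
    Assumption: one shared eps, not a learned value per entry.
    """
    targets = []
    for y in labels:
        row = []
        for c in range(n_classes):
            if c == y:
                row.append(1.0 + eps)  # dragging direction +1 for the true class
            else:
                row.append(0.0 - eps)  # dragging direction -1 for other classes
        targets.append(row)
    return targets


def l21_norm(matrix):
    """L2,1 norm: sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Two samples with labels 0 and 2 in a three-class problem.
T = dragged_targets([0, 2], n_classes=3, eps=0.5)

# Hypothetical weight matrix with one all-zero row (an unused feature).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
norm = l21_norm(W)
```

    Penalizing the L2,1 norm of a weight matrix tends to drive whole rows to zero, which is why it is a common choice for feature selection: a zero row corresponds to a feature that no class uses.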

  20. A simple, fast and cheap non-SPE screening method for antibacterial residue analysis in milk and liver using liquid chromatography-tandem mass spectrometry.

    PubMed

    Martins, Magda Targa; Melo, Jéssica; Barreto, Fabiano; Hoff, Rodrigo Barcellos; Jank, Louise; Bittencourt, Michele Soares; Arsand, Juliana Bazzan; Schapoval, Elfrides Eva Scherman

    2014-11-01

    In routine laboratory work, screening methods for multiclass analysis can process a large number of samples in a short time. The main challenge is to develop a methodology to detect as many different classes of residues as possible, combined with speed and low cost. An efficient technique for the analysis of multiclass antibacterial residues (fluoroquinolones, tetracyclines, sulfonamides and trimethoprim) was developed based on simple, environment-friendly extraction for bovine milk, cattle and poultry liver. Acidified ethanol was used as an extracting solvent for milk samples. Liver samples were treated using EDTA-washed sand for cell disruption, methanol:water and acidified acetonitrile as extracting solvent. A total of 24 antibacterial residues were detected and confirmed using liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), at levels between 10, 25 and 50% of the maximum residue limit (MRL). For liver samples a metabolite (sulfaquinoxaline-OH) was also monitored. A validation procedure was conducted for screening purposes in accordance with European Union requirements (2002/657/EC). The detection capability (CCβ) false compliant rate was less than 5% at the lowest level for each residue. Specificity and ruggedness were also discussed. Incurred and routine samples were analyzed and the method was successfully applied. The results proved that this method can be an important tool in routine analysis, since it is very fast and reliable. Copyright © 2014. Published by Elsevier B.V.

  1. A Multi-Class, Interdisciplinary Project Using Elementary Statistics

    ERIC Educational Resources Information Center

    Reese, Margaret

    2012-01-01

    This article describes a multi-class project that employs statistical computing and writing in a statistics class. Three courses, General Ecology, Meteorology, and Introductory Statistics, cooperated on a project for the EPA's Student Design Competition. The continuing investigation has also spawned several undergraduate research projects in…

  2. Surface-enhanced Raman spectroscopy of saliva proteins for the noninvasive differentiation of benign and malignant breast tumors

    PubMed Central

    Feng, Shangyuan; Huang, Shaohua; Lin, Duo; Chen, Guannan; Xu, Yuanji; Li, Yongzeng; Huang, Zufang; Pan, Jianji; Chen, Rong; Zeng, Haishan

    2015-01-01

    The capability of saliva protein analysis, based on membrane protein purification and surface-enhanced Raman spectroscopy (SERS), for detecting benign and malignant breast tumors is presented in this paper. A total of 97 SERS spectra from purified saliva proteins were acquired from samples obtained from three groups: 33 healthy subjects; 33 patients with benign breast tumors; and 31 patients with malignant breast tumors. Subtle but discernible changes in the mean SERS spectra of the three groups were observed. Tentative assignments of the saliva protein SERS spectra demonstrated that benign and malignant breast tumors led to several specific biomolecular changes of the saliva proteins. Multiclass partial least squares–discriminant analysis was utilized to analyze and classify the saliva protein SERS spectra from healthy subjects, benign breast tumor patients, and malignant breast tumor patients, yielding diagnostic sensitivities of 75.75%, 72.73%, and 74.19%, as well as specificities of 93.75%, 81.25%, and 86.36%, respectively. The results from this exploratory work demonstrate that saliva protein SERS analysis combined with partial least squares–discriminant analysis diagnostic algorithms has great potential for the noninvasive and label-free detection of breast cancer. PMID:25609959
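    Per-class sensitivity and specificity in a three-group problem like this one are usually obtained by treating each class in turn as "positive" against the other two. The sketch below shows that calculation from a confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def per_class_rates(confusion):
    """Return (sensitivity, specificity) for each class, one-versus-rest.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp    # others called k
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp)))
    return rates


# Hypothetical counts: rows are true classes, columns are predicted classes.
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
rates = per_class_rates(confusion)
```

    With these counts the first class has sensitivity 8/10 and specificity 18/20, since 2 of the 20 samples from the other classes were assigned to it.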

  3. A metabolic fingerprinting approach based on selected ion flow tube mass spectrometry (SIFT-MS) and chemometrics: A reliable tool for Mediterranean origin-labeled olive oils authentication.

    PubMed

    Bajoub, Aadil; Medina-Rodríguez, Santiago; Ajal, El Amine; Cuadros-Rodríguez, Luis; Monasterio, Romina Paula; Vercammen, Joeri; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría

    2018-04-01

    Selected ion flow tube mass spectrometry (SIFT-MS) in combination with chemometrics was used to authenticate the geographical origin of Mediterranean virgin olive oils (VOOs) produced under geographical origin labels. In particular, 130 oil samples from six different Mediterranean regions (Kalamata (Greece); Toscana (Italy); Meknès and Tyout (Morocco); and Priego de Córdoba and Baena (Spain)) were considered. The headspace volatile fingerprints were measured by SIFT-MS in full scan with H3O+, NO+ and O2+ as precursor ions and the results were subjected to chemometric treatments. Principal Component Analysis (PCA) was used for preliminary multivariate data analysis and Partial Least Squares-Discriminant Analysis (PLS-DA) was applied to build different models (considering the three reagent ions) to classify samples according to the country of origin and regions (within the same country). The multi-class PLS-DA models showed very good performance in terms of fitting accuracy (98.90-100%) and prediction accuracy (96.70-100% accuracy for cross validation and 97.30-100% accuracy for external validation (test set)). Considering the two-class PLS-DA models, the one for the Spanish samples showed 100% sensitivity, specificity and accuracy in calibration, cross validation and external validation; the model for Moroccan oils also showed very satisfactory results (with perfect scores for almost every parameter in all the cases). Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Selective classification for improved robustness of myoelectric control under nonideal conditions.

    PubMed

    Scheme, Erik J; Englehart, Kevin B; Hudgins, Bernard S

    2011-06-01

    Recent literature in pattern recognition-based myoelectric control has highlighted a disparity between classification accuracy and the usability of upper limb prostheses. This paper suggests that the conventionally defined classification accuracy may be idealistic and may not reflect true clinical performance. Herein, a novel myoelectric control system based on a selective multiclass one-versus-one classification scheme, capable of rejecting unknown data patterns, is introduced. This scheme is shown to outperform nine other popular classifiers when compared using conventional classification accuracy as well as a form of leave-one-out analysis that may be more representative of real prosthetic use. Additionally, the classification scheme allows for real-time, independent adjustment of individual class-pair boundaries making it flexible and intuitive for clinical use.
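    The abstract does not spell out the rejection rule, so the following is only one plausible reading of a selective one-versus-one scheme, not the authors' algorithm. Each pairwise classifier votes only when it is confident, and a class is accepted only if it wins every pairing it takes part in; otherwise the sample is rejected. The function `selective_ovo`, the class names, the confidence values and the threshold are all hypothetical.

```python
from itertools import combinations


def selective_ovo(pair_scores, classes, threshold):
    """Pick a class by confident pairwise votes, or return None to reject.

    pair_scores[(a, b)] is the confidence in [0, 1] that the sample belongs
    to class a rather than class b. A pair abstains when neither side reaches
    the threshold (assumed to be above 0.5).
    """
    wins = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        p = pair_scores[(a, b)]
        if p >= threshold:
            wins[a] += 1
        elif (1.0 - p) >= threshold:
            wins[b] += 1
        # otherwise this pair abstains
    for c in classes:
        if wins[c] == len(classes) - 1:  # beat every rival confidently
            return c
    return None  # no clear winner: reject the pattern


classes = ["rest", "open", "close"]
confident = {("rest", "open"): 0.1, ("rest", "close"): 0.6, ("open", "close"): 0.95}
ambiguous = {("rest", "open"): 0.5, ("rest", "close"): 0.5, ("open", "close"): 0.5}

accepted = selective_ovo(confident, classes, threshold=0.8)
rejected = selective_ovo(ambiguous, classes, threshold=0.8)
```

    Because each pair has its own confidence, a per-pair threshold could replace the single `threshold` here, which would be in the spirit of the independent class-pair boundary adjustment the abstract mentions.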

  5. Efficacy of hidden markov model over support vector machine on multiclass classification of healthy and cancerous cervical tissues

    NASA Astrophysics Data System (ADS)

    Mukhopadhyay, Sabyasachi; Kurmi, Indrajit; Pratiher, Sawon; Mukherjee, Sukanya; Barman, Ritwik; Ghosh, Nirmalya; Panigrahi, Prasanta K.

    2018-02-01

    In this paper, a comparative study between SVM and HMM has been carried out for multiclass classification of healthy and cancerous cervical tissues. In our study, the HMM methodology shows more promise for achieving higher classification accuracy.

  6. Diffuse Interface Methods for Multiclass Segmentation of High-Dimensional Data

    DTIC Science & Technology

    2014-03-04

    handwritten digits , 1998. http://yann.lecun.com/exdb/mnist/. [19] S. Nene, S. Nayar, H. Murase, Columbia Object Image Library (COIL-100), Technical Report... recognition on smartphones using a multiclass hardware-friendly support vector machine, in: Ambient Assisted Living and Home Care, Springer, 2012, pp. 216–223.

  7. Looking for Alzheimer's Disease morphometric signatures using machine learning techniques.

    PubMed

    Donnelly-Kehoe, Patricio Andres; Pascariello, Guido Orlando; Gómez, Juan Carlos

    2018-05-15

    We present our results in the International challenge for automated prediction of MCI from MRI data. We evaluate the performance of MRI-based neuromorphometrics features (nMF) in the classification of Healthy Controls (HC), Mild Cognitive Impairment (MCI), converters MCI (cMCI) and Alzheimer's Disease (AD) patients. We propose to segregate participants in three groups according to Mini Mental State Examination score (MMSEs), searching for the main nMF in each group. Then we use them to develop a Multi Classifier System (MCS). We compare the MCS against a single classifier scheme using both MMSEs+nMF and nMF only. We repeat this comparison using three state-of-the-art classification algorithms. The MCS showed the best performance on both Accuracy and Area Under the Receiver Operating Curve (AUC) in comparison with single classifiers. The multiclass AUC for the MCS classification on Test Dataset were 0.83 for HC, 0.76 for cMCI, 0.65 for MCI and 0.95 for AD. Furthermore, MCS's optimum accuracy on Neurodegenerative Disease (ND) detection (AD+cMCI vs MCI+HC) was 81.0% (AUC=0.88), while the single classifiers got 71.3% (AUC=0.86) and 63.1% (AUC=0.79) for MMSEs+nMF and only nMF respectively. The proposed MCS showed a better performance than using all nMF into a single state-of-the-art classifier. These findings suggest that using cognitive scoring, e.g. MMSEs, in the design of a Multi Classifier System improves performance by allowing a better selection of MRI-based features. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Deep Learning Applications for Predicting Pharmacological Properties of Drugs and Drug Repurposing Using Transcriptomic Data.

    PubMed

    Aliper, Alexander; Plis, Sergey; Artemov, Artem; Ulloa, Alvaro; Mamoshina, Polina; Zhavoronkov, Alex

    2016-07-05

    Deep learning is rapidly advancing many areas of science and technology with multiple success stories in image, text, voice and video recognition, robotics, and autonomous driving. In this paper we demonstrate how deep neural networks (DNN) trained on large transcriptional response data sets can classify various drugs to therapeutic categories solely based on their transcriptional profiles. We used the perturbation samples of 678 drugs across A549, MCF-7, and PC-3 cell lines from the LINCS Project and linked those to 12 therapeutic use categories derived from MeSH. To train the DNN, we utilized both gene level transcriptomic data and transcriptomic data processed using a pathway activation scoring algorithm, for a pooled data set of samples perturbed with different concentrations of the drug for 6 and 24 hours. In both pathway and gene level classification, DNN achieved high classification accuracy and convincingly outperformed the support vector machine (SVM) model on every multiclass classification problem, however, models based on pathway level data performed significantly better. For the first time we demonstrate a deep learning neural net trained on transcriptomic data to recognize pharmacological properties of multiple drugs across different biological systems and conditions. We also propose using deep neural net confusion matrices for drug repositioning. This work is a proof of principle for applying deep learning to drug discovery and development.

  9. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data

    PubMed Central

    Aliper, Alexander; Plis, Sergey; Artemov, Artem; Ulloa, Alvaro; Mamoshina, Polina; Zhavoronkov, Alex

    2016-01-01

    Deep learning is rapidly advancing many areas of science and technology with multiple success stories in image, text, voice and video recognition, robotics and autonomous driving. In this paper we demonstrate how deep neural networks (DNN) trained on large transcriptional response data sets can classify various drugs to therapeutic categories solely based on their transcriptional profiles. We used the perturbation samples of 678 drugs across A549, MCF‐7 and PC‐3 cell lines from the LINCS project and linked those to 12 therapeutic use categories derived from MeSH. To train the DNN, we utilized both gene level transcriptomic data and transcriptomic data processed using a pathway activation scoring algorithm, for a pooled dataset of samples perturbed with different concentrations of the drug for 6 and 24 hours. In both gene and pathway level classification, DNN convincingly outperformed support vector machine (SVM) model on every multiclass classification problem, however, models based on a pathway level classification perform better. For the first time we demonstrate a deep learning neural net trained on transcriptomic data to recognize pharmacological properties of multiple drugs across different biological systems and conditions. We also propose using deep neural net confusion matrices for drug repositioning. This work is a proof of principle for applying deep learning to drug discovery and development. PMID:27200455

  10. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.

  11. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  12. Human Activity Recognition from Smart-Phone Sensor Data using a Multi-Class Ensemble Learning in Home Monitoring.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Karunanithi, Mohan; Dowling, Jason

    2015-01-01

    Home monitoring of chronically ill or elderly patients can reduce frequent hospitalisations and hence provide improved quality of care at a reduced cost to the community, therefore reducing the burden on the healthcare system. Activity recognition of such patients is of high importance in such a design. In this work, a system for automatic human physical activity recognition from smart-phone inertial sensors data is proposed. An ensemble of decision trees framework is adopted to train and predict the multi-class human activity system. A comparison of our proposed method with a multi-class traditional support vector machine shows significant improvement in activity recognition accuracies.

  13. Evaluation of a multiclass, multiresidue liquid chromatography-tandem mass spectrometry method for analysis of 120 veterinary drugs in bovine kidney

    USDA-ARS's Scientific Manuscript database

    Traditionally, regulatory monitoring of veterinary drug residues in food animal tissues involves the use of several single-class methods to cover a wide analytical scope. Multiclass, multiresidue methods of analysis tend to provide greater overall laboratory efficiency than the use of multiple meth...

  14. Multi-class, multi-residue analysis of pesticides, polychlorinated biphenyls, polycyclic aromatic hydrocarbons, polybrominated diphenyl ethers and novel flame retardants....mass spectrometry

    USDA-ARS's Scientific Manuscript database

    A multi-class, multi-residue method for the analysis of 13 novel flame retardants, 18 representative pesticides, 14 polychlorinated biphenyl (PCB) congeners, 16 polycyclic aromatic hydrocarbons (PAHs), and 7 polybrominated diphenyl ether (PBDE) congeners in catfish muscle was developed and evaluated...

  15. National Trends in Child and Adolescent Psychotropic Polypharmacy in Office-Based Practice, 1996-2007

    ERIC Educational Resources Information Center

    Comer, Jonathan S.; Olfson, Mark; Mojtabai, Ramin

    2010-01-01

    Objective: To examine patterns and recent trends in multiclass psychotropic treatment among youth visits to office-based physicians in the United States. Method: Annual data from the 1996-2007 National Ambulatory Medical Care Surveys were analyzed to examine patterns and trends in multiclass psychotropic treatment within a nationally…

  16. Cost-sensitive AdaBoost algorithm for ordinal regression based on extreme learning machine.

    PubMed

    Riccardi, Annalisa; Fernández-Navarro, Francisco; Carloni, Sante

    2014-10-01

    In this paper, the well known stagewise additive modeling using a multiclass exponential (SAMME) boosting algorithm is extended to address problems where there exists a natural order in the targets using a cost-sensitive approach. The proposed ensemble model uses an extreme learning machine (ELM) model as a base classifier (with the Gaussian kernel and the additional regularization parameter). The closed form of the derived weighted least squares problem is provided, and it is employed to estimate analytically the parameters connecting the hidden layer to the output layer at each iteration of the boosting algorithm. Compared to the state-of-the-art boosting algorithms, in particular those using ELM as base classifier, the suggested technique does not require the generation of a new training dataset at each iteration. The adoption of the weighted least squares formulation of the problem has been presented as an unbiased and alternative approach to the already existing ELM boosting techniques. Moreover, the addition of a cost model for weighting the patterns, according to the order of the targets, enables the classifier to tackle ordinal regression problems further. The proposed method has been validated by an experimental study by comparing it with already existing ensemble methods and ELM techniques for ordinal regression, showing competitive results.
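    For orientation, the sketch below shows one round of the plain SAMME update that this paper extends: the classifier weight is the usual AdaBoost log-odds term plus log(K - 1) for K classes, and misclassified samples are up-weighted accordingly. It does not include the cost-sensitive ordinal weighting or the ELM base learner described in the abstract. The function name `samme_step` and the inputs are hypothetical.

```python
import math


def samme_step(weights, misclassified, n_classes):
    """One SAMME boosting round.

    weights: current sample weights.
    misclassified: parallel list of booleans from the current base classifier.
    Returns (alpha, new_weights) with new_weights normalized to sum to 1.
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, m in zip(weights, misclassified) if m) / sum(weights)
    # Classifier weight: AdaBoost term plus the multiclass correction log(K - 1).
    alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
    # Up-weight only the samples the base classifier got wrong.
    new_w = [w * math.exp(alpha) if m else w for w, m in zip(weights, misclassified)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]


# Four equally weighted samples, one misclassified, three classes.
alpha, w = samme_step([0.25] * 4, [True, False, False, False], n_classes=3)
```

    With K = 2 the extra term vanishes and the update reduces to the standard two-class AdaBoost weight. The correction also means a base classifier only needs to beat random guessing among K classes (error below 1 - 1/K) to receive a positive weight.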

  17. Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning

    NASA Astrophysics Data System (ADS)

    Sun, Yankui; Li, Shan; Sun, Zhongyang

    2017-01-01

    We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects (15 normal subjects, 15 AMD patients, and 15 DME patients); and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing (168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively). For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.

  18. Spatial modeling and classification of corneal shape.

    PubMed

    Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan

    2007-03-01

    One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.

  19. A tri-fold hybrid classification approach for diagnostics with unexampled faulty states

    NASA Astrophysics Data System (ADS)

    Tamilselvan, Prasanna; Wang, Pingfeng

    2015-01-01

    System health diagnostics provides diversified benefits such as improved safety, improved reliability and reduced costs for the operation and maintenance of engineered systems. Successful health diagnostics requires the knowledge of system failures. However, with increasing system complexity, it is extraordinarily difficult to have a well-tested system so that all potential faulty states can be realized and studied at the product testing stage. Thus, real-time health diagnostics requires automatic detection of unexampled system faulty states based upon sensory data to avoid sudden catastrophic system failures. This paper presents a trifold hybrid classification (THC) approach for structural health diagnosis with unexampled health states (UHS), which comprises preliminary UHS identification using a new thresholded Mahalanobis distance (TMD) classifier, UHS diagnostics using a two-class support vector machine (SVM) classifier, and exampled health states diagnostics using a multi-class SVM classifier. The proposed THC approach, which takes advantage of both TMD and SVM-based classification techniques, is able to identify and isolate the unexampled faulty states through interactively detecting the deviation of sensory data from the exampled health states and forming new ones autonomously. The proposed THC approach is further extended to a generic framework for health diagnostics problems with unexampled faulty states and demonstrated with health diagnostics case studies for power transformers and rolling bearings.
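    The abstract does not define the TMD classifier in detail, so the sketch below only illustrates the general idea of thresholding a Mahalanobis distance: a sample is assigned to the nearest known health state unless it lies farther than a threshold from all of them, in which case it is flagged as possibly unexampled. To stay self-contained it assumes diagonal covariances (per-feature variances). The function names, state names, statistics and threshold are all hypothetical.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under the simplifying assumption of a diagonal covariance."""
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)))


def nearest_state_or_unexampled(x, states, threshold):
    """Return the closest known state, or None if every state is beyond the threshold.

    states maps a state name to (mean vector, per-feature variances).
    """
    best_name, best_d = None, float("inf")
    for name, (mean, var) in states.items():
        d = mahalanobis_diag(x, mean, var)
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= threshold else None


# Hypothetical exampled health states in a two-feature sensor space.
states = {
    "healthy": ([0.0, 0.0], [1.0, 1.0]),
    "fault_a": ([5.0, 5.0], [1.0, 4.0]),
}

known = nearest_state_or_unexampled([0.5, 0.5], states, threshold=3.0)
novel = nearest_state_or_unexampled([20.0, 20.0], states, threshold=3.0)
```

    In a pipeline like the one described, samples flagged this way would then be passed to a further stage for confirmation rather than being treated as a final diagnosis.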

  20. Fully automated macular pathology detection in retina optical coherence tomography images using sparse coding and dictionary learning.

    PubMed

    Sun, Yankui; Li, Shan; Sun, Zhongyang

    2017-01-01

    We propose a framework for automated detection of dry age-related macular degeneration (AMD) and diabetic macular edema (DME) from retina optical coherence tomography (OCT) images, based on sparse coding and dictionary learning. The study aims to improve the classification performance of state-of-the-art methods. First, our method presents a general approach to automatically align and crop retina regions; then it obtains global representations of images by using sparse coding and a spatial pyramid; finally, a multiclass linear support vector machine classifier is employed for classification. We apply two datasets for validating our algorithm: Duke spectral domain OCT (SD-OCT) dataset, consisting of volumetric scans acquired from 45 subjects—15 normal subjects, 15 AMD patients, and 15 DME patients; and clinical SD-OCT dataset, consisting of 678 OCT retina scans acquired from clinics in Beijing—168, 297, and 213 OCT images for AMD, DME, and normal retinas, respectively. For the former dataset, our classifier correctly identifies 100%, 100%, and 93.33% of the volumes with DME, AMD, and normal subjects, respectively, and thus performs much better than the conventional method; for the latter dataset, our classifier leads to a correct classification rate of 99.67%, 99.67%, and 100.00% for DME, AMD, and normal images, respectively.

  1. Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series.

    PubMed

    Gálvez, Juan Manuel; Castillo, Daniel; Herrera, Luis Javier; San Román, Belén; Valenzuela, Olga; Ortuño, Francisco Manuel; Rojas, Ignacio

    2018-01-01

    Most of the research studies developed applying microarray technology to the characterization of different pathological states of any disease may fail in reaching statistically significant results. This is largely due to the small repertoire of analysed samples, and to the limitation in the number of states or pathologies usually addressed. Moreover, the influence of potential deviations on the gene expression quantification is usually disregarded. In spite of the continuous changes in omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing-related technologies, the existing availability of a vast amount of gene expression microarray datasets should be properly exploited. Therefore, this work proposes a novel methodological approach involving the integration of several heterogeneous skin cancer series, and a later multiclass classifier design. This approach is thus a way to provide the clinicians with an intelligent diagnosis support tool based on the use of a robust set of selected biomarkers, which simultaneously distinguishes among different cancer-related skin states. To achieve this, a multi-platform combination of microarray datasets from Affymetrix and Illumina manufacturers was carried out. This integration is expected to strengthen the statistical robustness of the study as well as the finding of highly-reliable skin cancer biomarkers. Specifically, the designed operation pipeline has allowed the identification of a small subset of 17 differentially expressed genes (DEGs) from which to distinguish among 7 involved skin states. These genes were obtained from the assessment of a number of potential batch effects on the gene expression data. The biological interpretation of these genes was inspected in the specific literature to understand their underlying information in relation to skin cancer. 
    Finally, in order to assess their possible effectiveness in cancer diagnosis, a cross-validated Support Vector Machine (SVM)-based classification including feature ranking was performed. The accuracy attained exceeded 92% in overall recognition of the 7 different cancer-related skin states. The proposed integration scheme is expected to allow the co-integration with other state-of-the-art technologies such as RNA-seq.

  2. Probabilistic Open Set Recognition

    NASA Astrophysics Data System (ADS)

    Jain, Lalit Prithviraj

    Real-world tasks in computer vision, pattern recognition and machine learning often touch upon the open set recognition problem: multi-class recognition with incomplete knowledge of the world and many unknown inputs. An obvious way to approach such problems is to develop a recognition system that thresholds probabilities to reject unknown classes. Traditional rejection techniques are not about the unknown; they are about the uncertain boundary and rejection around that boundary. Thus traditional techniques only represent the "known unknowns". However, a proper open set recognition algorithm is needed to reduce the risk from the "unknown unknowns". This dissertation examines this concept and finds existing probabilistic multi-class recognition approaches are ineffective for true open set recognition. We hypothesize the cause is weak ad hoc assumptions combined with closed-world assumptions made by existing calibration techniques. Intuitively, if we could accurately model just the positive data for any known class without overfitting, we could reject the large set of unknown classes even under this assumption of incomplete class knowledge. For this, we formulate the problem as one of modeling positive training data by invoking statistical extreme value theory (EVT) near the decision boundary of positive data with respect to negative data. We provide a new algorithm called the PI-SVM for estimating the unnormalized posterior probability of class inclusion. This dissertation also introduces a new open set recognition model called Compact Abating Probability (CAP), where the probability of class membership decreases in value (abates) as points move from known data toward open space. We show that CAP models improve open set recognition for multiple algorithms.
    Leveraging the CAP formulation, we go on to describe the novel Weibull-calibrated SVM (W-SVM) algorithm, which combines the useful properties of statistical EVT for score calibration with one-class and binary support vector machines. Building from the success of statistical EVT based recognition methods such as PI-SVM and W-SVM on the open set problem, we present a new general supervised learning algorithm for multi-class classification and multi-class open set recognition called the Extreme Value Local Basis (EVLB). The design of this algorithm is motivated by the observation that extrema from known negative class distributions are the closest negative points to any positive sample during training, and thus should be used to define the parameters of a probabilistic decision model. In the EVLB, the kernel distribution for each positive training sample is estimated via an EVT distribution fit over the distances to the separating hyperplane between positive training sample and closest negative samples, with a subset of the overall positive training data retained to form a probabilistic decision boundary. Using this subset as a frame of reference, the probability of a sample at test time decreases as it moves away from the positive class. Possessing this property, the EVLB is well-suited to open set recognition problems where samples from unknown or novel classes are encountered at test time. Our experimental evaluation shows that the EVLB provides a substantial improvement in scalability compared to standard radial basis function kernel machines, as well as PI-SVM and W-SVM, with improved accuracy in many cases. We evaluate our algorithm on open set variations of the standard visual learning benchmarks, as well as with an open subset of classes from Caltech 256 and ImageNet. Our experiments show that PI-SVM, W-SVM and EVLB provide significant advances over the previous state-of-the-art solutions for the same tasks.
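
    The notion of an abating probability combined with a rejection threshold can be sketched in a few lines. The example below is a deliberately simple stand-in, not PI-SVM, W-SVM or EVLB: it assumes an exponential decay with distance to the nearest known training point, and the function names, `scale`, `threshold`, and toy points are hypothetical.

```python
import math

def abating_probability(x, class_points, scale=1.0):
    """Probability-like score that decays as x moves away from the nearest known point."""
    nearest = min(math.dist(x, p) for p in class_points)
    return math.exp(-nearest / scale)

def predict_open_set(x, training, threshold=0.5, scale=1.0):
    """Return the best-scoring known class, or 'unknown' if no score reaches the threshold."""
    scores = {label: abating_probability(x, pts, scale) for label, pts in training.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"

# Hypothetical 2-D training points for two known classes.
training = {"cat": [(0, 0), (1, 0)], "dog": [(5, 5), (6, 5)]}
near_cat = predict_open_set((0.2, 0.1), training)
far_away = predict_open_set((20, 20), training)
```

    Because the score shrinks toward zero in open space, a sample far from all known data is rejected rather than forced into the closest known class.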

  3. Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening

    PubMed Central

    Yoon, Sang Min

    2018-01-01

    Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages two-stage learning of multiple 1D CNN models; we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve the activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets, and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches. PMID:29614767

  4. Divide and Conquer-Based 1D CNN Human Activity Recognition Using Test Data Sharpening.

    PubMed

    Cho, Heeryon; Yoon, Sang Min

    2018-04-01

    Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages two-stage learning of multiple 1D CNN models; we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during the prediction phase to further improve the activity recognition accuracy. While numerous studies have explored the benefits of activity signal denoising for HAR, few have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets, and show that our approach outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches.

  5. Di-codon Usage for Gene Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Minh N.; Ma, Jianmin; Fogel, Gary B.; Rajapakse, Jagath C.

    Classification of genes into biologically related groups facilitates inference of their functions. Codon usage bias has been described previously as a potential feature for gene classification. In this paper, we demonstrate that di-codon usage can further improve classification of genes. By using both codon and di-codon features, we achieve near perfect accuracies for the classification of HLA molecules into major classes and sub-classes. The method is illustrated on 1,841 HLA sequences which are classified into two major classes, HLA-I and HLA-II. Major classes are further classified into sub-groups. A binary SVM using di-codon usage patterns achieved 99.95% accuracy in the classification of HLA genes into major HLA classes; and multi-class SVM achieved accuracy rates of 99.82% and 99.03% for sub-class classification of HLA-I and HLA-II genes, respectively. Furthermore, by combining codon and di-codon usages, the prediction accuracies reached 100%, 99.82%, and 99.84% for HLA major class classification, and for sub-class classification of HLA-I and HLA-II genes, respectively.
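
    To make the feature definition concrete, the sketch below computes codon and di-codon usage frequencies from a coding sequence. It assumes that a di-codon is a pair of adjacent in-frame codons counted with a sliding window of one codon; the abstract does not spell out the exact definition, and the function name and toy sequence are hypothetical.

```python
from collections import Counter

def codon_features(seq, k=1):
    """Relative frequencies of in-frame words of k codons (k=1: codons, k=2: di-codons).
    Trailing bases that do not fill a complete codon are ignored."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    words = ["".join(codons[i:i + k]) for i in range(len(codons) - k + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Toy sequence, not a real HLA gene.
codon_usage = codon_features("ATGGCCATGGCC", k=1)
dicodon_usage = codon_features("ATGGCCATGGCC", k=2)
```

    The resulting dictionaries could be mapped onto fixed-length vectors (64 codons, 4,096 di-codons) before being passed to a classifier such as an SVM.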

  6. HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source.

    PubMed

    Wan, Shixiang; Duan, Yucong; Zou, Quan

    2017-09-01

    Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. First, most of the existing techniques are designed to deal with multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, multiple locations of particular proteins imply that there are vital and unique biological significances that deserve special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary, but never employed. For solving these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied for multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. On the asynchronously continuous control of mobile robot movement by motor cortical spiking activity.

    PubMed

    Xu, Zhiming; So, Rosa Q; Toe, Kyaw Kyar; Ang, Kai Keng; Guan, Cuntai

    2014-01-01

    This paper presents an asynchronous intracortical brain-computer interface (BCI) which allows the subject to continuously drive a mobile robot. This system has important implications for the mobility of disabled patients. By carefully designing a multiclass support vector machine (SVM), the subject's self-paced instantaneous movement intents are continuously decoded to control the mobile robot. In particular, we studied the stability of the neural representation of the movement directions. Experimental results on the nonhuman primate showed that the overt movement directions were stably represented in the ensemble of recorded units, and our SVM classifier could successfully decode such movements continuously along the desired movement path. However, the neural representation of the stop state for the self-paced control was not stable and could drift.

  8. exprso: an R-package for the rapid implementation of machine learning algorithms.

    PubMed

    Quinn, Thomas; Tylee, Daniel; Glatt, Stephen

    2016-01-01

    Machine learning plays a major role in many scientific investigations. However, non-expert programmers may struggle to implement the elaborate pipelines necessary to build highly accurate and generalizable models. We introduce exprso, a new R package that is an intuitive machine learning suite designed specifically for non-expert programmers. Built initially for the classification of high-dimensional data, exprso uses an object-oriented framework to encapsulate a number of common analytical methods into a series of interchangeable modules. This includes modules for feature selection, classification, high-throughput parameter grid-searching, elaborate cross-validation schemes (e.g., Monte Carlo and nested cross-validation), ensemble classification, and prediction. In addition, exprso also supports multi-class classification (through the 1-vs-all generalization of binary classifiers) and the prediction of continuous outcomes.
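
    The 1-vs-all generalization mentioned above can be sketched independently of any package. The Python snippet below does not use or reproduce the exprso API (which is an R package); the scorers are hypothetical stand-ins for trained "this class versus the rest" binary classifiers.

```python
def one_vs_all_predict(x, scorers):
    """Pick the class whose 'this class vs. the rest' scorer is most confident."""
    return max(scorers, key=lambda label: scorers[label](x))

# Hypothetical toy scorers: negative distance to a 1-D class centre.
centres = {"low": 0.0, "mid": 5.0, "high": 10.0}
scorers = {label: (lambda x, c=c: -abs(x - c)) for label, c in centres.items()}
prediction = one_vs_all_predict(6.1, scorers)
```

    One binary model is needed per class, and the final label is simply the class with the highest score.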

  9. Discrimination of fish populations using parasites: Random Forests on a 'predictable' host-parasite system.

    PubMed

    Pérez-Del-Olmo, A; Montero, F E; Fernández, M; Barrett, J; Raga, J A; Kostadinova, A

    2010-10-01

    We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment using a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. We considered 2 basic classification problems in evaluating the importance of variations in parasite infracommunities for assignment of individual fish to their populations of origin: multiclass (2-5 population models, using 2 seasonal replicates from each of the populations) and 2-class task (using 4 seasonal replicates from 1 Atlantic and 1 Mediterranean population each). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data and this allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF is dependent on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change and this stresses the importance of both temporal replication and model validation in parasite tagging studies.

  10. Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition.

    PubMed

    Chen, Yen-Kuang; Li, Kuo-Bin

    2013-02-07

    The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by jackknife cross-validation and an independent testing experiment. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt. Copyright © 2012 Elsevier Ltd. All rights reserved.
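
    A collection of one-versus-one classifiers needs a rule for combining pairwise decisions. The sketch below uses plain majority voting, which is the textbook baseline and not the new voting scheme of the paper; the pairwise deciders are hypothetical stand-ins for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """pairwise[(a, b)] is a callable returning either a or b for input x.
    The class collecting the most pairwise wins is returned; ties go to the
    class that received its first vote earliest."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical 1-D centroids; each pairwise decider picks the closer centroid.
centroids = {"A": 0.0, "B": 5.0, "C": 10.0}

def make_pair(a, b):
    return lambda x: a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

classes = sorted(centroids)
pairwise = {(a, b): make_pair(a, b) for a, b in combinations(classes, 2)}
prediction = one_vs_one_predict(4.2, classes, pairwise)
```

    With eight classes, as in the membrane protein setting, this scheme would involve 8 × 7 / 2 = 28 pairwise classifiers.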

  11. A hybrid three-class brain-computer interface system utilizing SSSEPs and transient ERPs

    NASA Astrophysics Data System (ADS)

    Breitwieser, Christian; Pokorny, Christoph; Müller-Putz, Gernot R.

    2016-12-01

    Objective. This paper investigates the fusion of steady-state somatosensory evoked potentials (SSSEPs) and transient event-related potentials (tERPs), evoked through tactile stimulation on the left and right-hand fingertips, in a three-class EEG based hybrid brain-computer interface. It was hypothesized that fusing the input signals leads to higher classification rates than classifying tERP and SSSEP individually. Approach. Fourteen subjects participated in the studies, consisting of a screening paradigm to determine person dependent resonance-like frequencies and a subsequent online paradigm. The whole setup of the BCI system was based on open interfaces, following suggestions for a common implementation platform. During the online experiment, subjects were instructed to focus their attention on the stimulated fingertips as indicated by a visual cue. The recorded data were classified during runtime using a multi-class shrinkage LDA classifier and the outputs were fused together applying a posterior probability based fusion. Data were further analyzed offline, involving a combined classification of SSSEP and tERP features as a second fusion principle. The final results were tested for statistical significance applying a repeated measures ANOVA. Main results. A significant classification increase was achieved when fusing the results with a combined classification compared to performing an individual classification. Furthermore, the SSSEP classifier was significantly better in detecting a non-control state, whereas the tERP classifier was significantly better in detecting control states. Subjects who had a higher relative band power increase during the screening session also achieved significantly higher classification results than subjects with lower relative band power increase. Significance. It could be shown that utilizing SSSEP and tERP for hBCIs increases the classification accuracy and also that tERP and SSSEP are not classifying control- and non-control states with the same level of accuracy.
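
    One common way to fuse the posterior outputs of two classifiers is the product rule followed by renormalization. The abstract does not state which fusion rule was used, so the sketch below is an assumption: it treats the two classifiers as conditionally independent, and the class names and probability values are made up for illustration.

```python
def fuse_posteriors(p_a, p_b):
    """Fuse two classifiers' class posteriors with the product rule and renormalize.
    Assumes both dictionaries cover the same classes."""
    fused = {c: p_a[c] * p_b[c] for c in p_a}
    total = sum(fused.values())
    if total == 0:
        raise ValueError("classifiers assign zero joint probability to every class")
    return {c: v / total for c, v in fused.items()}

# Hypothetical posteriors from two classifiers for one trial.
p_sssep = {"left": 0.5, "right": 0.3, "rest": 0.2}
p_terp = {"left": 0.6, "right": 0.3, "rest": 0.1}
fused = fuse_posteriors(p_sssep, p_terp)
decision = max(fused, key=fused.get)
```

    When both classifiers lean toward the same class, the fused posterior is more peaked than either input, which is the intuition behind expecting a gain from fusion.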

  12. A fresh look at functional link neural network for motor imagery-based brain-computer interface.

    PubMed

    Hettiarachchi, Imali T; Babaei, Toktam; Nguyen, Thanh; Lim, Chee P; Nahavandi, Saeid

    2018-05-04

    Artificial neural networks (ANNs) are among the most widely used classifiers in brain-computer interface (BCI) systems based on noninvasive electroencephalography (EEG) signals. Among the different ANN architectures, the most commonly applied for BCI classifiers is the multilayer perceptron (MLP). When appropriately designed with an optimal number of neuron layers and number of neurons per layer, the ANN can act as a universal approximator. However, due to the low signal-to-noise ratio of EEG signal data, overtraining may become an inherent issue, causing these universal approximators to fail in real-time applications. In this study we introduce a higher order neural network, namely the functional link neural network (FLNN), as a classifier for motor imagery (MI)-based BCI systems, to remedy the drawbacks of the MLP. We compare the proposed method with competing classifiers such as linear decomposition analysis, naïve Bayes, k-nearest neighbours, support vector machine and three MLP architectures. Two multi-class benchmark datasets from the BCI competitions are used. The common spatial pattern algorithm is utilized for feature extraction to build classification models. FLNN reports the highest average Kappa value over multiple subjects for both the BCI competition datasets, under similarly preprocessed data and extracted features. Further, statistical comparison results over multiple subjects show that the proposed FLNN classification method yields the best performance among the competing classifiers. Findings from this study imply that the proposed method, which has less computational complexity compared to the MLP, can be implemented effectively in practical MI-based BCI systems. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Automated detection of tuberculosis on sputum smeared slides using stepwise classification

    NASA Astrophysics Data System (ADS)

    Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean

    2012-03-01

    Routine visual slide screening for identification of tuberculosis (TB) bacilli in stained sputum slides under a microscope is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion on Boolean function for classification, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several different categories including scanty positive, high concentration positive, and several non-bacilli categories: small bright objects, beaded, dim elongated objects, etc. The morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The individual classifier that increases the bacilli counts utilizes an adaptive algorithm based on a microbiologist's statistical heuristic decision process. The individual classifier that reduces false positives is developed through minimization from a binary decision tree to classify different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows high accordance rates for negative, scanty, and high-concentration cases of 88.24%, 56.00%, and 97.96%, respectively. A binary-class task analysis using a receiver operating characteristics method with the area under the curve (Az) is also utilized to analyze the performance of this detection algorithm, showing superior detection performance on the high-concentration cases (Az=0.913) and cases mixed with high-concentration and scanty cases (Az=0.878).

  14. Simultaneous determination of seven multiclass veterinary antibiotics in surface water samples in the Republic of Korea using liquid chromatography with tandem mass spectrometry.

    PubMed

    Chung, Hyung Suk; Choi, Jeong-Heui; Abd El-Aty, A M; Lee, Young-Jun; Lee, Han Sol; Kim, Sangdon; Jung, Hee-Jung; Kang, Tae-Woo; Shin, Ho-Chul; Shim, Jae-Han

    2016-12-01

    A simultaneous determination method using solid-phase extraction and liquid chromatography with tandem mass spectrometry was developed to detect and quantify the presence of seven multiclass veterinary antibiotics (13 compounds in total) in surface water samples, which included the effluents of livestock wastewater and sewage treatment plants, as well as the reservoir drainage areas from dense animal farms. The pH of all water samples was adjusted to 2 or 6 before solid-phase extraction using Oasis HLB cartridges. The developed method was fully validated in terms of linearity, method detection limit, method quantitation limit, accuracy, and precision. The linearity of all tested drugs was good, with coefficients of determination (R²) ≥ 0.9931. The method detection limits and method quantitation limits were 0.1-74.3 and 0.5-236.6 ng/L, respectively. Accuracy and precision values were 71-120 and 1-17%, respectively. The determination method was successfully applied for monitoring water samples obtained from the Yeongsan River in 2015. The most frequently detected antibiotics were lincomycin (96%), sulfamethazine (90%), sulfamethoxazole (88%), and sulfathiazole (50%); the maximum concentrations of which were 398.9, 1151.3, 533.1, and 307.4 ng/L, respectively. Overall, the greatest numbers and concentrations of detected antibiotics were found in samples from the effluents of livestock wastewater, sewage treatment plants, and reservoir drainage areas. Diverse veterinary antibiotics were present, and their presence was dependent upon the commercial sales and environmental properties of the analytes, the geographical positions of the sampling points, and the origin of the water. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Automated Classification of Selected Data Elements from Free-text Diagnostic Reports for Clinical Research.

    PubMed

    Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra

    2016-08-05

    In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze whether it is possible to make the manual documentation process more efficient by using methods of natural language processing (NLP) for multiclass classification of free-text diagnostic reports to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, and its manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnoses paragraph was extracted from the clinical report of a randomly selected third of the patients of the multiple myeloma research database from Heidelberg University Hospital (737 selected patients in total). An EDC system was set up and two data entry specialists independently performed a manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total 15 different pipelines were examined and assessed by a ten-fold cross-validation, reiterated 100 times. For quality indication the average error rate and the average F1-score were computed. For significance testing the approximate randomization test was used.
    The created annotated corpus consists of 737 different diagnoses paragraphs with a total number of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even if the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for an automatic classification of data elements from free-text diagnostic reports.
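
    The two quality measures named above can be computed as in the following sketch. It assumes the standard definitions (error rate as the fraction of wrong predictions, F1 as the harmonic mean of precision and recall, macro-averaged over classes); the abstract does not say which averaging was used, and the labels are toy values rather than data from the study.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the reference labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive):
    """F1 for one class treated as positive; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)

# Toy reference and predicted labels for a three-class data element.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
err = error_rate(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

    Macro-averaging gives every class equal weight, so a rare data element value that is classified poorly lowers the score as much as a frequent one.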

  16. Analysis of multi-class preservatives in leave-on and rinse-off cosmetics by matrix solid-phase dispersion.

    PubMed

    Sanchez-Prado, Lucia; Alvarez-Rivera, Gerardo; Lamas, J Pablo; Lores, Marta; Garcia-Jares, Carmen; Llompart, Maria

    2011-12-01

    Matrix solid-phase extraction has been successfully applied for the determination of multi-class preservatives in a wide variety of cosmetic samples including rinse-off and leave-on products. After extraction, derivatization with acetic anhydride, and gas chromatography-mass spectrometry analysis were performed. Optimization studies were done on real non-spiked and spiked leave-on and rinse-off cosmetic samples. The selection of the most suitable extraction conditions was made using statistical tools such as ANOVA, as well as factorial experimental designs. The final optimized conditions were common for both groups of cosmetics and included the dispersion of the sample with Florisil (1:4), and the elution of the MSPD column with 5 mL of hexane/acetone (1:1). After derivatization, the extract was analyzed without any further clean-up or concentration step. Accuracy, precision, linearity and detection limits were evaluated to assess the performance of the proposed method. The recovery studies on leave-on and rinse-off cosmetics gave satisfactory values (>78% for all analytes in all the samples) with an average relative standard deviation value of 4.2%. The quantification limits were well below those set by the international cosmetic regulations, making this multi-component analytical method suitable for routine control. The analysis of a broad range of cosmetics including body milk, moisturizing creams, anti-stretch marks creams, hand creams, deodorant, shampoos, liquid soaps, makeup, sun milk, hand soaps, among others, demonstrated the high use of most of the target preservatives, especially butylated hydroxytoluene, methylparaben, propylparaben, and butylparaben.

  17. Dimensionality reduction based on distance preservation to local mean for symmetric positive definite matrices and its application in brain-computer interfaces

    NASA Astrophysics Data System (ADS)

    Davoudi, Alireza; Shiry Ghidary, Saeed; Sadatnejad, Khadijeh

    2017-06-01

    Objective. In this paper, we propose a nonlinear dimensionality reduction algorithm for the manifold of symmetric positive definite (SPD) matrices that considers the geometry of SPD matrices and provides a low-dimensional representation of the manifold with high class discrimination in a supervised or unsupervised manner. Approach. The proposed algorithm tries to preserve the local structure of the data by preserving distances to local means (DPLM) and also provides an implicit projection matrix. DPLM is linear in terms of the number of training samples. Main results. We performed several experiments on the multi-class dataset IIa from BCI competition IV and two other datasets from BCI competition III including datasets IIIa and IVa. The results show that our approach, as a dimensionality reduction technique, leads to superior results in comparison with other competitors in the related literature because of its robustness against outliers and the way it preserves the local geometry of the data. Significance. The experiments confirm that the combination of DPLM with filter geodesic minimum distance to mean as the classifier leads to superior performance compared with the state of the art on brain-computer interface competition IV dataset IIa. Also the statistical analysis shows that our dimensionality reduction method performs significantly better than its competitors.

  18. Machine-learning-based Brokers for Real-time Classification of the LSST Alert Stream

    NASA Astrophysics Data System (ADS)

    Narayan, Gautham; Zaidi, Tayeb; Soraisam, Monika D.; Wang, Zhe; Lochner, Michelle; Matheson, Thomas; Saha, Abhijit; Yang, Shuo; Zhao, Zhenge; Kececioglu, John; Scheidegger, Carlos; Snodgrass, Richard T.; Axelrod, Tim; Jenness, Tim; Maier, Robert S.; Ridgway, Stephen T.; Seaman, Robert L.; Evans, Eric Michael; Singh, Navdeep; Taylor, Clark; Toeniskoetter, Jackson; Welch, Eric; Zhu, Songzhe; The ANTARES Collaboration

    2018-05-01

    The unprecedented volume and rate of transient events that will be discovered by the Large Synoptic Survey Telescope (LSST) demand that the astronomical community update its follow-up paradigm. Alert-brokers, automated software systems that sift through, characterize, annotate, and prioritize events for follow-up, will be critical tools for managing alert streams in the LSST era. The Arizona-NOAO Temporal Analysis and Response to Events System (ANTARES) is one such broker. In this work, we develop a machine learning pipeline to characterize and classify variable and transient sources only using the available multiband optical photometry. We describe three illustrative stages of the pipeline, serving the three goals of early, intermediate, and retrospective classification of alerts. The first takes the form of variable versus transient categorization, the second a multiclass typing of the combined variable and transient data set, and the third a purity-driven subtyping of a transient class. Although several similar algorithms have proven themselves in simulations, we validate their performance on real observations for the first time. We quantitatively evaluate our pipeline on sparse, unevenly sampled, heteroskedastic data from various existing observational campaigns, and demonstrate very competitive classification performance. We describe our progress toward adapting the pipeline developed in this work into a real-time broker working on live alert streams from time-domain surveys.

  19. Chemometrics and chromatographic fingerprints to classify plant food supplements according to the content of regulated plants.

    PubMed

    Deconinck, E; Sokeng Djiogo, C A; Courselle, P

    2017-09-05

    Plant food supplements are gaining popularity, resulting in a broader spectrum of available products and an increased consumption. Next to the problem of adulteration of these products with synthetic drugs the presence of regulated or toxic plants is an important issue, especially when the products are purchased from irregular sources. This paper focusses on this problem by using specific chromatographic fingerprints for five targeted plants and chemometric classification techniques in order to extract the important information from the fingerprints and determine the presence of the targeted plants in plant food supplements in an objective way. Two approaches were followed: (1) a multiclass model, (2) 2-class model for each of the targeted plants separately. For both approaches good classification models were obtained, especially when using SIMCA and PLS-DA. For each model, misclassification rates for the external test set of maximum one sample could be obtained. The models were applied to five real samples resulting in the identification of the correct plants, confirmed by mass spectrometry. Therefore chromatographic fingerprinting combined with chemometric modelling can be considered interesting to make a more objective decision on whether a regulated plant is present in a plant food supplement or not, especially when no mass spectrometry equipment is available. The results suggest also that the use of a battery of 2-class models to screen for several plants is the approach to be preferred. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Solid-phase extraction of multi-class pharmaceuticals from environmental water samples onto modified multi-walled carbon nanotubes followed by LC-MS/MS.

    PubMed

    Lalović, Bojana; Đurkić, Tatjana; Vukčević, Marija; Janković-Častvan, Ivona; Kalijadis, Ana; Laušević, Zoran; Laušević, Mila

    2017-09-01

    In this paper, pristine and chemically treated multi-walled carbon nanotubes (MWCNTs) were employed as solid-phase extraction sorbents for the isolation and enrichment of multi-class pharmaceuticals from the surface water and groundwater, prior to liquid chromatography-tandem mass spectrometry analysis. Thirteen pharmaceuticals that belong to different therapeutical classes (erythromycin, azithromycin, sulfamethoxazole, diazepam, lorazepam, carbamazepine, metoprolol, bisoprolol, enalapril, cilazapril, simvastatin, clopidogrel, diclofenac) and two metabolites of metamizole (4-acetylaminoantipyrine and 4-formylaminoantipyrine) were selected for this study. The influence of chemical treatment on MWCNT surface characteristics and extraction efficiency was studied, and it was shown that HCl treatment of MWCNT leads to a decrease in the amount of surface oxygen groups and at the same time favorably affects the efficiency toward extraction of selected pharmaceuticals. After the optimization of the SPE procedure, the following conditions were chosen: 50 mg of HCl-treated MWCNT as a sorbent, 100 mL of water sample at pH 6, and 15 mL of the methanol-dichloromethane mixture (1:1, v/v) as eluent. Under optimal conditions, high recoveries (79-119%), as well as low detection (0.2-103 ng L⁻¹) and quantitation (0.5-345 ng L⁻¹) limits, were obtained. The optimized method was applied to the analysis of five surface water and two groundwater samples, and three pharmaceuticals were detected: the antiepileptic drug carbamazepine and two metabolites of the antipyretic metamizole.
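
    The recovery and precision figures quoted in extraction studies such as the one above are usually computed from spiked replicates. The following is a minimal sketch of that arithmetic, not the authors' procedure; the function names and the optional background correction are assumptions made for illustration.

```python
from statistics import mean, stdev

def recovery_percent(measured, spiked, background=0.0):
    """Percent recovery of a spiked analyte.

    measured   -- amount found in the spiked sample
    spiked     -- amount added to the sample
    background -- amount found in the non-spiked sample (assumed 0 if absent)
    """
    return 100.0 * (measured - background) / spiked

def rsd_percent(values):
    """Relative standard deviation (sample standard deviation / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)
```

    For example, finding 95 units after spiking 100 units into a blank matrix corresponds to a recovery of 95%, and replicate results of 98, 100, and 102 give an RSD of 2%.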

  1. Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.

    PubMed

    Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A

    2007-12-15

    Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
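
    The one-versus-all and one-versus-one schemes mentioned above differ only in how the multi-class problem is split into binary problems and how the binary outputs are combined. The sketch below illustrates both decompositions under stated assumptions: a nearest-centroid scorer stands in for the SVM, all names are hypothetical, and the Crammer and Singer method (a single joint optimization) is not shown.

```python
from itertools import combinations

def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_binary(pos, neg):
    """Stand-in binary learner (not an SVM): score > 0 means 'closer to pos'."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)

def fit_one_vs_all(X, y):
    """One binary model per class: that class against all others."""
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, t in zip(X, y) if t == c]
        neg = [x for x, t in zip(X, y) if t != c]
        models[c] = train_binary(pos, neg)
    return models

def predict_one_vs_all(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: models[c](x))

def fit_one_vs_one(X, y):
    """One binary model per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pos = [x for x, t in zip(X, y) if t == a]
        neg = [x for x, t in zip(X, y) if t == b]
        models[(a, b)] = train_binary(pos, neg)
    return models

def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = {}
    for (a, b), score in models.items():
        winner = a if score(x) > 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=votes.get)

# Toy data: three well-separated hypothetical classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["a", "a", "b", "b", "c", "c"]
ova = fit_one_vs_all(X, y)
ovo = fit_one_vs_one(X, y)
```

    With K classes, one-versus-all trains K binary models and one-versus-one trains K(K-1)/2, which is the main practical trade-off between the two schemes.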

  2. New materials for solid-phase extraction and multiclass high-performance liquid chromatographic analysis of pesticides in grapes.

    PubMed

    Melo, Lucio F C; Collins, Carol H; Jardim, Isabel C S F

    2004-04-02

    Sample preparation procedures which included the use of new aminopropyl (NH2) and octadecyl (C18) solid-phase extraction (SPE) sorbents are proposed for the simultaneous multiclass determination of the fungicide benomyl and of the herbicides tebuthiuron, diuron, simazine, atrazine, and ametryn in grapes, using single wavelength high-performance liquid chromatography. Sorbent preparation uses a fast, easy, and effective procedure to obtain silica-based materials, made by depositing polysiloxanes on a silica support followed by thermal immobilization. Recovery results of the compounds, after elution from the SPE cartridges, indicate that the most efficient system employed silica loaded with 40% of an aminofunctional polydimethylsiloxane as sorbent, using dichloromethane:methanol (95:5, v/v) as eluent. Method validation, carried out in agreement with International Conference on Harmonization directives, was performed at three fortification levels (100, 200, and 1000 µg kg⁻¹). Limits of detection and quantification show that the method developed can be used to detect the pesticides at concentrations below the maximum residue levels established by Codex Alimentarius, the US Environmental Protection Agency, the European Union, and Brazilian legislation.

  3. Improving Predictions of Multiple Binary Models in ILP

    PubMed Central

    2014-01-01

    Despite the success of ILP systems in learning first-order rules from small number of examples and complexly structured data in various domains, they struggle in dealing with multiclass problems. In most cases they boil down a multiclass problem into multiple black-box binary problems following the one-versus-one or one-versus-rest binarisation techniques and learn a theory for each one. When evaluating the learned theories of multiple class problems in one-versus-rest paradigm particularly, there is a bias caused by the default rule toward the negative classes leading to an unrealistic high performance beside the lack of prediction integrity between the theories. Here we discuss the problem of using one-versus-rest binarisation technique when it comes to evaluating multiclass data and propose several methods to remedy this problem. We also illustrate the methods and highlight their link to binary tree and Formal Concept Analysis (FCA). Our methods allow learning of a simple, consistent, and reliable multiclass theory by combining the rules of the multiple one-versus-rest theories into one rule list or rule set theory. Empirical evaluation over a number of data sets shows that our proposed methods produce coherent and accurate rule models from the rules learned by the ILP system of Aleph. PMID:24696657

  4. Multi-Class Motor Imagery EEG Decoding for Brain-Computer Interfaces

    PubMed Central

    Wang, Deng; Miao, Duoqian; Blohm, Gunnar

    2012-01-01

    Recent studies show that scalp electroencephalography (EEG) as a non-invasive interface has great potential for brain-computer interfaces (BCIs). However, one factor that has limited practical applications for EEG-based BCI so far is the difficulty to decode brain signals in a reliable and efficient way. This paper proposes a new robust processing framework for decoding of multi-class motor imagery (MI) that is based on five main processing steps. (i) Raw EEG segmentation without the need of visual artifact inspection. (ii) Considering that EEG recordings are often contaminated not just by electrooculography (EOG) but also other types of artifacts, we propose to first implement an automatic artifact correction method that combines regression analysis with independent component analysis for recovering the original source signals. (iii) The significant difference between frequency components based on event-related (de-) synchronization and sample entropy is then used to find non-contiguous discriminating rhythms. After spectral filtering using the discriminating rhythms, a channel selection algorithm is used to select only relevant channels. (iv) Feature vectors are extracted based on the inter-class diversity and time-varying dynamic characteristics of the signals. (v) Finally, a support vector machine is employed for four-class classification. We tested our proposed algorithm on experimental data that was obtained from dataset 2a of BCI competition IV (2008). The overall four-class kappa values (between 0.41 and 0.80) were comparable to other models but without requiring any artifact-contaminated trial removal. The performance showed that multi-class MI tasks can be reliably discriminated using artifact-contaminated EEG recordings from a few channels. This may be a promising avenue for online robust EEG-based BCI applications. PMID:23087607
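
    The kappa values reported above correct raw accuracy for chance agreement, which matters in a four-class task where random guessing already yields about 25% accuracy. The following is a minimal sketch, assuming the statistic is Cohen's kappa computed from true and predicted labels; the function name is hypothetical.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of trials labelled correctly.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)
```

    A kappa of 0 corresponds to chance-level performance and 1 to perfect classification, so the reported range of 0.41 to 0.80 lies well above chance.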

  5. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder caused by abnormal excitability of neurons in the brain. The brain activity of patients suffering from seizures is monitored through the electroencephalogram (EEG) to detect epileptic seizures. The performance of EEG-based epilepsy detection depends on the feature extraction strategy. In this research, we extracted features using several strategies based on time and frequency domain characteristics, nonlinear measures, wavelet-based entropy, and a few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we evaluated different distance metrics, neighbor weights, and numbers of neighbors. Similarly, for the decision trees we tuned the parameters based on maximum splits and split criteria, and ensemble classifiers are evaluated based on different ensemble methods and learning rates. For training/testing, tenfold cross-validation was employed, and performance was evaluated in terms of TPR, NPR, PPV, accuracy, and AUC. In this research, a deeper analysis was performed using diverse feature extraction strategies and robust machine learning classifiers with more advanced optimal options. The Support Vector Machine with a linear kernel and KNN with the City block distance metric give the overall highest accuracy of 99.5%, which was higher than that obtained using the default parameters for these classifiers. Moreover, the highest separation (AUC = 0.9991, 0.9990) was obtained at different kernel scales using SVM. Additionally, K-nearest neighbors with inverse squared distance weights gives higher performance at different numbers of neighbors. Moreover, in distinguishing the postictal heart rate oscillations from epileptic ictal subjects, the highest performance of 100% was obtained using different machine learning classifiers.
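
    Two of the KNN options named above, the city block distance metric and inverse squared distance weighting, can be sketched in a few lines. This is an illustrative sketch only, not the study's implementation; the function names, the toy feature vectors, and the class labels are assumptions.

```python
from collections import defaultdict

def cityblock(a, b):
    """City block (Manhattan, L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(X, y, query, k=3, eps=1e-12):
    """Weighted KNN vote: each of the k nearest neighbors votes with weight 1/d^2."""
    ranked = sorted(zip(X, y), key=lambda pair: cityblock(pair[0], query))[:k]
    votes = defaultdict(float)
    for x, label in ranked:
        d = cityblock(x, query)
        votes[label] += 1.0 / (d * d + eps)  # eps avoids division by zero
    return max(votes, key=votes.get)

# Hypothetical two-feature training set.
X = [[0, 0], [1, 0], [4, 4], [5, 4], [4, 5]]
y = ["normal", "normal", "ictal", "ictal", "ictal"]
```

    Inverse squared weighting lets very close neighbors dominate the vote, which makes the prediction less sensitive to the exact choice of k.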

  6. Automated anatomical labeling of bronchial branches using multiple classifiers and its application to bronchoscopy guidance based on fusion of virtual and real bronchoscopy

    NASA Astrophysics Data System (ADS)

    Ota, Shunsuke; Deguchi, Daisuke; Kitasaka, Takayuki; Mori, Kensaku; Suenaga, Yasuhito; Hasegawa, Yoshinori; Imaizumi, Kazuyoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi

    2008-03-01

    This paper presents a method for automated anatomical labeling of bronchial branches (ALBB) extracted from 3D CT datasets. The proposed method constructs classifiers that output anatomical names of bronchial branches by employing the machine-learning approach. We also present its application to a bronchoscopy guidance system. Since the bronchus has a complex tree structure, bronchoscopists easily become disoriented and lose the way to a target location. A bronchoscopy guidance system is therefore strongly needed to assist bronchoscopists. In such a guidance system, automated presentation of anatomical names is quite useful information for bronchoscopy. Although several methods for automated ALBB have been reported, most of them constructed models taking only variations of branching patterns into account and did not consider those of running directions. Since the running directions of bronchial branches differ greatly among individuals, these methods could not perform ALBB accurately when the running directions of bronchial branches were different from those of the models. Our method tries to solve such problems by utilizing the machine-learning approach. The actual procedure consists of three steps: (a) extraction of bronchial tree structures from 3D CT datasets, (b) construction of classifiers using the multi-class AdaBoost technique, and (c) automated classification of bronchial branches by using the constructed classifiers. We applied the proposed method to 51 cases of 3D CT datasets. The constructed classifiers were evaluated by a leave-one-out scheme. The experimental results showed that the proposed method could assign correct anatomical names to 89.1% of bronchial branches up to segmental lobe branches. Also, we confirmed that presenting anatomical names of bronchial branches on real bronchoscopic views was quite useful for assisting bronchoscopy.

  7. Multi-class machine classification of suicide-related communication on Twitter.

    PubMed

    Burnap, Pete; Colombo, Gualtiero; Amery, Rosie; Hodorog, Andrei; Scourfield, Jonathan

    2017-08-01

    The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts where we further evaluate the classification approach, showing a sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
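
    The abstract above does not define its Maximum Probability voting rule. One common reading is that the ensemble adopts the class to which any single base classifier assigns the highest probability. The sketch below illustrates that reading only; it is an assumption rather than the authors' method, and the function and class names are hypothetical.

```python
def max_probability_vote(prob_rows):
    """Return (class, probability) for the single most confident base-classifier output.

    prob_rows -- one dict per base classifier, mapping class name to probability.
    """
    best_class, best_p = None, -1.0
    for probs in prob_rows:
        for cls, p in probs.items():
            if p > best_p:
                best_class, best_p = cls, p
    return best_class, best_p
```

    Unlike majority voting, this rule lets one highly confident base classifier override several weakly confident ones.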

  8. Instruction-matrix-based genetic programming.

    PubMed

    Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak

    2008-08-01

    In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than those of canonical GP in terms of the qualities of the solutions and the number of program evaluations, but they are also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.

  9. The Decoding Toolbox (TDT): a versatile software package for multivariate analyses of functional imaging data

    PubMed Central

    Hebart, Martin N.; Görgen, Kai; Haynes, John-Dylan

    2015-01-01

    The multivariate analysis of brain signals has recently sparked a great amount of interest, yet accessible and versatile tools to carry out decoding analyses are scarce. Here we introduce The Decoding Toolbox (TDT) which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data. TDT is written in Matlab and equipped with an interface to the widely used brain data analysis package SPM. The toolbox allows running fast whole-brain analyses, region-of-interest analyses and searchlight analyses, using machine learning classifiers, pattern correlation analysis, or representational similarity analysis. It offers automatic creation and visualization of diverse cross-validation schemes, feature scaling, nested parameter selection, a variety of feature selection methods, multiclass capabilities, and pattern reconstruction from classifier weights. While basic users can implement a generic analysis in one line of code, advanced users can extend the toolbox to their needs or exploit the structure to combine it with external high-performance classification toolboxes. The toolbox comes with an example data set which can be used to try out the various analysis methods. Taken together, TDT offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns. PMID:25610393

  10. Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion.

    PubMed

    Wang, Jian-Gang; Sung, Eric; Yau, Wei-Yun

    2011-07-01

    Facial age classification is an approach to classify face images into one of several predefined age groups. One of the difficulties in applying learning techniques to the age classification problem is the large amount of labeled training data required. Acquiring such training data is very costly in terms of age progress, privacy, human time, and effort. Although unlabeled face images can be obtained easily, it would be expensive to manually label them on a large scale and obtain the ground truth. The frugal selection of the unlabeled data for labeling to quickly reach high classification performance with minimal labeling efforts is a challenging problem. In this paper, we present an active learning approach based on an online incremental bilateral two-dimension linear discriminant analysis (IB2DLDA) which initially learns from a small pool of labeled data and then iteratively selects the most informative samples from the unlabeled set to increasingly improve the classifier. Specifically, we propose a novel data selection criterion called the furthest nearest-neighbor (FNN) that generalizes the margin-based uncertainty to the multiclass case and which is easy to compute, so that the proposed active learning algorithm can handle a large number of classes and large data sizes efficiently. Empirical experiments on FG-NET and Morph databases together with a large unlabeled data set for age categorization problems show that the proposed approach can achieve results comparable to, or even better than, a conventionally trained active classifier that requires much more labeling effort. Our IB2DLDA-FNN algorithm can achieve similar results much faster than random selection and with fewer samples for age categorization. It also can achieve comparable results with active SVM but is much faster than active SVM in terms of training because kernel methods are not needed. The results on the face recognition database and palmprint/palm vein database showed that our approach can handle problems with a large number of classes. Our contributions in this paper are twofold. First, we proposed the IB2DLDA-FNN, the FNN being our novel idea, as a generic on-line or active learning paradigm. Second, we showed that it can be another viable tool for active learning of facial age range classification.

  11. Performance of the goulden large-sample extractor in multiclass pesticide isolation and preconcentration from stream water

    USGS Publications Warehouse

    Foster, G.D.; Foreman, W.T.; Gates, Paul M.

    1991-01-01

    The reliability of the Goulden large-sample extractor in preconcentrating pesticides from water was evaluated from the recoveries of 35 pesticides amended to filtered stream waters. Recoveries greater than 90% were observed for many of the pesticides in each major chemical class, but recoveries for some of the individual pesticides varied in seemingly unpredictable ways. Corrections cannot yet be factored into liquid-liquid extraction theory to account for matrix effects, which were apparent between the two stream waters tested. The Goulden large-sample extractor appears to be well suited for rapid chemical screening applications, with quantitative analysis requiring special quality control considerations. © 1991 American Chemical Society.

  12. Human cell structure-driven model construction for predicting protein subcellular location from biological images.

    PubMed

    Shao, Wei; Liu, Mingxia; Zhang, Daoqiang

    2016-01-01

    The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. Contact: dqzhang@nuaa.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
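
    In the error correcting output coding (ECOC) framework mentioned above, each class is assigned a binary codeword, one binary classifier is trained per codeword column, and a test sample receives the class whose codeword is closest to the vector of binary outputs. The sketch below shows generic ECOC decoding with Hamming distance; it uses a textbook exhaustive code for four placeholder classes, not the structure-driven codeword matrix of SC-PSorter, and all names are hypothetical.

```python
# Generic 7-bit exhaustive code for four placeholder classes
# (minimum pairwise Hamming distance 4, so any single bit error is corrected).
CODEWORDS = {
    "class_A": (1, 1, 1, 1, 1, 1, 1),
    "class_B": (0, 0, 0, 0, 1, 1, 1),
    "class_C": (0, 0, 1, 1, 0, 0, 1),
    "class_D": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits, codewords=CODEWORDS):
    """Return the class whose codeword is nearest to the binary classifier outputs."""
    return min(codewords, key=lambda c: hamming(codewords[c], bits))
```

    If one of the seven binary classifiers errs, for example producing (0, 0, 1, 1, 0, 1, 1) for a class_C sample, the nearest codeword is still that of class_C.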

  13. Simultaneous in-cell derivatization pressurized liquid extraction for the determination of multiclass preservatives in leave-on cosmetics.

    PubMed

    Sanchez-Prado, Lucia; Lamas, J Pablo; Lores, Marta; Garcia-Jares, Carmen; Llompart, Maria

    2010-11-15

    An effective one-step sample preparation methodology for the determination of multiclass preservatives in cosmetics has been developed, applying, for the first time to this kind of matrix, pressurized liquid extraction (PLE) and a very simple, cheap, and fast derivatization procedure: acetylation with acetic anhydride and pyridine. A multifactorial experimental design has been used to evaluate and optimize the main experimental parameters potentially affecting the extraction process. In the final conditions the sample was mixed with Florisil as the dispersing sorbent and extracted with ethyl acetate for 15 min at 120 °C. One of the main goals of this work was to demonstrate the possibility of carrying out direct cosmetic preservative acetylation by simply adding the derivatization reagents into the PLE cell. The extract was then analyzed by GC/MS without any further cleanup or concentration step. The accuracy, precision, linearity, and detection limits (LODs) were evaluated to assess the performance of the proposed method. Quantitative recoveries were obtained, and relative standard deviation values were lower than 10% in all cases. The obtained LODs ranged from 0.000004% to 0.0001% (w/w), values far below the established restrictions in the European Cosmetics Regulation, making this multicomponent analytical method suitable for routine control. Finally, several cosmetic products such as moisturizing and antiwrinkle creams and lotions, hand creams, sunscreen and after-sun creams, baby lotions, and hair care products were analyzed. All the samples contained several of the target cosmetic ingredients, in some cases at quite high concentrations, although the actual European Cosmetics Regulation was fulfilled in all cases.

  14. Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Zhong, Yanfei; Han, Xiaobing; Zhang, Liangpei

    2018-04-01

    Multi-class geospatial object detection from high spatial resolution (HSR) remote sensing imagery is attracting increasing attention in a wide range of object-related civil and engineering applications. However, the distribution of objects in HSR remote sensing imagery is location-variable and complicated, and how to accurately detect the objects in HSR remote sensing imagery is a critical problem. Due to the powerful feature extraction and representation capability of deep learning, the deep learning based region proposal generation and object detection integrated framework has greatly promoted the performance of multi-class geospatial object detection for HSR remote sensing imagery. However, due to the translation caused by the convolution operation in the convolutional neural network (CNN), although the performance of the classification stage is seldom influenced, the localization accuracies of the predicted bounding boxes in the detection stage are easily influenced. The dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage has not been addressed for HSR remote sensing imagery, and causes position accuracy problems for multi-class geospatial object detection with region proposal generation and object detection. In order to further improve the performance of the region proposal generation and object detection integrated framework for HSR remote sensing imagery object detection, a position-sensitive balancing (PSB) framework is proposed in this paper for multi-class geospatial object detection from HSR remote sensing imagery. The proposed PSB framework takes full advantage of the fully convolutional network (FCN), on the basis of a residual network, and adopts the PSB framework to solve the dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage. In addition, a pre-training mechanism is utilized to accelerate the training procedure and increase the robustness of the proposed algorithm. The proposed algorithm is validated with a publicly available 10-class object detection dataset.

  15. A new computer approach to mixed feature classification for forestry application

    NASA Technical Reports Server (NTRS)

    Kan, E. P.

    1976-01-01

    A computer approach for mapping mixed forest features (i.e., types, classes) from computer classification maps is discussed. Mixed features such as mixed softwood/hardwood stands are treated as admixtures of softwood and hardwood areas. Large-area mixed features are identified and small-area features neglected when the nominal size of a mixed feature can be specified. The computer program merges small isolated areas into surrounding areas by the iterative manipulation of the postprocessing algorithm that eliminates small connected sets. For a forestry application, computer-classified LANDSAT multispectral scanner data of the Sam Houston National Forest were used to demonstrate the proposed approach. The technique was successful in cleaning the salt-and-pepper appearance of multiclass classification maps and in mapping admixtures of softwood areas and hardwood areas. However, the computer-mapped mixed areas matched very poorly with the ground truth because of inadequate resolution and inappropriate definition of mixed features.
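
    The postprocessing idea described above (eliminating small connected sets by merging them into their surroundings) can be sketched as follows. This is a minimal illustration, not the program from the report: the function name `remove_small_regions`, the use of 4-connectivity, the `min_size` threshold, and the majority-of-bordering-pixels rule are all assumptions, and the sketch makes a single pass rather than iterating.

```python
from collections import Counter, deque

def remove_small_regions(grid, min_size):
    """Relabel every 4-connected region smaller than min_size with the
    most common label among the pixels bordering it (single pass).
    Hypothetical helper; names and rules are illustrative assumptions."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            label = grid[r0][c0]
            region, border = [], Counter()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            # Breadth-first search collects one connected region.
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr][nc] == label:
                            if not seen[nr][nc]:
                                seen[nr][nc] = True
                                queue.append((nr, nc))
                        else:
                            # Count labels of neighbouring regions.
                            border[grid[nr][nc]] += 1
            if len(region) < min_size and border:
                new_label = border.most_common(1)[0][0]
                for r, c in region:
                    out[r][c] = new_label
    return out

# "S" = softwood, "H" = hardwood; one isolated hardwood pixel.
classified = [["S", "S", "S"],
              ["S", "H", "S"],
              ["S", "S", "S"]]
cleaned = remove_small_regions(classified, min_size=2)
```

    Regions are detected on the input map and relabeled on a copy, so the result does not depend on scan order; repeating the call until the map stops changing would approximate an iterative variant.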

  16. CancerLocator: non-invasive cancer diagnosis and tissue-of-origin prediction using methylation profiles of cell-free DNA.

    PubMed

    Kang, Shuli; Li, Qingjiao; Chen, Quan; Zhou, Yonggang; Park, Stacy; Lee, Gina; Grimes, Brandon; Krysan, Kostyantyn; Yu, Min; Wang, Wei; Alber, Frank; Sun, Fengzhu; Dubinett, Steven M; Li, Wenyuan; Zhou, Xianghong Jasmine

    2017-03-24

    We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulations and real data, even with the low proportion of tumor-derived DNA in the cell-free DNA scenarios. CancerLocator also achieves promising results on patient plasma samples with low DNA methylation sequencing coverage.

  17. Freeze-thaw approach: A practical sample preparation strategy for residue analysis of multi-class veterinary drugs in chicken muscle.

    PubMed

    Zhang, Meiyu; Li, Erfen; Su, Yijuan; Song, Xuqin; Xie, Jingmeng; Zhang, Yingxia; He, Limin

    2018-06-01

    Seven drugs from different classes, namely, fluoroquinolones (enrofloxacin, ciprofloxacin, sarafloxacin), sulfonamides (sulfadimidine, sulfamonomethoxine), and macrolides (tilmicosin, tylosin), were administered orally to chickens as test compounds. A simple extraction step after cryogenic freezing might allow the effective extraction of multi-class veterinary drug residues from minced chicken muscles by vortex mixing. On the basis of the optimized freeze-thaw approach, a convenient, selective, and reproducible liquid chromatography with tandem mass spectrometry method was developed. At three spiking levels in blank chicken and medicated chicken muscles, average recoveries of the analytes were in the range of 71-106% and 63-119%, respectively. All the relative standard deviations were <20%. The limits of quantification of analytes were 0.2-5.0 ng/g. Regardless of the chicken levels, there were no significant differences (P > 0.05) in the average contents of almost any of the analytes in medicated chickens between this method and specific methods in the literature for the determination of specific analytes. Finally, the developed method was successfully extended to the monitoring of residues of 55 common veterinary drugs in food animal muscles. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Intelligent agent-based intrusion detection system using enhanced multiclass SVM.

    PubMed

    Ganapathy, S; Yogesh, P; Kannan, A

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested with the KDD Cup 99 data set.

  19. Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

    PubMed Central

    Ganapathy, S.; Yogesh, P.; Kannan, A.

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested with the KDD Cup 99 data set. PMID:23056036

  20. Machine vision system for measuring conifer seedling morphology

    NASA Astrophysics Data System (ADS)

    Rigney, Michael P.; Kranzler, Glenn A.

    1995-01-01

    A PC-based machine vision system providing rapid measurement of bare-root tree seedling morphological features has been designed. The system uses backlighting and a 2048-pixel line-scan camera to acquire images with transverse resolutions as high as 0.05 mm for precise measurement of stem diameter. Individual seedlings are manually loaded on a conveyor belt and inspected by the vision system in less than 0.25 seconds. Designed for quality control and morphological data acquisition by nursery personnel, the system provides a user-friendly, menu-driven graphical interface. The system automatically locates the seedling root collar and measures stem diameter, shoot height, sturdiness ratio, root mass length, projected shoot and root area, shoot-root area ratio, and percent fine roots. Sample statistics are computed for each measured feature. Measurements for each seedling may be stored for later analysis. Feature measurements may be compared with multi-class quality criteria to determine sample quality or to perform multi-class sorting. Statistical summary and classification reports may be printed to facilitate the communication of quality concerns with grading personnel. Tests were conducted at a commercial forest nursery to evaluate measurement precision. Four quality control personnel measured root collar diameter, stem height, and root mass length on each of 200 conifer seedlings. The same seedlings were inspected four times by the machine vision system. Machine stem diameter measurement precision was four times greater than that of manual measurements. Machine and manual measurements had comparable precision for shoot height and root mass length.

  1. Enhanced Data Representation by Kernel Metric Learning for Dementia Diagnosis

    PubMed Central

    Cárdenas-Peña, David; Collazos-Huertas, Diego; Castellanos-Dominguez, German

    2017-01-01

    Alzheimer's disease (AD) is the most prevalent form of dementia worldwide. Early identification that supports effective treatment is therefore required to improve the quality of life of a large number of patients. Recently, computer-aided diagnosis tools for dementia using Magnetic Resonance Imaging scans have been successfully proposed to discriminate between patients with AD, mild cognitive impairment (MCI), and healthy controls (HC). Most of the attention has been given to the clinical data provided by initiatives such as the ADNI, which support reliable research on intervention, prevention, and treatment of AD. Therefore, there is a need for improving the performance of classification machines. In this paper, we propose a kernel framework for learning metrics that enhances conventional machines and supports the diagnosis of dementia. Our framework builds discriminative spaces through the maximization of the center kernel alignment function, aiming at improving the discrimination of the three considered neurological classes. The proposed metric learning performance is evaluated on the widely known ADNI database using three supervised classification machines (k-nn, SVM and NNs) for multi-class and bi-class scenarios from structural MRIs. Specifically, 286 AD patients, 379 MCI patients, and 231 healthy controls from the ADNI collection are used for development and validation of our proposed metric learning framework. For the experimental validation, we split the data into two subsets: 30% of subjects held out for blind assessment and 70% used for parameter tuning. Then, in the preprocessing stage, a total of 310 morphological measurements are automatically extracted from each structural MRI scan by the FreeSurfer software package and concatenated to build an input feature matrix. The test results show that including supervised metric learning improves the baseline classifiers in both scenarios. In the multi-class scenario, we achieve the best performance (accuracy 60.1%) for the pretrained 1-layered NN, and we obtain measures over 90% on average for the HC vs. AD task. From the machine learning point of view, our proposal enhances the classifier performance by building spaces with a better class separability. From the clinical point of view, our enhancement results in a more balanced performance in each class than the compared approaches from the CADDementia challenge by increasing the sensitivity of pathological groups and the specificity of healthy controls. PMID:28798659

  2. Multi-class multi-residue analysis of veterinary drugs in meat using enhanced matrix removal lipid cleanup and liquid chromatography-tandem mass spectrometry.

    PubMed

    Zhao, Limian; Lucas, Derick; Long, David; Richter, Bruce; Stevens, Joan

    2018-05-11

    This study presents the development and validation of a quantitation method for the analysis of multi-class, multi-residue veterinary drugs using lipid removal cleanup cartridges, enhanced matrix removal lipid (EMR-Lipid), for different meat matrices by liquid chromatography tandem mass spectrometry detection. Meat samples were extracted using a two-step solid-liquid extraction followed by pass-through sample cleanup. The method was optimized based on the buffer and solvent composition, solvent additive additions, and EMR-Lipid cartridge cleanup. The developed method was then validated in five meat matrices, porcine muscle, bovine muscle, bovine liver, bovine kidney and chicken liver to evaluate the method performance characteristics, such as absolute recoveries and precision at three spiking levels, calibration curve linearity, limit of quantitation (LOQ) and matrix effect. The results showed that >90% of veterinary drug analytes achieved satisfactory recovery results of 60-120%. Over 97% analytes achieved excellent reproducibility results (relative standard deviation (RSD) < 20%), and the LOQs were 1-5 μg/kg in the evaluated meat matrices. The matrix co-extractive removal efficiency by weight provided by EMR-lipid cartridge cleanup was 42-58% in samples. The post column infusion study showed that the matrix ion suppression was reduced for samples with the EMR-Lipid cartridge cleanup. The reduced matrix ion suppression effect was also confirmed with <15% frequency of compounds with significant quantitative ion suppression (>30%) for all tested veterinary drugs in all of meat matrices. The results showed that the two-step solid-liquid extraction provides efficient extraction for the entire spectrum of veterinary drugs, including the difficult classes such as tetracyclines, beta-lactams etc. 
EMR-Lipid cartridges after extraction provided efficient sample cleanup with easy streamlined protocol and minimal impacts on analytes recovery, improving method reliability and consistency. Copyright © 2018 Elsevier B.V. All rights reserved.
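
    The validation figures quoted above (recovery and relative standard deviation at a spiking level) follow from two standard formulas, sketched below. The replicate values are made up for illustration and the function names are hypothetical; only the acceptance limits (recovery 60-120%, RSD < 20%) come from the abstract.

```python
from statistics import mean, stdev

def recovery_percent(measured, spiked):
    """Mean measured concentration as a percentage of the spiked amount."""
    return 100.0 * mean(measured) / spiked

def rsd_percent(measured):
    """Relative standard deviation: sample std. dev. over the mean, in %."""
    return 100.0 * stdev(measured) / mean(measured)

def is_acceptable(measured, spiked, rec_range=(60.0, 120.0), max_rsd=20.0):
    """Apply the acceptance limits quoted in the abstract."""
    rec = recovery_percent(measured, spiked)
    return rec_range[0] <= rec <= rec_range[1] and rsd_percent(measured) < max_rsd

# Illustrative replicates (ug/kg) for one analyte spiked at 10 ug/kg.
replicates = [9.0, 10.0, 11.0]
rec = recovery_percent(replicates, spiked=10.0)
rsd = rsd_percent(replicates)
ok = is_acceptable(replicates, spiked=10.0)
```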

  3. CASAnova: a multiclass support vector machine model for the classification of human sperm motility patterns.

    PubMed

    Goodson, Summer G; White, Sarah; Stevans, Alicia M; Bhat, Sanjana; Kao, Chia-Yu; Jaworski, Scott; Marlowe, Tamara R; Kohlmeier, Martin; McMillan, Leonard; Zeisel, Steven H; O'Brien, Deborah A

    2017-11-01

    The ability to accurately monitor alterations in sperm motility is paramount to understanding multiple genetic and biochemical perturbations impacting normal fertilization. Computer-aided sperm analysis (CASA) of human sperm typically reports motile percentage and kinematic parameters at the population level, and uses kinematic gating methods to identify subpopulations such as progressive or hyperactivated sperm. The goal of this study was to develop an automated method that classifies all patterns of human sperm motility during in vitro capacitation following the removal of seminal plasma. We visually classified CASA tracks of 2817 sperm from 18 individuals and used a support vector machine-based decision tree to compute four hyperplanes that separate five classes based on their kinematic parameters. We then developed a web-based program, CASAnova, which applies these equations sequentially to assign a single classification to each motile sperm. Vigorous sperm are classified as progressive, intermediate, or hyperactivated, and nonvigorous sperm as slow or weakly motile. This program correctly classifies sperm motility into one of five classes with an overall accuracy of 89.9%. Application of CASAnova to capacitating sperm populations showed a shift from predominantly linear patterns of motility at initial time points to more vigorous patterns, including hyperactivated motility, as capacitation proceeds. Both intermediate and hyperactivated motility patterns were largely eliminated when sperm were incubated in noncapacitating medium, demonstrating the sensitivity of this method. The five CASAnova classifications are distinctive and reflect kinetic parameters of washed human sperm, providing an accurate, quantitative, and high-throughput method for monitoring alterations in motility. © The Authors 2017. Published by Oxford University Press on behalf of Society for the Study of Reproduction. All rights reserved. 
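
    Applying four hyperplanes in sequence to reach one of five classes can be sketched as a small decision tree. The sketch below is an assumption-laden toy: the two features, the hyperplane coefficients, and the order of the splits are hypothetical and are not the published CASAnova equations; only the five class names come from the abstract.

```python
def positive_side(w, b, x):
    """True if x lies on the positive side of the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# Hypothetical hyperplanes over two toy features (f1, f2).
H_VIGOROUS = ((1.0, 0.0), -50.0)     # f1 > 50
H_PROGRESSIVE = ((0.0, 1.0), -0.8)   # f2 > 0.8
H_INTERMEDIATE = ((0.0, 1.0), -0.5)  # f2 > 0.5
H_SLOW = ((1.0, 0.0), -20.0)         # f1 > 20

def classify(x):
    """Walk the tree: each hyperplane is tested only on its own branch."""
    if positive_side(*H_VIGOROUS, x):
        if positive_side(*H_PROGRESSIVE, x):
            return "progressive"
        if positive_side(*H_INTERMEDIATE, x):
            return "intermediate"
        return "hyperactivated"
    return "slow" if positive_side(*H_SLOW, x) else "weakly motile"

labels = [classify(x) for x in [(60, 0.9), (60, 0.6), (60, 0.2), (30, 0.9), (10, 0.1)]]
```

    Each track receives exactly one label because every path through the tree ends in a single leaf.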

  4. Multi-class, multi-residue analysis of pesticides, polychlorinated biphenyls, polycyclic aromatic hydrocarbons, polybrominated diphenyl ethers and novel flame retardants in fish using fast, low-pressure gas chromatography-tandem mass spectrometry.

    PubMed

    Sapozhnikova, Yelena; Lehotay, Steven J

    2013-01-03

    A multi-class, multi-residue method for the analysis of 13 novel flame retardants, 18 representative pesticides, 14 polychlorinated biphenyl (PCB) congeners, 16 polycyclic aromatic hydrocarbons (PAHs), and 7 polybrominated diphenyl ether (PBDE) congeners in catfish muscle was developed and evaluated using fast low pressure gas chromatography triple quadrupole tandem mass spectrometry (LP-GC/MS-MS). The method was based on a QuEChERS (quick, easy, cheap, effective, rugged, safe) extraction with acetonitrile and dispersive solid-phase extraction (d-SPE) clean-up with zirconium-based sorbent prior to LP-GC/MS-MS analysis. The developed method was evaluated at 4 spiking levels and further validated by analysis of NIST Standard Reference Materials (SRMs) 1974B and 1947. Sample preparation for a batch of 10 homogenized samples took about 1 h/analyst, and LP-GC/MS-MS analysis provided fast separation of multiple analytes within 9 min, achieving high throughput. With the use of isotopically labeled internal standards, recoveries of all but one analyte were between 70 and 120% with relative standard deviations less than 20% (n=5). The measured values for both SRMs agreed with certified/reference values (72-119% accuracy) for the majority of analytes. The detection limits were 0.1-0.5 ng/g for PCBs, 0.5-10 ng/g for PBDEs, 0.5-5 ng/g for select pesticides and PAHs, and 1-10 ng/g for flame retardants. The developed method was successfully applied for analysis of catfish samples from the market. Published by Elsevier B.V.

  5. FSR: feature set reduction for scalable and accurate multi-class cancer subtype classification based on copy number.

    PubMed

    Wong, Gerard; Leckie, Christopher; Kowalczyk, Adam

    2012-01-15

    Feature selection is a key concept in machine learning for microarray datasets, where features represented by probesets are typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms in handling very high-dimensional datasets beyond a hundred thousand features, such as in datasets produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher resolution datasets to achieve better disease subtype classification of samples for potentially more accurate diagnosis and prognosis, which allows clinicians to make more informed decisions in regards to patient treatment options. We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of its multiclass predictive classification accuracy over different cancer subtypes, its speedup in execution as well as its scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior predictive classification performance over that achieved on features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.

  6. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continually being proposed with the development of related disciplines. The applicable scope and performance of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven sophisticated, actively used algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression models, are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is better suited to the balanced small dataset of the binary-class task. No algorithm can maintain the best performance on all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
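
    Two of the dataset descriptors listed above, class entropy and the largest-to-smallest class ratio, can be computed as in the sketch below. The function names are hypothetical and the base-2 logarithm is an assumption, since the abstract does not state which base was used.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def imbalance_ratio(labels):
    """Sample size of the largest class divided by that of the smallest."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

balanced = ["a", "b", "c", "d"] * 5           # four classes, 5 samples each
skewed = ["a"] * 6 + ["b"] * 2 + ["c"] * 2    # one dominant class

h_balanced = class_entropy(balanced)
r_skewed = imbalance_ratio(skewed)
```

    A balanced k-class dataset attains the maximum entropy of log2(k) bits, and a skewed one falls below it, so the two descriptors together indicate how unbalanced a multi-class task is.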

  7. Broad screening of illicit ingredients in cosmetics using ultra-high-performance liquid chromatography-hybrid quadrupole-Orbitrap mass spectrometry with customized accurate-mass database and mass spectral library.

    PubMed

    Meng, Xianshuang; Bai, Hua; Guo, Teng; Niu, Zengyuan; Ma, Qiang

    2017-12-15

    Comprehensive identification and quantitation of 100 multi-class regulated ingredients in cosmetics was achieved using ultra-high-performance liquid chromatography (UHPLC) coupled with hybrid quadrupole-Orbitrap high-resolution mass spectrometry (Q-Orbitrap HRMS). A simple, efficient, and inexpensive sample pretreatment protocol was developed using ultrasound-assisted extraction (UAE), followed by dispersive solid-phase extraction (dSPE). The cosmetic samples were analyzed by UHPLC-Q-Orbitrap HRMS under synchronous full-scan MS and data-dependent MS/MS (full-scan MS1/dd-MS2) acquisition mode. The mass resolution was set to 70,000 FWHM (full width at half maximum) for full-scan MS1 and 17,500 FWHM for the dd-MS2 stage, with experimentally measured mass deviations of less than 2 ppm (parts per million) for quasi-molecular ions and 5 ppm for characteristic fragment ions for each individual analyte. An accurate-mass database and a mass spectral library were built in house for searching the 100 target compounds. Broad screening was conducted by comparing the experimentally measured exact mass of precursor and fragment ions, retention time, isotopic pattern, and ionic ratio with the accurate-mass database and by matching the acquired MS/MS spectra against the mass spectral library. The developed methodology was evaluated and validated in terms of limits of detection (LODs), limits of quantitation (LOQs), linearity, stability, accuracy, and matrix effect. The UHPLC-Q-Orbitrap HRMS approach was applied for the analysis of 100 target illicit ingredients in 123 genuine cosmetic samples, and exhibited great potential for high-throughput, sensitive, and reliable screening of multi-class illicit compounds in cosmetics. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. The prevalence of antiretroviral multidrug resistance in highly active antiretroviral therapy-treated patients with HIV/AIDS between 2004 and 2009 in South Korea.

    PubMed

    Choi, Ju-yeon; Kwon, Oh-Kyung; Choi, Byeong-Sun; Kee, Mee-Kyung; Park, Mina; Kim, Sung Soon

    2014-06-01

    Highly active antiretroviral therapy (HAART) including protease inhibitors (PIs) has been used in South Korea since 1997. Currently, more than 20 types of antiretroviral drugs are used in the treatment of human immunodeficiency virus-infected/acquired immune deficiency syndrome patients in South Korea. Despite the rapid development of various antiretroviral drugs, many drug-resistant variants have been reported after initiating HAART, and the efficiency of HAART is limited by these variants. The aim of this study was to investigate and estimate the annual antiretroviral drug resistance and the prevalence of antiretroviral multi-class drug resistance in treatment-experienced Korean patients. The amplified HIV-1 pol gene from 535 patients referred for genotypic drug resistance testing from 2004 to 2009 by the Korea Centers for Disease Control and Prevention was sequenced and analyzed both annually and in total. The prevalence of antiretroviral drug resistance was estimated based on "SIR" interpretation of the Stanford sequence database. Of viruses derived from 787 specimens, 380 samples (48.3%) showed at least one drug class-related resistance. Predicted NRTI drug resistance was highest at 41.9%. NNRTI resistance was 27.2% and PI resistance 23.3%. The annual percentage of drug resistance showed a similar pattern and declined slightly, except in 2004 and 2005. The prevalence of multi-class drug resistance against each drug class was: NRTI/NNRTI/PI, 9.8%; NRTI/PI, 21.9%; NNRTI/PI, 10.4%; and NRTI/NNRTI, 21.5%. About 50% and less than 10% of patients infected with HIV-1 have multidrug and multiclass resistance linked to 16 antiretroviral drugs, respectively. The significance of this study lies in its larger-scale examination of the prevalence of drug-resistant variants and multidrug resistance in HAART-experienced patients in South Korea. Copyright © 2014 Elsevier B.V. All rights reserved.
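
    As a quick arithmetic check on the headline figure, the reported prevalence follows directly from the two counts given in the abstract (380 resistant samples out of 787 specimens). The helper name below is hypothetical.

```python
def prevalence_percent(cases, total):
    """Share of specimens meeting a criterion, as a percentage."""
    return 100.0 * cases / total

# Counts reported in the abstract: 380 of 787 specimens showed
# resistance related to at least one drug class.
overall = prevalence_percent(380, 787)
print(f"{overall:.1f}%")  # prints 48.3%
```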

  9. FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds.

    PubMed

    Abbasi, Elham; Ghatee, Mehdi; Shiri, M E

    2013-09-01

    In this paper, an intelligent hyper framework is proposed to recognize protein folds from their amino acid sequences, which is a fundamental problem in bioinformatics. This framework includes some statistical and intelligent algorithms for protein classification. The main components of the proposed framework are the Fuzzy Resource-Allocating Network (FRAN) and the Radial Basis Function based on Particle Swarm Optimization (RBF-PSO). FRAN applies a dynamic method to tune the RBF network parameters. Due to the complexity of the patterns captured in the protein dataset, FRAN classifies the proteins under fuzzy conditions. Also, RBF-PSO applies PSO to tune the RBF classifier. Experimental results demonstrate that FRAN improves prediction accuracy up to 51% and achieves acceptable multi-class results for protein fold prediction. Although RBF-PSO provides reasonable results for protein fold recognition up to 48%, it is weaker than FRAN in some cases. However, the proposed hyper framework provides an opportunity to use a great range of intelligent methods and can learn from previous experiences. Thus it can avoid the weakness of some intelligent methods in terms of memory, computational time and static structure. Furthermore, the performance of this system can be enhanced throughout the system life-cycle. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. Application of airborne hyperspectral remote sensing for the retrieval of forest inventory parameters

    NASA Astrophysics Data System (ADS)

    Dmitriev, Yegor V.; Kozoderov, Vladimir V.; Sokolov, Anton A.

    2016-04-01

    Collecting and updating forest inventory data play an important part in forest management. The data can be obtained directly by using accurate but inefficient ground-based methods as well as from remote sensing measurements. We present applications of airborne hyperspectral remote sensing for the retrieval of such important inventory parameters as the forest species and age composition. The hyperspectral images of the test region were obtained from an airplane equipped with a Russian-made light-weight airborne video-spectrometer covering the visible and near-infrared spectral range and a high-resolution photo-camera on the same gyro-stabilized platform. The quality of the thematic processing depends on many factors such as the atmospheric conditions, characteristics of measuring instruments, corrections and preprocessing methods, etc. The construction of the classifier, together with methods for reducing the feature space, plays an important role. The performance of different spectral classification methods is analyzed for the problem of hyperspectral remote sensing of soil and vegetation. To reduce the feature space, we used the previously proposed stable feature selection method. The results of classifying hyperspectral airborne images using the Multiclass Support Vector Machine method with a Gaussian kernel and the parametric Bayesian classifier based on the Gaussian mixture model, together with their comparative analysis, are presented.

  11. Computer-aided diagnosis system: a Bayesian hybrid classification method.

    PubMed

    Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J

    2013-10-01

    A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is iteratively used to improve the results until a certain accuracy level is achieved, then, the learning process is finished and new classifications can be automatically performed. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  12. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
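
    The two combination approaches named above can be sketched with any binary learner. In the sketch below a toy nearest-centroid scorer stands in for the binary oRF, which is not implemented here; the function names, the toy learner, and the two-feature data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def fit_binary(X, y):
    """Toy binary learner (stand-in, not oRF): the returned score is
    positive when x is nearer the positive-class centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    pos = centroid([x for x, t in zip(X, y) if t])
    neg = centroid([x for x, t in zip(X, y) if not t])
    return lambda x: dist2(x, neg) - dist2(x, pos)

def one_against_all(X, labels):
    """One scorer per class (class vs. rest); predict the highest score."""
    classes = sorted(set(labels))
    scorers = {c: fit_binary(X, [l == c for l in labels]) for c in classes}
    return lambda x: max(classes, key=lambda c: scorers[c](x))

def one_against_one(X, labels):
    """One scorer per class pair; predict by majority vote."""
    scorers = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, l in enumerate(labels) if l in (a, b)]
        scorers[(a, b)] = fit_binary([X[i] for i in idx],
                                     [labels[i] == a for i in idx])
    def predict(x):
        votes = Counter(a if s(x) > 0 else b for (a, b), s in scorers.items())
        return votes.most_common(1)[0][0]
    return predict

X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["water", "water", "forest", "forest", "urban", "urban"]
queries = [(0.2, 0.5), (5, 5.4), (9.5, 0.5)]
ova = [one_against_all(X, y)(q) for q in queries]
ovo = [one_against_one(X, y)(q) for q in queries]
```

    For k classes, one-against-all trains k binary models on the full data, whereas one-against-one trains k(k-1)/2 models on smaller two-class subsets.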

  13. On the decoding process in ternary error-correcting output codes.

    PubMed

    Escalera, Sergio; Pujol, Oriol; Radeva, Petia

    2010-01-01

    A common way to model multiclass classification problems is to design a set of binary classifiers and to combine them. Error-Correcting Output Codes (ECOC) represent a successful framework to deal with this type of problem. Recent works in the ECOC framework showed significant performance improvements by means of new problem-dependent designs based on the ternary ECOC framework. The ternary framework contains a larger set of binary problems because of the use of a "do not care" symbol that allows a given classifier to ignore some classes. However, there are no proper studies that analyze the effect of the new symbol at the decoding step. In this paper, we present a taxonomy that embeds all binary and ternary ECOC decoding strategies into four groups. We show that the zero symbol introduces two kinds of biases that require redefinition of the decoding design. A new type of decoding measure is proposed, and two novel decoding strategies are defined. We evaluate the state-of-the-art coding and decoding strategies over a set of UCI Machine Learning Repository data sets and on a real traffic sign categorization problem. The experimental results show that, following the new decoding strategies, the performance of the ECOC design is significantly improved.
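
    To make the role of the "do not care" (zero) symbol concrete, the sketch below decodes a ternary ECOC prediction in two ways. It is an illustration under stated assumptions and not the decoding measure proposed in the paper: the function names, the one-vs-one coding matrix and the prediction vector are all hypothetical.

```python
def hamming_ternary(pred, codeword):
    """Classical Hamming-style distance on a {-1, 0, +1} codeword.

    Every zero entry contributes a constant 0.5, so rows with different
    numbers of zeros start from different offsets.
    """
    return sum((1 - p * c) / 2 for p, c in zip(pred, codeword))


def attenuated_hamming(pred, codeword):
    """Disagreement rate over the non-zero codeword positions only."""
    active = [(p, c) for p, c in zip(pred, codeword) if c != 0]
    if not active:
        return float("inf")
    return sum(1 for p, c in active if p != c) / len(active)


def decode(pred, coding_matrix, distance):
    """Return the index of the class whose codeword is closest to pred."""
    return min(
        range(len(coding_matrix)),
        key=lambda i: distance(pred, coding_matrix[i]),
    )


# Hypothetical one-vs-one design for 3 classes; columns are the binary
# problems (0 vs 1, 0 vs 2, 1 vs 2) and 0 marks an ignored class.
coding_matrix = [
    [+1, +1, 0],
    [-1, 0, +1],
    [0, -1, -1],
]
pred = [+1, +1, -1]  # outputs of the three binary classifiers
label_hamming = decode(pred, coding_matrix, hamming_ternary)
label_attenuated = decode(pred, coding_matrix, attenuated_hamming)
```

    Both variants agree on this small example; they can differ when codewords contain unequal numbers of zeros, because the first measure adds a fixed 0.5 per zero regardless of the prediction.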

  14. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.

  15. A heuristic method for consumable resource allocation in multi-class dynamic PERT networks

    NASA Astrophysics Data System (ADS)

    Yaghoubi, Saeed; Noori, Siamak; Mazdeh, Mohammad Mahdavi

    2013-06-01

    This investigation presents a heuristic method for the consumable resource allocation problem in multi-class dynamic Project Evaluation and Review Technique (PERT) networks, where new projects from different classes (types) arrive at the system according to independent Poisson processes with different arrival rates. Each activity of any project is operated at a dedicated service station located in a node of the network, with an exponential distribution according to its class. Each project arrives at the first service station and continues its routing according to the precedence network of its class. Such a system can be represented as a queuing network in which the queue discipline is first come, first served. On the basis of the presented method, a multi-class system is decomposed into several single-class dynamic PERT networks, where each class is considered separately as a minisystem. In modeling the single-class dynamic PERT network, we use a Markov process and a multi-objective model investigated by Azaron and Tavakkoli-Moghaddam in 2007. Then, after obtaining the resources allocated to service stations in every minisystem, the final resources allocated to activities are calculated by the proposed method.

  16. High-performance Chinese multiclass traffic sign detection via coarse-to-fine cascade and parallel support vector machine detectors

    NASA Astrophysics Data System (ADS)

    Chang, Faliang; Liu, Chunsheng

    2017-09-01

    The high variability of sign colors and shapes in uncontrolled environments has made the detection of traffic signs a challenging problem in computer vision. We propose a traffic sign detection (TSD) method based on coarse-to-fine cascade and parallel support vector machine (SVM) detectors to detect Chinese warning and danger traffic signs. First, a region of interest (ROI) extraction method is proposed to extract ROIs using color contrast features in local regions. The ROI extraction can reduce scanning regions and save detection time. For multiclass TSD, we propose a structure that combines a coarse-to-fine cascaded tree with a parallel structure of histogram of oriented gradients (HOG) + SVM detectors. The cascaded tree is designed to detect different types of traffic signs in a coarse-to-fine process. The parallel HOG + SVM detectors are designed to perform fine detection of different types of traffic signs. The experiments demonstrate that the proposed TSD method can rapidly detect multiclass traffic signs of different colors and shapes with high accuracy.

  17. Steganalysis feature improvement using expectation maximization

    NASA Astrophysics Data System (ADS)

    Rodriguez, Benjamin M.; Peterson, Gilbert L.; Agaian, Sos S.

    2007-04-01

    Images and data files provide an excellent opportunity for concealing illegal or clandestine material. Currently, there are over 250 different tools which embed data into an image without causing noticeable changes to the image. From a forensics perspective, when a system is confiscated or an image of a system is generated the investigator needs a tool that can scan and accurately identify files suspected of containing malicious information. The identification process is termed the steganalysis problem which focuses on both blind identification, in which only normal images are available for training, and multi-class identification, in which both the clean and stego images at several embedding rates are available for training. In this paper an investigation of a clustering and classification technique (Expectation Maximization with mixture models) is used to determine if a digital image contains hidden information. The steganalysis problem is for both anomaly detection and multi-class detection. The various clusters represent clean images and stego images with between 1% and 10% embedding percentage. Based on the results it is concluded that the EM classification technique is highly suitable for both blind detection and the multi-class problem.

  18. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact on improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on ADNI datasets that consist of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset, which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
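
    The One vs. Rest and One vs. One strategies named above differ in how a multiclass task is split into binary problems. A small sketch of the two decompositions for the four groups in this abstract follows; the helper names and the short labels "sMCI" and "cMCI" (stable and converter MCI) are assumptions made for illustration, and no classifier is trained.

```python
from itertools import combinations


def one_vs_rest(classes):
    """One binary problem per class: that class against all the others."""
    return [([c], [o for o in classes if o != c]) for c in classes]


def one_vs_one(classes):
    """One binary problem per unordered pair of classes."""
    return [([a], [b]) for a, b in combinations(classes, 2)]


# Short labels are shorthand chosen here, not taken from the abstract.
classes = ["HC", "sMCI", "cMCI", "AD"]
ovr_problems = one_vs_rest(classes)
ovo_problems = one_vs_one(classes)
```

    For K classes, One vs. Rest produces K binary problems and One vs. One produces K(K-1)/2, so the four-class task here gives four and six problems respectively.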

  19. Classification of different reaching movements from the same limb using EEG

    NASA Astrophysics Data System (ADS)

    Shiman, Farid; López-Larraz, Eduardo; Sarasola-Sanz, Andrea; Irastorza-Landa, Nerea; Spüler, Martin; Birbaumer, Niels; Ramos-Murguialday, Ander

    2017-08-01

    Objective. Brain-computer interfaces (BCIs) have been proposed not only as assistive technologies but also as rehabilitation tools for lost functions. However, due to the stochastic nature, poor spatial resolution and signal to noise ratio of electroencephalography (EEG), multidimensional decoding has been the main obstacle to implementing non-invasive BCIs in real-life rehabilitation scenarios. This study explores the classification of several functional reaching movements from the same limb using EEG oscillations in order to create a more versatile BCI for rehabilitation. Approach. Nine healthy participants performed four 3D center-out reaching tasks in four different sessions while wearing a passive robotic exoskeleton at their right upper limb. Kinematics data were acquired from the robotic exoskeleton. Multiclass extensions of Filter Bank Common Spatial Patterns (FBCSP) and a linear discriminant analysis (LDA) classifier were used to classify the EEG activity into four forward reaching movements (from a starting position towards four target positions), a backward movement (from any of the targets to the starting position), and rest. Recalibrating the classifier using data from previous or the same session was also investigated and compared. Main results. Average EEG decoding accuracies were significantly above chance, at 67%, 62.75%, and 50.3% when decoding three, four and six tasks from the same limb, respectively. Furthermore, classification accuracy could be increased when using data from the beginning of each session as training data to recalibrate the classifier. Significance. Our results demonstrate that classification of several functional movements performed by the same limb is possible with acceptable accuracy using EEG oscillations, especially if data from the same session are used to recalibrate the classifier. Therefore, an ecologically valid decoding could be used to control assistive or rehabilitation multi-degrees of freedom (DoF) robotic devices using EEG data. These results have important implications towards assistive and rehabilitative neuroprostheses control in paralyzed patients.

  20. A robust multi-kernel change detection framework for detecting leaf beetle defoliation using Landsat 7 ETM+ data

    NASA Astrophysics Data System (ADS)

    Anees, Asim; Aryal, Jagannath; O'Reilly, Małgorzata M.; Gale, Timothy J.; Wardlaw, Tim

    2016-12-01

    A robust non-parametric framework, based on multiple Radial Basis Function (RBF) kernels, is proposed in this study for detecting land/forest cover changes using Landsat 7 ETM+ images. One of the widely used frameworks is to find change vectors (difference image) and use a supervised classifier to differentiate between change and no-change. Bayesian classifiers, e.g. the Maximum Likelihood Classifier (MLC) and Naive Bayes (NB), are widely used probabilistic classifiers which assume parametric models, e.g. a Gaussian function, for the class conditional distributions. However, their performance can be limited if the data set deviates from the assumed model. The proposed framework exploits the useful properties of the Least Squares Probabilistic Classifier (LSPC) formulation, i.e. its non-parametric and probabilistic nature, to model class posterior probabilities of the difference image using a linear combination of a large number of Gaussian kernels. To this end, a simple technique based on 10-fold cross-validation is also proposed for tuning model parameters automatically instead of selecting a (possibly) suboptimal combination from pre-specified lists of values. The proposed framework has been tested and compared with Support Vector Machine (SVM) and NB for detection of defoliation caused by leaf beetles (Paropsisterna spp.) in Eucalyptus nitens and Eucalyptus globulus plantations of two test areas in Tasmania, Australia, using raw bands and band combination indices of Landsat 7 ETM+. It was observed that, due to its multi-kernel non-parametric formulation and probabilistic nature, the LSPC outperforms parametric NB with Gaussian assumption in the change detection framework, with Overall Accuracy (OA) ranging from 93.6% (κ = 0.87) to 97.4% (κ = 0.94) against 85.3% (κ = 0.69) to 93.4% (κ = 0.85), and is more robust to changing data distributions. Its performance was comparable to SVM, with the added advantages of being probabilistic and capable of handling multi-class problems naturally with its original formulation.
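
    Overall Accuracy (OA) and Cohen's kappa (κ), the two figures quoted above, can both be derived from a confusion matrix. The sketch below shows the standard formulas on a hypothetical change/no-change matrix; the function name and the counts are invented for illustration and are unrelated to the study's data.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square matrix.

    confusion[i][j] counts samples of reference class i assigned to
    class j. Assumes chance agreement is below 1 so kappa is defined.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement from the products of row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: rows = reference (change, no-change).
confusion = [
    [90, 10],
    [5, 95],
]
oa, kappa = overall_accuracy_and_kappa(confusion)
```

    Here the observed agreement is 0.925 and the chance agreement is 0.5, giving κ = 0.85; kappa is lower than OA because it discounts agreement expected by chance.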

  1. Selecting a restoration technique to minimize OCR error.

    PubMed

    Cannon, M; Fugate, M; Hush, D R; Scovel, C

    2003-01-01

    This paper introduces a learning problem related to the task of converting printed documents to ASCII text files. The goal of the learning procedure is to produce a function that maps documents to restoration techniques in such a way that on average the restored documents have minimum optical character recognition error. We derive a general form for the optimal function and use it to motivate the development of a nonparametric method based on nearest neighbors. We also develop a direct method of solution based on empirical error minimization for which we prove a finite sample bound on estimation error that is independent of distribution. We show that this empirical error minimization problem is an extension of the empirical optimization problem for traditional M-class classification with general loss function and prove computational hardness for this problem. We then derive a simple iterative algorithm called generalized multiclass ratchet (GMR) and prove that it produces an optimal function asymptotically (with probability 1). To obtain the GMR algorithm we introduce a new data map that extends Kesler's construction for the multiclass problem and then apply an algorithm called Ratchet to this mapped data, where Ratchet is a modification of the Pocket algorithm. Finally, we apply these methods to a collection of documents and report on the experimental results.

  2. Camouflage target reconnaissance based on hyperspectral imaging technology

    NASA Astrophysics Data System (ADS)

    Hua, Wenshen; Guo, Tong; Liu, Xun

    2015-08-01

    Efficient camouflaged target reconnaissance technology has a great influence on modern warfare. Hyperspectral images can provide a large spectral range and high spectral resolution, which are invaluable in discriminating between camouflaged targets and backgrounds. Hyperspectral target detection and classification technologies are utilized to achieve single-class and multi-class camouflaged target reconnaissance, respectively. Constrained energy minimization (CEM), a widely used algorithm in hyperspectral target detection, is employed to achieve single-class camouflaged target reconnaissance. Then, support vector machine (SVM), a classification method, is proposed to achieve multi-class camouflaged target reconnaissance. Experiments have been conducted to demonstrate the efficiency of the proposed method.

  3. Trace analysis of multi-class pesticide residues in Chinese medicinal health wines using gas chromatography with electron capture detection

    PubMed Central

    Kong, Wei-Jun; Liu, Qiu-Tao; Kong, Dan-Dan; Liu, Qian-Zhen; Ma, Xin-Ping; Yang, Mei-Hua

    2016-01-01

    A method is described for multi-residue, high-throughput determination of trace levels of 22 organochlorine pesticides (OCPs) and 5 pyrethroid pesticides (PYPs) in Chinese medicinal (CM) health wines using a QuEChERS (quick, easy, cheap, effective, rugged, and safe) based extraction method and gas chromatography-electron capture detection (GC-ECD). Several parameters were optimized to improve preparation and separation time while still maintaining high sensitivity. Validation tests of spiked samples showed good linearities for 27 pesticides (R = 0.9909–0.9996) over wide concentration ranges. Limits of detection (LODs) and quantification (LOQs) were measured at ng/L levels, 0.06–2 ng/L and 0.2–6 ng/L for OCPs and 0.02–3 ng/L and 0.06–7 ng/L for PYPs, respectively. Inter- and intra-day precision tests showed variations of 0.65–9.89% for OCPs and 0.98–13.99% for PYPs, respectively. Average recoveries were in the range of 47.74–120.31%, with relative standard deviations below 20%. The developed method was then applied to analyze 80 CM wine samples. Beta-BHC (Benzene hexachloride) was the most frequently detected pesticide at concentration levels of 5.67–31.55 mg/L, followed by delta-BHC, trans-chlordane, gamma-BHC, and alpha-BHC. The validated method is simple and economical, with adequate sensitivity for trace levels of multi-class pesticides. It could be adopted by laboratories for this and other types of complex matrices analysis. PMID:26883080

  4. Trace analysis of multi-class pesticide residues in Chinese medicinal health wines using gas chromatography with electron capture detection

    NASA Astrophysics Data System (ADS)

    Kong, Wei-Jun; Liu, Qiu-Tao; Kong, Dan-Dan; Liu, Qian-Zhen; Ma, Xin-Ping; Yang, Mei-Hua

    2016-02-01

    A method is described for multi-residue, high-throughput determination of trace levels of 22 organochlorine pesticides (OCPs) and 5 pyrethroid pesticides (PYPs) in Chinese medicinal (CM) health wines using a QuEChERS (quick, easy, cheap, effective, rugged, and safe) based extraction method and gas chromatography-electron capture detection (GC-ECD). Several parameters were optimized to improve preparation and separation time while still maintaining high sensitivity. Validation tests of spiked samples showed good linearities for 27 pesticides (R = 0.9909-0.9996) over wide concentration ranges. Limits of detection (LODs) and quantification (LOQs) were measured at ng/L levels, 0.06-2 ng/L and 0.2-6 ng/L for OCPs and 0.02-3 ng/L and 0.06-7 ng/L for PYPs, respectively. Inter- and intra-day precision tests showed variations of 0.65-9.89% for OCPs and 0.98-13.99% for PYPs, respectively. Average recoveries were in the range of 47.74-120.31%, with relative standard deviations below 20%. The developed method was then applied to analyze 80 CM wine samples. Beta-BHC (Benzene hexachloride) was the most frequently detected pesticide at concentration levels of 5.67-31.55 mg/L, followed by delta-BHC, trans-chlordane, gamma-BHC, and alpha-BHC. The validated method is simple and economical, with adequate sensitivity for trace levels of multi-class pesticides. It could be adopted by laboratories for this and other types of complex matrices analysis.

  5. Dilute-and-shoot coupled to nanoflow liquid chromatography high resolution mass spectrometry for the determination of drugs of abuse and sport drugs in human urine.

    PubMed

    Alcántara-Durán, Jaime; Moreno-González, David; Beneito-Cambra, Miriam; García-Reyes, Juan F

    2018-05-15

    In this work, a sensitive nanoflow liquid chromatography high-resolution mass spectrometry screening method has been developed for the determination of multiclass drugs of abuse and sport drugs in human urine. 81 drugs belonging to different multiclass pharmaceuticals were targeted. The method is based on the use of a nanoLC column (75 µm × 150 mm, 3 µm particle size and 100 Å pore) with the nanospray emitter tip integrated so that dead volumes are significantly minimized. The data acquisition method included both full-scan and all ion fragmentation experiments using an Orbitrap analyser (Q-Exactive) operated in the positive ionization mode. To increase laboratory throughput, a dilute-and-shoot methodology has been tested and proposed, based solely on direct urine dilution without further sample workup. Matrix effects were evaluated, showing a negligible effect for all studied compounds when a 1:50 dilution was implemented. Despite this high dilution factor, limits of quantification were still satisfactory, with values below 5 µg L⁻¹ in most cases, lower than the corresponding minimum required performance limits established by the World Anti-Doping Agency. Therefore, the use of the dilute-and-shoot method with the enhanced sensitivity provided by the nanoflow LC setup could be a useful tool for the determination of the studied compounds in drug testing, thus increasing laboratory performance, because minimal sample treatment steps are required. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Voxel-based plaque classification in coronary intravascular optical coherence tomography images using decision trees

    NASA Astrophysics Data System (ADS)

    Kolluru, Chaitanya; Prabhu, David; Gharaibeh, Yazan; Wu, Hao; Wilson, David L.

    2018-02-01

    Intravascular Optical Coherence Tomography (IVOCT) is a high contrast, 3D microscopic imaging technique that can be used to assess atherosclerosis and guide stent interventions. Despite its advantages, IVOCT image interpretation is challenging and time consuming with over 500 image frames generated in a single pullback volume. We have developed a method to classify voxel plaque types in IVOCT images using machine learning. To train and test the classifier, we have used our unique database of labeled cadaver vessel IVOCT images accurately registered to gold standard cryoimages. This database currently contains 300 images and is growing. Each voxel is labeled as fibrotic, lipid-rich, calcified or other. Optical attenuation, intensity and texture features were extracted for each voxel and were used to build a decision tree classifier for multi-class classification. Five-fold cross-validation across images gave accuracies of 96% ± 0.01%, 90% ± 0.02% and 90% ± 0.01% for fibrotic, lipid-rich and calcified classes respectively. To rectify performance degradation seen in left out vessel specimens as opposed to left out images, we are adding data and reducing features to limit overfitting. Following spatial noise cleaning, important vascular regions were unambiguous in display. We developed displays that enable physicians to make rapid determination of calcified and lipid regions. This will inform treatment decisions such as the need for devices (e.g., atherectomy or scoring balloon in the case of calcifications) or extended stent lengths to ensure coverage of lipid regions prone to injury at the edge of a stent.

  7. Classification of brain tumours using short echo time 1H MR spectra

    NASA Astrophysics Data System (ADS)

    Devos, A.; Lukas, L.; Suykens, J. A. K.; Vanhamme, L.; Tate, A. R.; Howe, F. A.; Majós, C.; Moreno-Torres, A.; van der Graaf, M.; Arús, C.; Van Huffel, S.

    2004-09-01

    The purpose was to objectively compare the application of several techniques and the use of several input features for brain tumour classification using Magnetic Resonance Spectroscopy (MRS). Short echo time 1H MRS signals from patients with glioblastomas (n = 87), meningiomas (n = 57), metastases (n = 39), and astrocytomas grade II (n = 22) were provided by six centres in the European Union funded INTERPRET project. Linear discriminant analysis, least squares support vector machines (LS-SVM) with a linear kernel and LS-SVM with radial basis function kernel were applied and evaluated over 100 stratified random splittings of the dataset into training and test sets. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of binary classifiers, while the percentage of correct classifications was used to evaluate the multiclass classifiers. The influence of several factors on the classification performance has been tested: L2- vs. water normalization, magnitude vs. real spectra and baseline correction. The effect of input feature reduction was also investigated by using only the selected frequency regions containing the most discriminatory information, and peak integrated values. Using L2-normalized complete spectra the automated binary classifiers reached a mean test AUC of more than 0.95, except for glioblastomas vs. metastases. Similar results were obtained for all classification techniques and input features except for water normalized spectra, where classification performance was lower. This indicates that data acquisition and processing can be simplified for classification purposes, excluding the need for separate water signal acquisition, baseline correction or phasing.
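
    The AUC used above for the binary classifiers can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; the function name and the scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier outputs for two tumour classes.
scores_pos = [0.9, 0.8, 0.4]
scores_neg = [0.7, 0.3, 0.2]
area = auc(scores_pos, scores_neg)
```

    With these scores, eight of the nine pairs are ordered correctly, so the sketch returns 8/9. The quadratic loop is acceptable for small test sets; a rank-based computation would be preferable for large ones.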

  8. A fingertip force prediction model for grasp patterns characterised from the chaotic behaviour of EEG.

    PubMed

    Roy, Rinku; Sikdar, Debdeep; Mahadevappa, Manjunatha; Kumar, C S

    2018-05-19

    A stable grasp is attained through appropriate hand preshaping and precise fingertip forces. Here, we have proposed a method to decode grasp patterns from motor imagery and subsequent fingertip force estimation model with a slippage avoidance strategy. We have developed a feature-based classification of electroencephalography (EEG) associated with imagination of the grasping postures. Chaotic behaviour of EEG for different grasping patterns has been utilised to capture the dynamics of associated motor activities. We have computed correlation dimension (CD) as the feature and classified with "one against one" multiclass support vector machine (SVM) to discriminate between different grasping patterns. The result of the analysis showed varying classification accuracies at different subband levels. Broad categories of grasping patterns, namely, power grasp and precision grasp, were classified at a 96.0% accuracy rate in the alpha subband. Furthermore, power grasp subtypes were classified with an accuracy of 97.2% in the upper beta subband, whereas precision grasp subtypes showed relatively lower 75.0% accuracy in the alpha subband. Following assessment of fingertip force distributions while grasping, a nonlinear autoregressive (NAR) model with proper prediction of fingertip forces was proposed for each grasp pattern. A slippage detection strategy has been incorporated with automatic recalibration of the regripping force. Intention of each grasp pattern associated with corresponding fingertip force model was virtualised in this work. This integrated system can be utilised as the control strategy for prosthetic hand in the future. The model to virtualise motor imagery based fingertip force prediction with inherent slippage correction for different grasp types.

  9. Single-trial EEG RSVP classification using convolutional neural networks

    NASA Astrophysics Data System (ADS)

    Shamwell, Jared; Lee, Hyungtae; Kwon, Heesung; Marathe, Amar R.; Lawhern, Vernon; Nothwang, William

    2016-05-01

    Traditionally, Brain-Computer Interfaces (BCI) have been explored as a means to return function to paralyzed or otherwise debilitated individuals. An emerging use for BCIs is in human-autonomy sensor fusion where physiological data from healthy subjects is combined with machine-generated information to enhance the capabilities of artificial systems. While human-autonomy fusion of physiological data and computer vision have been shown to improve classification during visual search tasks, to date these approaches have relied on separately trained classification models for each modality. We aim to improve human-autonomy classification performance by developing a single framework that builds codependent models of human electroencephalograph (EEG) and image data to generate fused target estimates. As a first step, we developed a novel convolutional neural network (CNN) architecture and applied it to EEG recordings of subjects classifying target and non-target image presentations during a rapid serial visual presentation (RSVP) image triage task. The low signal-to-noise ratio (SNR) of EEG inherently limits the accuracy of single-trial classification and when combined with the high dimensionality of EEG recordings, extremely large training sets are needed to prevent overfitting and achieve accurate classification from raw EEG data. This paper explores a new deep CNN architecture for generalized multi-class, single-trial EEG classification across subjects. We compare classification performance from the generalized CNN architecture trained across all subjects to the individualized XDAWN, HDCA, and CSP neural classifiers which are trained and tested on single subjects. Preliminary results show that our CNN meets and slightly exceeds the performance of the other classifiers despite being trained across subjects.

  10. The effect of combining two echo times in automatic brain tumor classification by MRS.

    PubMed

    García-Gómez, Juan M; Tortajada, Salvador; Vidal, César; Julià-Sapé, Margarida; Luts, Jan; Moreno-Torres, Angel; Van Huffel, Sabine; Arús, Carles; Robles, Montserrat

    2008-11-01

    1H MRS is becoming an accurate, non-invasive technique for initial examination of brain masses. We investigated whether the combination of single-voxel 1H MRS at 1.5 T at two different echo times (TEs), short TE (PRESS or STEAM, 20-32 ms) and long TE (PRESS, 135-136 ms), improves the classification of brain tumors over using only one TE. A clinically validated dataset of 50 low-grade meningiomas, 105 aggressive tumors (glioblastoma and metastasis), and 30 low-grade glial tumors (astrocytomas grade II, oligodendrogliomas and oligoastrocytomas) was used to fit predictive models based on the combination of features from short-TE and long-TE spectra. A new approach that combines the two consecutively was used to produce a single data vector from which relevant features of the two TE spectra could be extracted by means of three algorithms: stepwise, reliefF, and principal components analysis. Least squares support vector machines and linear discriminant analysis were applied to fit the pairwise and multiclass classifiers, respectively. Significant differences in performance were found when short-TE, long-TE or both spectra combined were used as input. In our dataset, to discriminate meningiomas, the combination of the two TE acquisitions produced optimal performance. To discriminate aggressive tumors from low-grade glial tumors, the use of short-TE acquisition alone was preferable. The classifier development strategy used here lends itself to automated learning and test performance processes, which may be of use for future web-based multicentric classifier development studies. Copyright (c) 2008 John Wiley & Sons, Ltd.

  11. An AdaBoost Based Approach to Automatic Classification and Detection of Buildings Footprints, Vegetation Areas and Roads from Satellite Images

    NASA Astrophysics Data System (ADS)

    Gonulalan, Cansu

    In recent years, there has been an increasing demand for applications that monitor land-use targets using remote sensing images. Advances in remote sensing satellites have given rise to research in this area. Many applications, ranging from urban growth planning to homeland security, have already used algorithms for automated object recognition from remote sensing imagery. However, they still have problems, such as low accuracy in target detection and algorithms that are specific to a particular area. In this thesis, we focus on an automatic approach to classify and detect building footprints, road networks and vegetation areas. The automatic interpretation of visual data is a comprehensive task in the computer vision field. Machine learning approaches improve the capability of classification in an intelligent way. We propose a method that has high accuracy in detection and classification. The multi-class classification is developed for detecting multiple objects. We present an AdaBoost-based approach along with the supervised learning algorithm. The combination of AdaBoost with "Attentional Cascade" is adopted from Viola and Jones [1]. This combination decreases the computation time and opens the way to real-time applications. For the feature extraction step, our contribution is to combine Haar-like features that include corner, rectangle and Gabor. Among all features, AdaBoost selects only critical features and generates an extremely efficient cascade-structured classifier. Finally, we present and evaluate our experimental results. The overall system is tested and high detection performance is achieved. The precision rate of the final multi-class classifier is over 98%.

  12. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. 
    Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.
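
    The abstract above compares methods by both accuracy and mean absolute error. A minimal sketch of why the two can disagree on ordinal targets follows; the integer coding of the ordered classes (0, 1, 2) and the example predictions are assumptions made for illustration, not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true ordinal level exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance, in ordinal levels, between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ordered levels coded as 0 < 1 < 2.
y_true = [0, 1, 2, 2]
near_miss = [0, 1, 2, 1]  # one error, off by one level
far_miss = [0, 1, 2, 0]   # one error, off by two levels

for name, pred in (("near_miss", near_miss), ("far_miss", far_miss)):
    print(name, accuracy(y_true, pred), mean_absolute_error(y_true, pred))
```

    Both predictors have the same accuracy, but the mean absolute error penalizes the prediction that lands further from the true level, which is the ranked structure a nominal classifier ignores.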

  13. Assessing product image quality for online shopping

    NASA Astrophysics Data System (ADS)

    Goswami, Anjan; Chung, Sung H.; Chittar, Naren; Islam, Atiq

    2012-01-01

    Assessing product-image quality is important in the context of online shopping. A high quality image that conveys more information about a product can boost the buyer's confidence and attract more attention. However, the notion of image quality for product-images is not the same as that in other domains. The perception of quality of product-images depends not only on various photographic quality features but also on various high level features such as clarity of the foreground or goodness of the background. In this paper, we define a notion of product-image quality based on various such features. We conduct a crowd-sourced experiment to collect user judgments on thousands of eBay's images. We formulate a multi-class classification problem for modeling image quality by classifying images into good, fair and poor quality based on the guided perceptual notions from the judges. We also conduct experiments with regression using average crowd-sourced human judgments as target. We compute a pseudo-regression score as the expected average of predicted classes and also compute a score from the regression technique. We design many experiments with various sampling and voting schemes with crowd-sourced data and construct various experimental image quality models. Most of our models have reasonable accuracies (greater than or equal to 70%) on the test data set. We observe that our computed image quality score has a high (0.66) rank correlation with average votes from the crowd-sourced human judgments.
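
    One plausible reading of the "expected average of predicted classes" is a probability-weighted mean of numeric class values. The sketch below assumes that reading; the mapping poor = 0, fair = 1, good = 2 and the example probabilities are hypothetical and are not taken from the paper.

```python
def expected_class_score(probabilities, class_values=(0.0, 1.0, 2.0)):
    """Probability-weighted average of ordinal class values.

    probabilities: predicted class probabilities, ordered like class_values.
    class_values:  assumed numeric coding (poor=0, fair=1, good=2).
    """
    if len(probabilities) != len(class_values):
        raise ValueError("one probability is needed per class")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    weighted = sum(p * v for p, v in zip(probabilities, class_values))
    return weighted / total  # renormalize in case the inputs do not sum to 1


# Hypothetical classifier output for one image: P(poor), P(fair), P(good).
score = expected_class_score([0.2, 0.5, 0.3])
print(round(score, 3))
```

    A continuous score of this kind can be rank-correlated with average human votes in the same way as a regression output.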

  14. A learning scheme for reach to grasp movements: on EMG-based interfaces using task specific motion decoding models.

    PubMed

    Liarokapis, Minas V; Artemiadis, Panagiotis K; Kyriakopoulos, Kostas J; Manolakos, Elias S

    2013-09-01

    A learning scheme based on random forests is used to discriminate between different reach to grasp movements in 3-D space, based on the myoelectric activity of human muscles of the upper arm and the forearm. Task specificity for motion decoding is introduced at two different levels: subspace to move toward and object to be grasped. The discrimination between the different reach to grasp strategies is accomplished with machine learning techniques for classification. The classification decision is then used to trigger an EMG-based task-specific motion decoding model. Task-specific models manage to outperform "general" models, providing better estimation accuracy. Thus, the proposed scheme takes advantage of a framework incorporating both a classifier and a regressor that cooperate advantageously in order to split the task space. The proposed learning scheme can be easily applied to a series of EMG-based interfaces that must operate in real time, providing data-driven capabilities for multiclass problems that occur in the complex environments of everyday life.

  15. Image segmentation using hidden Markov Gauss mixture models.

    PubMed

    Pyun, Kyungsuk; Lim, Johan; Won, Chee Sun; Gray, Robert M

    2007-07-01

    Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. We develop a multiclass image segmentation method using hidden Markov Gauss mixture models (HMGMMs) and provide examples of segmentation of aerial images and textures. HMGMMs incorporate supervised learning, fitting the observation probability distribution given each class by a Gauss mixture estimated using vector quantization with a minimum discrimination information (MDI) distortion. We formulate the image segmentation problem using a maximum a posteriori criterion and find the hidden states that maximize the posterior density given the observation. We estimate both the hidden Markov parameter and hidden states using a stochastic expectation-maximization algorithm. Our results demonstrate that HMGMM provides better classification in terms of Bayes risk and spatial homogeneity of the classified objects than do several popular methods, including classification and regression trees, learning vector quantization, causal hidden Markov models (HMMs), and multiresolution HMMs. The computational load of HMGMM is similar to that of the causal HMM.

  16. Normed kernel function-based fuzzy possibilistic C-means (NKFPCM) algorithm for high-dimensional breast cancer database classification with feature selection is based on Laplacian Score

    NASA Astrophysics Data System (ADS)

    Lestari, A. W.; Rustam, Z.

    2017-07-01

    In the last decade, breast cancer has become a focus of world attention, as this disease is one of the leading causes of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, the Fuzzy Kernel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify high-dimensional breast cancer data using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the candidate features are evaluated using feature selection, where the Laplacian Score is used. The results compare the accuracy and running time of FPCM and NKFPCM with and without feature selection.

  17. SEGMA: An Automatic SEGMentation Approach for Human Brain MRI Using Sliding Window and Random Forests

    PubMed Central

    Serag, Ahmed; Wilkinson, Alastair G.; Telford, Emma J.; Pataky, Rozalia; Sparrow, Sarah A.; Anblagan, Devasuda; Macnaught, Gillian; Semple, Scott I.; Boardman, James P.

    2017-01-01

    Quantitative volumes from brain magnetic resonance imaging (MRI) acquired across the life course may be useful for investigating long term effects of risk and resilience factors for brain development and healthy aging, and for understanding early life determinants of adult brain structure. Therefore, there is an increasing need for automated segmentation tools that can be applied to images acquired at different life stages. We developed an automatic segmentation method for human brain MRI, where a sliding window approach and a multi-class random forest classifier were applied to high-dimensional feature vectors for accurate segmentation. The method performed well on brain MRI data acquired from 179 individuals, analyzed in three age groups: newborns (38–42 weeks gestational age), children and adolescents (4–17 years) and adults (35–71 years). As the method can learn from partially labeled datasets, it can be used to segment large-scale datasets efficiently. It could also be applied to different populations and imaging modalities across the life course. PMID:28163680

  18. Automated analysis and classification of melanocytic tumor on skin whole slide images.

    PubMed

    Xu, Hongming; Lu, Cheng; Berendt, Richard; Jha, Naresh; Mandal, Mrinal

    2018-06-01

    This paper presents a computer-aided technique for automated analysis and classification of melanocytic tumor on skin whole slide biopsy images. The proposed technique consists of four main modules. First, skin epidermis and dermis regions are segmented by a multi-resolution framework. Next, epidermis analysis is performed, where a set of epidermis features reflecting nuclear morphologies and spatial distributions is computed. In parallel with epidermis analysis, dermis analysis is also performed, where dermal cell nuclei are segmented and a set of textural and cytological features are computed. Finally, the skin melanocytic image is classified into different categories such as melanoma, nevus or normal tissue by using a multi-class support vector machine (mSVM) with extracted epidermis and dermis features. Experimental results on 66 skin whole slide images indicate that the proposed technique achieves more than 95% classification accuracy, which suggests that the technique has the potential to be used for assisting pathologists on skin biopsy image analysis and classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  19. Blind Linguistic Steganalysis against Translation Based Steganography

    NASA Astrophysics Data System (ADS)

    Chen, Zhili; Huang, Liusheng; Meng, Peng; Yang, Wei; Miao, Haibo

    Translation based steganography (TBS) is a relatively new and secure kind of linguistic steganography. It takes advantage of the "noise" created by automatic translation of natural language text to encode the secret information. To date, there is little research on steganalysis against this kind of linguistic steganography. In this paper, a blind steganalytic method, named natural frequency zoned word distribution analysis (NFZ-WDA), is presented. This method improves on a previously proposed linguistic steganalysis method based on word distribution, which targeted the detection of linguistic steganography such as nicetext and texto. The new method aims to detect the application of TBS and uses none of the related information about TBS; its only resource is a word frequency dictionary obtained from a large corpus, a so-called natural frequency dictionary, so it is totally blind. To verify the effectiveness of NFZ-WDA, two experiments, with two-class and multi-class SVM classifiers respectively, are carried out. The experimental results show that the steganalytic method is promising.

  20. [Identification of special quality eggs with NIR spectroscopy technology based on symbol entropy feature extraction method].

    PubMed

    Zhao, Yong; Hong, Wen-Xue

    2011-11-01

    Fast, nondestructive and accurate identification of special quality eggs is an urgent problem. The present paper proposes a new feature extraction method based on symbol entropy to identify the near-infrared spectra of special quality eggs. The authors selected normal eggs, free range eggs, selenium-enriched eggs and zinc-enriched eggs as research objects and measured the near-infrared diffuse reflectance spectra in the range of 12 000-4 000 cm⁻¹. Raw spectra were symbolically represented with an aggregation approximation algorithm, and symbolic entropy was extracted as the feature vector. An error-correcting output codes multiclass support vector machine classifier was designed to identify the spectrum. The symbolic entropy feature is robust to parameter changes, and the highest recognition rate reaches 100%. The results show that identifying special quality eggs using near-infrared spectroscopy is feasible and that symbol entropy can be used as a new feature extraction method for near-infrared spectra.
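
    The decoding step of an error-correcting output codes (ECOC) classifier can be sketched as follows. The 7-bit code matrix is a hypothetical exhaustive code for four classes, chosen for illustration and not taken from the paper; in a full system each bit would be produced by one binary SVM, which is omitted here.

```python
# Hypothetical exhaustive 7-bit code for the four egg classes.
# Any two codewords differ in 4 positions, so one wrong bit is correctable.
CODE_MATRIX = {
    "normal":            (1, 1, 1, 1, 1, 1, 1),
    "free range":        (0, 0, 0, 0, 1, 1, 1),
    "selenium-enriched": (0, 0, 1, 1, 0, 0, 1),
    "zinc-enriched":     (0, 1, 0, 1, 0, 1, 0),
}


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bits):
    """Return the class whose codeword is nearest to the binary outputs."""
    return min(CODE_MATRIX, key=lambda label: hamming_distance(CODE_MATRIX[label], bits))


# Outputs of the seven binary classifiers with the first bit flipped by mistake.
noisy_bits = (1, 0, 1, 1, 0, 0, 1)
print(ecoc_decode(noisy_bits))
```

    Because the decoder picks the nearest codeword, a single erroneous binary classifier does not change the predicted class under this code.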

  1. Belief Function Based Decision Fusion for Decentralized Target Classification in Wireless Sensor Networks

    PubMed Central

    Zhang, Wenyu; Zhang, Zhenjiang

    2015-01-01

    Decision fusion in sensor networks enables sensors to improve classification accuracy while reducing the energy consumption and bandwidth demand for data transmission. In this paper, we focus on the decentralized multi-class classification fusion problem in wireless sensor networks (WSNs) and a new simple but effective decision fusion rule based on belief function theory is proposed. Unlike existing belief function based decision fusion schemes, the proposed approach is compatible with any type of classifier because the basic belief assignments (BBAs) of each sensor are constructed on the basis of the classifier’s training output confusion matrix and real-time observations. We also derive explicit global BBA in the fusion center under Dempster’s combinational rule, making the decision making operation in the fusion center greatly simplified. Also, sending the whole BBA structure to the fusion center is avoided. Experimental results demonstrate that the proposed fusion rule has better performance in fusion accuracy compared with the naïve Bayes rule and weighted majority voting rule. PMID:26295399
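
    For readers unfamiliar with Dempster's rule of combination, a minimal generic sketch is given below. It is not the paper's closed-form fusion rule, and it does not build BBAs from a confusion matrix; the class names and mass values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two basic belief assignments with Dempster's rule.

    Each BBA maps a frozenset of class labels (a focal element) to its mass.
    Mass assigned to empty intersections is conflict and is normalized away.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical frame of discernment and BBAs reported by two sensors.
CAR, TRUCK = frozenset({"car"}), frozenset({"truck"})
FRAME = frozenset({"car", "truck", "person"})  # mass on FRAME encodes ignorance
sensor_1 = {CAR: 0.6, TRUCK: 0.1, FRAME: 0.3}
sensor_2 = {CAR: 0.5, TRUCK: 0.3, FRAME: 0.2}

fused = dempster_combine(sensor_1, sensor_2)
decision = max(fused, key=fused.get)
print(sorted(decision), round(fused[decision], 4))
```

    The fused masses sum to one after normalization, and the decision is taken here as the focal element with the largest fused mass, which is one simple choice among several.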

  2. TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

    PubMed Central

    Diaz, Naryttza N; Krause, Lutz; Goesmann, Alexander; Niehaus, Karsten; Nattkemper, Tim W

    2009-01-01

    Background Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy down to rank class. For longer fragments ≥ 3 Kbp, accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to rank class, and low false negative rates were also obtained. Conclusion An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date. PMID:19210774

  3. Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification.

    DTIC Science & Technology

    1984-10-01

    LIDS-P-1411, October 1984. Generation and Termination of Binary Decision Trees for ... minimizes the Bayes risk. Tree generation and termination are based on the training and test samples, respectively.

  4. Multiclass classification of obstructive sleep apnea/hypopnea based on a convolutional neural network from a single-lead electrocardiogram.

    PubMed

    Urtnasan, Erdenebayar; Park, Jong-Uk; Lee, Kyoung-Joung

    2018-05-24

    In this paper, we propose a convolutional neural network (CNN)-based deep learning architecture for multiclass classification of obstructive sleep apnea and hypopnea (OSAH) using single-lead electrocardiogram (ECG) recordings. OSAH is the most common sleep-related breathing disorder. Many subjects who suffer from OSAH remain undiagnosed; thus, early detection of OSAH is important. In this study, automatic classification of three classes (normal, hypopnea, and apnea) based on a CNN is performed. An optimal six-layer CNN model is trained on a training dataset (45,096 events) and evaluated on a test dataset (11,274 events). The training set (69 subjects) and test set (17 subjects) were collected from 86 subjects, with recordings of approximately 6 h each segmented into 10 s durations. The proposed CNN model reaches a mean F1-score of 93.0 for the training dataset and 87.0 for the test dataset. Thus, the proposed deep learning architecture achieved high performance for multiclass classification of OSAH using single-lead ECG recordings. The proposed method can be employed in screening of patients suspected of having OSAH. © 2018 Institute of Physics and Engineering in Medicine.

  5. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect cause of death [according to the International Classification of Diseases, version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
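
    The M subscript in precisionM, recallM and F-measureM most likely denotes macro-averaging over classes. The sketch below assumes that reading and computes F-measureM from the macro-averaged precision and recall; averaging per-class F-scores is another common convention and gives different numbers. The labels are hypothetical stand-ins for cause-of-death categories.

```python
def macro_metrics(y_true, y_pred):
    """Return (macro precision, macro recall, F-measure of the two macro values)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # A class that is never predicted (or never present) contributes 0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    precision_m = sum(precisions) / len(labels)
    recall_m = sum(recalls) / len(labels)
    denominator = precision_m + recall_m
    f_measure_m = 2 * precision_m * recall_m / denominator if denominator else 0.0
    return precision_m, recall_m, f_measure_m


# Hypothetical true and predicted categories for six reports.
y_true = ["fall", "fall", "burn", "burn", "crash", "crash"]
y_pred = ["fall", "burn", "burn", "burn", "crash", "fall"]

precision_m, recall_m, f_measure_m = macro_metrics(y_true, y_pred)
print(round(precision_m, 3), round(recall_m, 3), round(f_measure_m, 3))
```

    Macro-averaging weights every class equally, so a rare cause of death affects the score as much as a common one.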

  6. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect cause of death [according to the International Classification of Diseases, version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports. PMID:28166263

  7. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  8. Prediction of recombinant protein overexpression in Escherichia coli using a machine learning based model (RPOLP).

    PubMed

    Habibi, Narjeskhatoon; Norouzi, Alireza; Mohd Hashim, Siti Z; Shamsir, Mohd Shahir; Samian, Razip

    2015-11-01

    Recombinant protein overexpression, an important biotechnological process, is governed by complex biological rules that are mostly unknown, and it needs an intelligent algorithm to avoid resource-intensive, lab-based trial-and-error experiments for determining the expression level of the recombinant protein. The purpose of this study is to propose, for the first time in the literature, a predictive model that estimates the level of recombinant protein overexpression using a machine learning approach based on the sequence, expression vector, and expression host. The expression host was confined to Escherichia coli, which is the most popular bacterial host for overexpressing recombinant proteins. To make the problem tractable, the overexpression level was categorized as low, medium and high. A set of features likely to affect the overexpression level was generated based on known facts (e.g. gene length) and knowledge gathered from related literature. Then, a representative subset of the features generated in the previous step was determined using feature selection techniques. Finally, a predictive model was developed using a random forest classifier, which was able to adequately classify the small, imbalanced, multi-class dataset constructed. The results showed that the predictive model provided a promising accuracy of 80% on average in estimating the overexpression level of a recombinant protein. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. High-throughput method based on quick, easy, cheap, effective, rugged and safe followed by liquid chromatography-multi-wavelength detection for the quantification of multiclass polyphenols in wines.

    PubMed

    Fontana, Ariel R; Bottini, Rubén

    2014-05-16

    In this work, a reliable, simple, fast, inexpensive and robust sample preparation approach for the determination of multiclass polyphenols in wine samples is proposed. The polyphenols selected for this work were gallic acid, (+)-catechin, (-)-epicatechin, caffeic acid, syringic acid, coumaric acid, ferulic acid, trans-resveratrol, quercetin and cinnamic acid. The method is based on the QuEChERS (quick, easy, cheap, effective, rugged and safe) extraction technique coupled with dispersive solid-phase extraction (d-SPE) clean-up. Under optimized conditions, the analytes were extracted from 5 mL wine samples (previously acidified with 1% formic acid) using 2.5 mL acetonitrile. For phase separation, 1.5 g NaCl and 4 g anhydrous MgSO4 were added. Then, a 1 mL aliquot of the partitioned supernatant was cleaned up using d-SPE with a combination of 150 mg CaCl2, 50 mg primary-secondary amine (PSA) and 50 mg C18 as sorbents. A 250 μL aliquot of the cleaned extract was concentrated to dryness and taken up with the initial mobile phase prior to liquid chromatography-multi-wavelength detection (LC-MWD). The proposed method provided limits of detection (LODs) ranging from 0.004 to 0.079 μg mL⁻¹ and an inter-day variability below 12% RSD for all analytes in red and white wine samples. Considering external calibration (red wines) and matrix-matched calibration (white wines) as quantification techniques, the overall recoveries (accuracy) of the method ranged between 75.0% and 119.6% for red and white wine samples. The developed method was applied to the determination of polyphenols in 10 wines produced in Argentina. Nine phenolic compounds were determined at concentrations above the detection limits of the method. The maximum concentrations corresponded to (-)-epicatechin in white wines, while gallic acid and (+)-catechin were the most abundant in red wines. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. Machine Learning for Big Data: A Study to Understand Limits at Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas R.; Del-Castillo-Negrete, Carlos Emilio

    This report aims to empirically understand the limits of machine learning when applied to Big Data. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny, evaluation and application for gleaning insights from the data than ever before. Much is expected from algorithms without understanding their limitations at scale while dealing with massive datasets. In that context, we pose and address the following questions: How does a machine learning algorithm perform on measures such as accuracy and execution time with increasing sample size and feature dimensionality? Does training with more samples guarantee better accuracy? How many features should be computed for a given problem? Do more features guarantee better accuracy? Are efforts to derive and calculate more features and to train on larger samples worth it? As problems become more complex and traditional binary classification algorithms are replaced with multi-task, multi-class categorization algorithms, do parallel learners perform better? What happens to the accuracy of the learning algorithm when trained to categorize multiple classes within the same feature space? Towards finding answers to these questions, we describe the design of an empirical study and present the results. We conclude with the following observations: (i) accuracy of the learning algorithm increases with increasing sample size but saturates at a point, beyond which more samples do not contribute to better accuracy/learning, (ii) the richness of the feature space dictates performance, both accuracy and training time, (iii) increased dimensionality is often reflected in better performance (higher accuracy in spite of longer training times), but the improvements are not commensurate with the effort for feature computation and training, (iv) accuracy of the learning algorithms drops significantly with multi-class learners training on the same feature matrix, and (v) learning algorithms perform well when categories in labeled data are independent (i.e., no relationship or hierarchy exists among categories).

  11. A novel framework for change detection in bi-temporal polarimetric SAR images

    NASA Astrophysics Data System (ADS)

    Pirrone, Davide; Bovolo, Francesca; Bruzzone, Lorenzo

    2016-10-01

    Recent years have seen a significant increase in the availability of polarimetric Synthetic Aperture Radar (SAR) data, thanks to satellite sensors such as Sentinel-1 and ALOS-2 PALSAR-2. The additional information in the extra polarimetric channels makes it possible to better discriminate different classes of change in change detection (CD) applications. This work proposes a framework for CD in multi-temporal multi-polarization SAR data. The framework includes both a tool for an effective visual representation of the change information and a method for extracting the multiple-change information. Both components are designed to effectively handle the multi-dimensionality of polarimetric data. In the novel representation, multi-temporal intensity SAR data are employed to compute a polarimetric log-ratio. The multi-temporal information of the polarimetric log-ratio image is represented in a multi-dimensional feature space, where changes are highlighted in terms of magnitude and direction. This representation is employed to design a novel unsupervised multi-class CD approach. This approach considers a sequential two-step analysis of the magnitude and the direction information for separating non-changed and changed samples. The proposed approach has been validated on a pair of Sentinel-1 images acquired before and after the flood in Tamil-Nadu in 2015. Preliminary results demonstrate that the representation tool is effective and that the use of polarimetric SAR data is promising in multi-class change detection applications.
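
    The magnitude-and-direction representation described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: two polarimetric channels per pixel, a log-ratio taken per channel, a fixed magnitude threshold, and equal-width direction sectors. The function names, the threshold value and the number of sectors are hypothetical.

```python
import math

def log_ratio_vector(before, after):
    """Per-channel log-ratio log(after / before) for one pixel.

    Intensities must be strictly positive (assumed two channels, e.g. VV and VH).
    """
    return [math.log(a / b) for b, a in zip(before, after)]

def magnitude_and_direction(before, after):
    """Magnitude and direction (radians) of a two-channel log-ratio vector."""
    lr = log_ratio_vector(before, after)
    magnitude = math.hypot(lr[0], lr[1])
    direction = math.atan2(lr[1], lr[0])
    return magnitude, direction

def classify_change(before, after, threshold=0.5, n_sectors=4):
    """Two-step labelling: magnitude first, then direction.

    Returns 0 for "no change", otherwise a change class in 1..n_sectors.
    The threshold and the sector layout are illustrative assumptions.
    """
    magnitude, direction = magnitude_and_direction(before, after)
    if magnitude < threshold:
        return 0
    sector_width = 2 * math.pi / n_sectors
    sector = int((direction % (2 * math.pi)) // sector_width) + 1
    return min(sector, n_sectors)  # guard against floating-point wrap-around
```

    In this sketch a pixel whose intensities are unchanged has zero magnitude and is labelled 0, while a pixel whose first-channel intensity doubles or halves falls into different direction sectors, which is the intuition behind separating several kinds of change.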

  12. Analytical method for fast screening and confirmation of multi-class veterinary drug residues in fish and shrimp by LC-MS/MS.

    PubMed

    Kim, Junghyun; Suh, Joon Hyuk; Cho, Hyun-Deok; Kang, Wonjae; Choi, Yong Seok; Han, Sang Beom

    2016-01-01

    A multi-class, multi-residue analytical method based on LC-MS/MS detection was developed for the screening and confirmation of 28 veterinary drug and metabolite residues in flatfish, shrimp and eel. The chosen veterinary drugs are prohibited or unauthorised compounds in Korea, which were categorised into various chemical classes including nitroimidazoles, benzimidazoles, sulfones, quinolones, macrolides, phenothiazines, pyrethroids and others. To achieve fast and simultaneous extraction of various analytes, a simple and generic liquid extraction procedure using EDTA-ammonium acetate buffer and acetonitrile, without further clean-up steps, was applied to sample preparation. The final extracts were analysed by ultra-high-performance liquid chromatography coupled with tandem mass spectrometry (UHPLC-MS/MS). The method was validated for each compound in each matrix at three different concentrations (5, 10 and 20 ng g(-1)) in accordance with Codex guidelines (CAC/GL 71-2009). For most compounds, the recoveries were in the range of 60-110%, and precision, expressed as the relative standard deviation (RSD), was in the range of 5-15%. The detection capabilities (CCβs) were below or equal to 5 ng g(-1), which indicates that the developed method is sufficient to detect illegal fishery products containing the target compounds above the residue limit (10 ng g(-1)) of the new regulatory system (Positive List System - PLS).
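
    The recovery and precision figures quoted above follow from simple replicate statistics. The sketch below shows one common way to compute them; the function names and the replicate values are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
import statistics

def recovery_percent(measured, spiked):
    """Mean measured concentration as a percentage of the spiked concentration."""
    return 100.0 * statistics.mean(measured) / spiked

def rsd_percent(measured):
    """Relative standard deviation (sample standard deviation / mean), in percent."""
    return 100.0 * statistics.stdev(measured) / statistics.mean(measured)

# Hypothetical replicate results (ng/g) for a sample spiked at 10 ng/g.
replicates = [9.0, 10.0, 11.0]
recovery = recovery_percent(replicates, spiked=10.0)
rsd = rsd_percent(replicates)
```

    For these made-up replicates the recovery is 100% and the RSD is 10%, which would sit inside the 60-110% and 5-15% ranges reported for most compounds.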

  13. Enhanced Dissipation of Triazole and Multiclass Pesticide Residues on Grapes after Foliar Application of Grapevine-Associated Bacillus Species.

    PubMed

    Salunkhe, Varsha P; Sawant, Indu S; Banerjee, Kaushik; Wadkar, Pallavi N; Sawant, Sanjay D

    2015-12-23

    Disease management in vineyards with fungicides sometimes results in undesirable residue accumulations in grapes at harvest. Bioaugmentation of the grape fructosphere can be a useful approach for enhancing the degradation rate and reducing the residues to safe levels. This paper reports the in vitro and in vivo biodegradation of three triazole fungicides commonly used in Indian vineyards, by Bacillus strains, namely, DR-39, CS-126, TL-171, and TS-204, which were earlier found to enhance the dissipation rate of profenofos and carbendazim. The strains utilized the triazoles as a carbon source and enhanced their in vitro rate of degradation. Myclobutanil, tetraconazole, and flusilazole were applied in separate vineyard plots at field doses of 0.40 g L(-1), 0.75 mL L(-1), and 0.125 mL L(-1), respectively. Residue analysis of field samples from the treated plots showed 87.38% and >99% degradation of myclobutanil and tetraconazole, respectively, by the strain DR-39, and 90.82% degradation of flusilazole by the strain CS-126 after 15-20 days of treatment. In the respective controls, the corresponding degradations were 72.07%, 58.88%, and 54.28%. These Bacillus strains could also simultaneously degrade the residues of profenofos, carbendazim, and tetraconazole on the grape berries and can be useful in multiclass pesticide residue biodegradation.

  14. Multiclass screening method based on solvent extraction and liquid chromatography-tandem mass spectrometry for the determination of antimicrobials and mycotoxins in egg.

    PubMed

    Capriotti, Anna Laura; Cavaliere, Chiara; Piovesana, Susy; Samperi, Roberto; Laganà, Aldo

    2012-12-14

    A QuEChERS (Quick Easy Cheap Effective Rugged Safe)-like extraction method was developed for the simultaneous analysis of veterinary drugs and mycotoxins in hen eggs by liquid chromatography-tandem mass spectrometry (LC-MS/MS) with electrospray (ESI) source. Various classes of antimicrobials (tetracyclines, ionophores, coccidiostats, penicillins, cephalosporins, fluoroquinolones, sulfonamides) and mycotoxins (enniatins, beauvericin, ochratoxins, aflatoxins) were considered for the development of this method. Particular attention was devoted to extraction optimization: different solvents (acetone, acetonitrile and methanol), different pH values and different sample to extracting volume ratios were tested and evaluated in terms of recovery, relative standard deviation (RSD) and ESI signal suppression due to matrix effect. Chromatographic and mass spectrometric conditions were optimized to obtain the best instrumental performances for most of the analytes. Quantitative analysis was performed by means of matrix-matched calibration, in a range that varied depending on the analyte and its established maximum limit, when there was one. Recoveries at 100 μg kg(-1) spiking level were >62% (3

  15. Multi-class analysis of new psychoactive substances and metabolites in hair by pressurized liquid extraction coupled to HPLC-HRMS.

    PubMed

    Montesano, Camilla; Vannutelli, Gabriele; Massa, Maristella; Simeoni, Maria Chiara; Gregori, Adolfo; Ripani, Luigi; Compagnone, Dario; Curini, Roberta; Sergi, Manuel

    2017-05-01

    In this paper, an analytical method has been developed and validated for the analysis of new psychoactive substances (NPS) and metabolites in hair samples. The method was based on pressurized liquid extraction (PLE) followed by solid-phase extraction (SPE) clean-up and high performance liquid chromatography-high resolution mass spectrometry (HPLC-HRMS) analysis. To evaluate extraction efficiency and the applicability of the method, hair samples were fortified by soaking in order to obtain a good surrogate for drug users' hair; the amount of incorporated drugs was related to their lipophilicity, similarly to in vivo drug incorporation. To the best of our knowledge, this is the first method that allowed for the analysis of both cathinones (5) and synthetic cannabinoids (7) in hair with a single extraction procedure and chromatographic run. A phenethylamine (2C-T-4), 4-fluorophenylpiperazine and methoxetamine were also included, showing that PLE coupled to SPE clean-up was suitable for a multi-class analysis of NPS in hair. In addition, the use of PLE significantly reduced hair analysis time: decontamination, incubation, clean-up, and liquid chromatography-mass spectrometry (LC-MS) analysis were carried out in approximately 45 min. The method was fully validated according to Scientific Working Group for Forensic Toxicology (SWGTOX) and Society of Hair Testing (SoHT) guidelines. Limit of quantification (LOQ) values ranged from 8 to 50 pg mg(-1) for cathinones, phenethylamines and piperazines, and from 9 to 40 pg mg(-1) for synthetic cannabinoids (10 pg mg(-1) for methoxetamine). Matrix effects were below 15% for all the analytes, demonstrating the effectiveness of the clean-up step. Inaccuracy was lower than 9% in terms of bias. Copyright © 2016 John Wiley & Sons, Ltd.

  16. StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine.

    PubMed

    Zhou, Wengang; Dickerson, Julie A

    2012-01-01

    Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes new features: one sequence-based, Hybrid Amino Acid Pair (HAAP), and two structure-based, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece.iastate.edu/StruLocPred/.

  17. A semi-automated image analysis procedure for in situ plankton imaging systems.

    PubMed

    Bi, Hongsheng; Guo, Zhenhua; Benfield, Mark C; Fan, Chunlei; Ford, Michael; Shahrestani, Suzan; Sieracki, Jeffery M

    2015-01-01

    Plankton imaging systems are capable of providing fine-scale observations that enhance our understanding of key physical and biological processes. However, processing the large volumes of data collected by imaging systems remains a major obstacle for their employment, and existing approaches are designed either for images acquired under laboratory controlled conditions or within clear waters. In the present study, we developed a semi-automated approach to analyze plankton taxa from images acquired by the ZOOplankton VISualization (ZOOVIS) system within turbid estuarine waters, in Chesapeake Bay. When compared to images under laboratory controlled conditions or clear waters, images from highly turbid waters are often of relatively low quality and more variable, due to the large amount of objects and nonlinear illumination within each image. We first customized a segmentation procedure to locate objects within each image and extracted them for classification. A maximally stable extremal regions algorithm was applied to segment large gelatinous zooplankton and an adaptive threshold approach was developed to segment small organisms, such as copepods. Unlike the existing approaches for images acquired from laboratory, controlled conditions or clear waters, the target objects are often the majority class, and the classification can be treated as a multi-class classification problem. We customized a two-level hierarchical classification procedure using support vector machines to classify the target objects (< 5%), and remove the non-target objects (> 95%). First, histograms of oriented gradients feature descriptors were constructed for the segmented objects. In the first step all non-target and target objects were classified into different groups: arrow-like, copepod-like, and gelatinous zooplankton. Each object was passed to a group-specific classifier to remove most non-target objects. 
After the object was classified, an expert or non-expert then manually removed the non-target objects that could not be removed by the procedure. The procedure was tested on 89,419 images collected in Chesapeake Bay, and results were consistent with visual counts with >80% accuracy for all three groups.

  18. A Semi-Automated Image Analysis Procedure for In Situ Plankton Imaging Systems

    PubMed Central

    Bi, Hongsheng; Guo, Zhenhua; Benfield, Mark C.; Fan, Chunlei; Ford, Michael; Shahrestani, Suzan; Sieracki, Jeffery M.

    2015-01-01

    Plankton imaging systems are capable of providing fine-scale observations that enhance our understanding of key physical and biological processes. However, processing the large volumes of data collected by imaging systems remains a major obstacle for their employment, and existing approaches are designed either for images acquired under laboratory controlled conditions or within clear waters. In the present study, we developed a semi-automated approach to analyze plankton taxa from images acquired by the ZOOplankton VISualization (ZOOVIS) system within turbid estuarine waters, in Chesapeake Bay. When compared to images under laboratory controlled conditions or clear waters, images from highly turbid waters are often of relatively low quality and more variable, due to the large amount of objects and nonlinear illumination within each image. We first customized a segmentation procedure to locate objects within each image and extracted them for classification. A maximally stable extremal regions algorithm was applied to segment large gelatinous zooplankton and an adaptive threshold approach was developed to segment small organisms, such as copepods. Unlike the existing approaches for images acquired from laboratory, controlled conditions or clear waters, the target objects are often the majority class, and the classification can be treated as a multi-class classification problem. We customized a two-level hierarchical classification procedure using support vector machines to classify the target objects (< 5%), and remove the non-target objects (> 95%). First, histograms of oriented gradients feature descriptors were constructed for the segmented objects. In the first step all non-target and target objects were classified into different groups: arrow-like, copepod-like, and gelatinous zooplankton. Each object was passed to a group-specific classifier to remove most non-target objects. 
After the object was classified, an expert or non-expert then manually removed the non-target objects that could not be removed by the procedure. The procedure was tested on 89,419 images collected in Chesapeake Bay, and results were consistent with visual counts with >80% accuracy for all three groups. PMID:26010260

  19. Psychophysiological Sensing and State Classification for Attention Management in Commercial Aviation

    NASA Technical Reports Server (NTRS)

    Harrivel, Angela R.; Liles, Charles; Stephens, Chad L.; Ellis, Kyle K.; Prinzel, Lawrence J.; Pope, Alan T.

    2016-01-01

    Attention-related human performance limiting states (AHPLS) can cause pilots to lose airplane state awareness (ASA), and their detection is important to improving commercial aviation safety. The Commercial Aviation Safety Team found that the majority of recent international commercial aviation accidents attributable to loss of control inflight involved flight crew loss of airplane state awareness, and that distraction of various forms was involved in all of them. Research on AHPLS, including channelized attention, diverted attention, startle / surprise, and confirmation bias, has been recommended in a Safety Enhancement (SE) entitled "Training for Attention Management." To accomplish the detection of such cognitive and psychophysiological states, a broad suite of sensors has been implemented to simultaneously measure their physiological markers during high fidelity flight simulation human subject studies. Pilot participants were asked to perform benchmark tasks and experimental flight scenarios designed to induce AHPLS. Pattern classification was employed to distinguish the AHPLS induced by the benchmark tasks. Unimodal classification using pre-processed electroencephalography (EEG) signals as input features to extreme gradient boosting, random forest and deep neural network multiclass classifiers was implemented. Multi-modal classification using galvanic skin response (GSR) in addition to the same EEG signals and using the same types of classifiers produced increased accuracy with respect to the unimodal case (90 percent vs. 86 percent), although only via the deep neural network classifier. These initial results are a first step toward the goal of demonstrating simultaneous real time classification of multiple states using multiple sensing modalities in high-fidelity flight simulators. This detection is intended to support and inform training methods under development to mitigate the loss of ASA and thus reduce accidents and incidents.

  20. An evaluation of consensus techniques for diagnostic interpretation

    NASA Astrophysics Data System (ADS)

    Sauter, Jake N.; LaBarre, Victoria M.; Furst, Jacob D.; Raicu, Daniela S.

    2018-02-01

    Learning diagnostic labels from image content has been the standard in computer-aided diagnosis. Most computer-aided diagnosis systems use low-level image features extracted directly from image content to train and test machine learning classifiers for diagnostic label prediction. When the ground truth for the diagnostic labels is not available, reference truth is generated from the experts' diagnostic interpretations of the image/region of interest. More specifically, when the label is uncertain, e.g. when multiple experts label an image and their interpretations are different, techniques to handle the label variability are necessary. In this paper, we compare three consensus techniques that are typically used to encode the variability in the experts' labeling of the medical data: mean, median and mode, and their effects on simple classifiers that can handle deterministic labels (decision trees) and probabilistic vectors of labels (belief decision trees). Given that the NIH/NCI Lung Image Database Consortium (LIDC) data provides interpretations for lung nodules by up to four radiologists, we leverage the LIDC data to evaluate and compare these consensus approaches when creating computer-aided diagnosis systems for lung nodules. First, low-level image features of nodules are extracted and paired with their radiologists' semantic ratings (1 = most likely benign, ..., 5 = most likely malignant); second, machine learning multi-class classifiers that handle deterministic labels (decision trees) and probabilistic vectors of labels (belief decision trees) are built to predict the lung nodules' semantic ratings. We show that the mean-based consensus generates the most robust classifier overall when compared to the median- and mode-based consensus. Lastly, the results of this study show that, when building CAD systems with uncertain diagnostic interpretation, it is important to evaluate different strategies for encoding and predicting the diagnostic label.
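
    The three consensus encodings compared above, and the probabilistic label vector used by belief-style classifiers, can be sketched with the Python standard library. This is only an illustration of the idea: the function names and the example ratings are hypothetical, and how the study rounded or discretised the mean and median is not stated in the abstract, so the raw values are returned.

```python
from collections import Counter
import statistics

def consensus_labels(ratings):
    """Mean, median and mode of the experts' ratings for one nodule.

    Note: for tied modes, statistics.mode returns the first value encountered
    on Python 3.8+ (earlier versions raise an error).
    """
    return {
        "mean": statistics.mean(ratings),
        "median": statistics.median(ratings),
        "mode": statistics.mode(ratings),
    }

def label_distribution(ratings, scale=(1, 2, 3, 4, 5)):
    """Fraction of experts choosing each rating: a probabilistic label vector."""
    counts = Counter(ratings)
    return [counts[r] / len(ratings) for r in scale]

# Hypothetical ratings from four radiologists for a single nodule.
ratings = [1, 1, 4, 5]
consensus = consensus_labels(ratings)
distribution = label_distribution(ratings)
```

    With these made-up ratings the three encodings disagree (mean 2.75, median 2.5, mode 1), which shows why the choice of consensus technique can change the label a classifier is trained on.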

  1. EEG Classification for Hybrid Brain-Computer Interface Using a Tensor Based Multiclass Multimodal Analysis Scheme

    PubMed Central

    Ji, Hongfei; Li, Jie; Lu, Rongrong; Gu, Rong; Cao, Lei; Gong, Xiaoliang

    2016-01-01

    Electroencephalogram- (EEG-) based brain-computer interface (BCI) systems usually utilize one type of changes in the dynamics of brain oscillations for control, such as event-related desynchronization/synchronization (ERD/ERS), steady state visual evoked potential (SSVEP), and P300 evoked potentials. There is a recent trend to detect more than one of these signals in one system to create a hybrid BCI. However, in this case, EEG data were always divided into groups and analyzed by the separate processing procedures. As a result, the interactive effects were ignored when different types of BCI tasks were executed simultaneously. In this work, we propose an improved tensor based multiclass multimodal scheme especially for hybrid BCI, in which EEG signals are denoted as multiway tensors, a nonredundant rank-one tensor decomposition model is proposed to obtain nonredundant tensor components, a weighted fisher criterion is designed to select multimodal discriminative patterns without ignoring the interactive effects, and support vector machine (SVM) is extended to multiclass classification. Experiment results suggest that the proposed scheme can not only identify the different changes in the dynamics of brain oscillations induced by different types of tasks but also capture the interactive effects of simultaneous tasks properly. Therefore, it has great potential use for hybrid BCI. PMID:26880873

  2. EEG Classification for Hybrid Brain-Computer Interface Using a Tensor Based Multiclass Multimodal Analysis Scheme.

    PubMed

    Ji, Hongfei; Li, Jie; Lu, Rongrong; Gu, Rong; Cao, Lei; Gong, Xiaoliang

    2016-01-01

    Electroencephalogram- (EEG-) based brain-computer interface (BCI) systems usually utilize one type of changes in the dynamics of brain oscillations for control, such as event-related desynchronization/synchronization (ERD/ERS), steady state visual evoked potential (SSVEP), and P300 evoked potentials. There is a recent trend to detect more than one of these signals in one system to create a hybrid BCI. However, in this case, EEG data were always divided into groups and analyzed by the separate processing procedures. As a result, the interactive effects were ignored when different types of BCI tasks were executed simultaneously. In this work, we propose an improved tensor based multiclass multimodal scheme especially for hybrid BCI, in which EEG signals are denoted as multiway tensors, a nonredundant rank-one tensor decomposition model is proposed to obtain nonredundant tensor components, a weighted fisher criterion is designed to select multimodal discriminative patterns without ignoring the interactive effects, and support vector machine (SVM) is extended to multiclass classification. Experiment results suggest that the proposed scheme can not only identify the different changes in the dynamics of brain oscillations induced by different types of tasks but also capture the interactive effects of simultaneous tasks properly. Therefore, it has great potential use for hybrid BCI.

  3. A cross-sectional study examining the prevalence and risk factors for anti-microbial-resistant generic Escherichia coli in domestic dogs that frequent dog parks in three cities in south-western Ontario, Canada.

    PubMed

    Procter, T D; Pearl, D L; Finley, R L; Leonard, E K; Janecko, N; Reid-Smith, R J; Weese, J S; Peregrine, A S; Sargeant, J M

    2014-06-01

    Anti-microbial resistance can threaten health by limiting treatment options and increasing the risk of hospitalization and severity of infection. Companion animals can shed anti-microbial-resistant bacteria that may result in the exposure of other dogs and humans to anti-microbial-resistant genes. The prevalence of anti-microbial-resistant generic Escherichia coli in the faeces of dogs that visited dog parks in south-western Ontario was examined and risk factors for shedding anti-microbial-resistant generic E. coli were identified. From May to August 2009, canine faecal samples were collected at ten dog parks in three cities in south-western Ontario, Canada. Owners completed a questionnaire related to pet characteristics and management factors including recent treatment with antibiotics. Faecal samples were collected from 251 dogs, and 189 surveys were completed. Generic E. coli was isolated from 237 of the faecal samples, and up to three isolates per sample were tested for anti-microbial susceptibility. Eighty-nine percent of isolates were pan-susceptible; 82.3% of dogs shed isolates that were pan-susceptible. Multiclass resistance was detected in 7.2% of the isolates from 10.1% of the dogs. Based on multilevel multivariable logistic regression, a risk factor for the shedding of generic E. coli resistant to ampicillin was attending dog day care. Risk factors for the shedding of E. coli resistant to at least one anti-microbial included attending dog day care and being a large mixed breed dog, whereas consumption of commercial dry and home cooked diets was a protective factor. In a multilevel multivariable model for the shedding of multiclass-resistant E. coli, exposure to compost and being a large mixed breed dog were risk factors, while consumption of a commercial dry diet was a sparing factor. Pet dogs are a potential reservoir of anti-microbial-resistant generic E. coli; some dog characteristics and management factors are associated with the prevalence of anti-microbial-resistant generic E. coli in dogs. © 2013 Blackwell Verlag GmbH.

  4. Facial Expression Recognition using Multiclass Ensemble Least-Square Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Lawi, Armin; Sya'Rani Machrizzandi, M.

    2018-03-01

    Facial expression is one of the behavioral characteristics of human beings. The use of a biometric technology system with facial expression characteristics makes it possible to recognize a person’s mood or emotion. The basic components of a facial expression analysis system are face detection, face image extraction, facial classification and facial expression recognition. This paper uses the Principal Component Analysis (PCA) algorithm to extract facial features with expression parameters, i.e., happy, sad, neutral, angry, fear, and disgusted. Then a Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM) is used for the classification of facial expressions. The MELS-SVM model, evaluated on our 185 expression images of 10 persons, showed a high accuracy of 99.998% using an RBF kernel.

  5. A parametric multiclass Bayes error estimator for the multispectral scanner spatial model performance evaluation

    NASA Technical Reports Server (NTRS)

    Mobasseri, B. G.; Mcgillem, C. D.; Anuta, P. E. (Principal Investigator)

    1978-01-01

    The author has identified the following significant results. The probability of correct classification of various populations in data was defined as the primary performance index. The multispectral data, being of multiclass nature as well, required a Bayes error estimation procedure that was dependent on a set of class statistics alone. The classification error was expressed in terms of an N-dimensional integral, where N was the dimensionality of the feature space. The multispectral scanner spatial model was represented by a linear shift-invariant multiple-port system where the N spectral bands comprised the input processes. The scanner characteristic function, the relationship governing the transformation of the input spatial, and hence spectral, correlation matrices through the system, was developed.
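
    For reference, the N-dimensional integral mentioned above has a standard textbook form. The notation below is an assumption (the abstract does not give the report's own symbols): $\omega_i$ denotes class $i$ of $M$ classes, $P(\omega_i)$ its prior probability, and $p(\mathbf{x} \mid \omega_i)$ its class-conditional density over the feature vector $\mathbf{x} \in \mathbb{R}^N$.

```latex
% Probability of correct classification under the Bayes decision rule
P_c = \int_{\mathbb{R}^N} \max_{1 \le i \le M}
      \bigl[ P(\omega_i)\, p(\mathbf{x} \mid \omega_i) \bigr] \, d\mathbf{x}

% Bayes error is its complement
P_e = 1 - P_c
```

    When each class-conditional density is modelled parametrically (for example as a Gaussian defined by a mean vector and covariance matrix), the integral depends on the class statistics alone, which is consistent with the estimator described in the abstract.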

  6. Machine Learning Approach to Automated Quality Identification of Human Induced Pluripotent Stem Cell Colony Images.

    PubMed

    Joutsijoki, Henry; Haponen, Markus; Rasku, Jyrki; Aalto-Setälä, Katriina; Juhola, Martti

    2016-01-01

    The focus of this research is on automated identification of the quality of human induced pluripotent stem cell (iPSC) colony images. iPS cell technology is a contemporary method by which the patient's cells are reprogrammed back to stem cells and are differentiated to any cell type wanted. In the future, iPS cell technology will be used for patient-specific drug screening, disease modeling, and tissue repair, for instance. However, there are technical challenges before iPS cell technology can be used in practice, and one of them is quality control of growing iPSC colonies, which is currently done manually but is not feasible in large-scale cultures. The monitoring problem reduces to an image analysis and classification problem. In this paper, we tackle this problem using machine learning methods such as multiclass Support Vector Machines and several baseline methods together with Scaled Invariant Feature Transformation based features. We perform over 80 test arrangements and do a thorough parameter value search. The best accuracy (62.4%) for classification was obtained by using a k-NN classifier, showing improved accuracy compared to earlier studies.

  7. Metabolic profiles are principally different between cancers of the liver, pancreas and breast.

    PubMed

    Budhu, Anuradha; Terunuma, Atsushi; Zhang, Geng; Hussain, S Perwez; Ambs, Stefan; Wang, Xin Wei

    2014-01-01

    Molecular profiling of primary tumors may facilitate the classification of patients with cancer into more homogenous biological groups to aid clinical management. Metabolomic profiling has been shown to be a powerful tool in characterizing the biological mechanisms underlying a disease but has not been evaluated for its ability to classify cancers by their tissue of origin. Thus, we assessed metabolomic profiling as a novel tool for multiclass cancer characterization. Global metabolic profiling was employed to identify metabolites in paired tumor and non-tumor liver (n=60), breast (n=130) and pancreatic (n=76) tissue specimens. Unsupervised principal component analysis showed that metabolites are principally unique to each tissue and cancer type. Such a difference can also be observed even among early stage cancers, suggesting a significant and unique alteration of global metabolic pathways associated with each cancer type. Our global high-throughput metabolomic profiling study shows that specific biochemical alterations distinguish liver, pancreatic and breast cancer and could be applied as cancer classification tools to differentiate tumors based on tissue of origin.

  8. Multi-class ERP-based BCI data analysis using a discriminant space self-organizing map.

    PubMed

    Onishi, Akinari; Natsume, Kiyohisa

    2014-01-01

    Emotional and non-emotional image stimuli have recently been applied to event-related potential (ERP) based brain-computer interfaces (BCI). Though the classification performance is over 80% in a single trial, discrimination between those ERPs has not been considered. In this research we tried to clarify the discriminability of four-class ERP-based BCI target data elicited by desk, seal, spider images and letter intensifications. A conventional self-organizing map (SOM) and a newly proposed discriminant space SOM (ds-SOM) were applied, and the discriminabilities were then visualized. We also classified all pairs of those ERPs by stepwise linear discriminant analysis (SWLDA) and verified the visualization of discriminabilities. As a result, the ds-SOM showed an understandable visualization of the data with a shorter computational time than the traditional SOM. We also confirmed the clear boundary between the letter cluster and the other clusters. The result was coherent with the classification performances by SWLDA. The method might be helpful not only for developing a new BCI paradigm, but also for big data analysis.

  9. 3D facial expression recognition using maximum relevance minimum redundancy geometrical features

    NASA Astrophysics Data System (ADS)

    Rabiu, Habibu; Saripan, M. Iqbal; Mashohor, Syamsiah; Marhaban, Mohd Hamiruce

    2012-12-01

    In recent years, facial expression recognition (FER) has become an attractive research area which, besides the fundamental challenges it poses, finds application in areas such as human-computer interaction, clinical psychology, lie detection, pain assessment, and neurology. Generally, the approaches to FER consist of three main steps: face detection, feature extraction, and expression recognition. The recognition accuracy of FER hinges immensely on the relevance of the selected features in representing the target expressions. In this article, we present a person- and gender-independent 3D facial expression recognition method using maximum relevance minimum redundancy geometrical features. The aim is to detect a compact set of features that sufficiently represents the most discriminative features between the target classes. A multi-class one-against-one SVM classifier was employed to recognize the seven facial expressions: neutral, happy, sad, angry, fear, disgust, and surprise. An average recognition accuracy of 92.2% was recorded. Furthermore, inter-database homogeneity was investigated between two independent databases, BU-3DFE and UPM-3DFE; the results showed a strong homogeneity between the two databases.
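
    The one-against-one decomposition mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a nearest-centroid rule stands in for the SVM, and the function names (`train_one_vs_one`, `predict_one_vs_one`) are hypothetical. With seven expression classes the scheme trains 7 * 6 / 2 = 21 pairwise models and predicts by majority vote.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(X, y):
    """Fit one binary model per unordered pair of classes.

    Assumption: each binary model is a pair of class centroids,
    a simple stand-in for the SVM named in the abstract.
    """
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        ca = centroid([x for x, lab in zip(X, y) if lab == a])
        cb = centroid([x for x, lab in zip(X, y) if lab == b])
        models[(a, b)] = (ca, cb)
    return models


def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
    y = ["happy", "happy", "sad", "sad", "angry", "angry"]
    models = train_one_vs_one(X, y)
    print(len(models), predict_one_vs_one(models, [0.2, 0.4]))
```

    The number of pairwise models grows quadratically with the number of classes, which is the main cost of this decomposition compared with one-against-rest.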

  10. Applied learning-based color tone mapping for face recognition in video surveillance system

    NASA Astrophysics Data System (ADS)

    Yew, Chuu Tian; Suandi, Shahrel Azmin

    2012-04-01

    In this paper, we present an applied learning-based color tone mapping technique for video surveillance systems. This technique can be applied to both color and grayscale surveillance images. The basic idea is to learn the color or intensity statistics from a training dataset of photorealistic images of the candidates who appear in the surveillance images, and to remap the color or intensity of the input image so that its color or intensity statistics match those in the training dataset. It is well known that differences in commercial surveillance camera models and in the signal processing chipsets used by different manufacturers cause the color and intensity of the images to differ from one another, thus creating additional challenges for face recognition in video surveillance systems. Using multi-class support vector machines as the classifier on a publicly available video surveillance camera database, namely the SCface database, this approach is validated and compared to the results of using a holistic approach on grayscale images. The results show that this technique is suitable for improving the color or intensity quality of video surveillance systems for face recognition.

  11. Behavior of Multiclass Pesticide Residue Concentrations during the Transformation from Rose Petals to Rose Absolute.

    PubMed

    Tascone, Oriane; Fillâtre, Yoann; Roy, Céline; Meierhenrich, Uwe J

    2015-05-27

    This study investigates the concentrations of 54 multiclass pesticides during the transformation processes from rose petal to concrete and absolute using roses spiked with pesticides as a model. The concentrations of the pesticides were followed during the process of transforming the spiked rose flowers from an organic field into concrete and then into absolute. The rose flowers, the concrete, and the absolute, as well as their transformation intermediates, were analyzed for pesticide content using gas chromatography/tandem mass spectrometry. We observed that all the pesticides were extracted and concentrated in the absolute, with the exception of three molecules: fenthion, fenamiphos, and phorate. Typical pesticides were found to be concentrated by a factor of 100-300 from the rose flowers to the rose absolute. The observed effect of pesticide enrichment was also studied in roses and their extracts from four classically phytosanitary treated fields. Seventeen pesticides were detected in at least one of the extracts. Like the case for the spiked samples in our model, the pesticides present in the rose flowers from Turkey were concentrated in the absolute. Two pesticides, methidathion and chlorpyrifos, were quantified in the rose flowers at approximately 0.01 and 0.01-0.05 mg kg(-1), respectively, depending on the treated field. The concentrations determined for the corresponding rose absolutes were 4.7 mg kg(-1) for methidathion and 0.65-27.25 mg kg(-1) for chlorpyrifos.

  12. A Simple and Fast Extraction Method for the Determination of Multiclass Antibiotics in Eggs Using LC-MS/MS.

    PubMed

    Wang, Kun; Lin, Kunde; Huang, Xinwen; Chen, Meng

    2017-06-21

    The purpose of this study was to develop and validate a simple, fast, and specific extraction method for the analysis of 64 antibiotics from nine classes (including sulfonamides, quinolones, tetracyclines, macrolides, lincosamide, nitrofurans, β-lactams, nitroimidazoles, and chloramphenicols) in chicken eggs. Briefly, egg samples were simply extracted with a mixture of acetonitrile-water (90:10, v/v) and 0.1 mol·L⁻¹ Na₂EDTA solution with ultrasonic assistance. The extract was centrifuged, condensed, and directly analyzed by liquid chromatography coupled to tandem mass spectrometry. Compared with conventional cleanup methods (passing through solid-phase extraction cartridges), the established method demonstrated comparable efficiencies in eliminating matrix effects and higher or equivalent recoveries for most of the target compounds. Typical validation parameters including specificity, linearity, matrix effect, limits of detection (LODs) and quantification (LOQs), the decision limit, detection capability, trueness, and precision were evaluated. The recoveries of target compounds ranged from 70.8% to 116.1% at three spiking levels (5, 20, and 50 μg·kg⁻¹), with relative standard deviations less than 14%. LODs and LOQs were in the ranges of 0.005-2.00 μg·kg⁻¹ and 0.015-6.00 μg·kg⁻¹ for all of the antibiotics, respectively. A total of five antibiotics were successfully detected in 22 commercial eggs from local markets. This work suggests that the method is suitable for the analysis of multiclass antibiotics in eggs.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Chang; Deng, Na; Wang, Haimin

    Adverse space-weather effects can often be traced to solar flares, the prediction of which has drawn significant research interests. The Helioseismic and Magnetic Imager (HMI) produces full-disk vector magnetograms with continuous high cadence, while flare prediction efforts utilizing this unprecedented data source are still limited. Here we report results of flare prediction using physical parameters provided by the Space-weather HMI Active Region Patches (SHARP) and related data products. We survey X-ray flares that occurred from 2010 May to 2016 December and categorize their source regions into four classes (B, C, M, and X) according to the maximum GOES magnitude of flares they generated. We then retrieve SHARP-related parameters for each selected region at the beginning of its flare date to build a database. Finally, we train a machine-learning algorithm, called random forest (RF), to predict the occurrence of a certain class of flares in a given active region within 24 hr, evaluate the classifier performance using the 10-fold cross-validation scheme, and characterize the results using standard performance metrics. Compared to previous works, our experiments indicate that using the HMI parameters and RF is a valid method for flare forecasting with fairly reasonable prediction performance. To our knowledge, this is the first time that RF has been used to make multiclass predictions of solar flares. We also find that the total unsigned quantities of vertical current, current helicity, and flux near the polarity inversion line are among the most important parameters for classifying flaring regions into different classes.
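
    The 10-fold cross-validation scheme named in the abstract above can be illustrated with a short index-splitting sketch. It assumes a plain shuffled split (the abstract does not say whether folds were stratified by flare class), and the helper name `k_fold_indices` is hypothetical. Training and scoring a classifier on each split is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: a plain shuffled split with a fixed seed; no
    stratification by class is applied.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        # A classifier would be fitted on `train` and scored on `test` here.
        print(len(train), len(test))
```

    Each sample lands in exactly one test fold, so every sample is used for evaluation once and for training k - 1 times.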

  14. A hierarchical anatomical classification schema for prediction of phenotypic side effects

    PubMed Central

    Kanji, Rakesh

    2018-01-01

    Prediction of adverse drug reactions is an important problem in drug discovery endeavors which can be addressed with data-driven strategies. SIDER is one of the most reliable and frequently used datasets for identification of key features as well as building machine learning models for side effects prediction. The inherently unbalanced nature of this data presents a difficult multi-label multi-class problem for the prediction of drug side effects. We highlight the intrinsic issue with SIDER data and methodological flaws in relying on performance measures such as AUC while attempting to predict side effects. We argue for the use of metrics that are robust to class imbalance for evaluation of classifiers. Importantly, we present a ‘hierarchical anatomical classification schema’ which aggregates side effects into organs, sub-systems, and systems. With the help of a weighted performance measure, using 5-fold cross-validation we show that this strategy facilitates biologically meaningful side effects prediction at different levels of anatomical hierarchy. By implementing various machine learning classifiers we show that the Random Forest model yields the best classification accuracy at each level of coarse-graining. The manually curated, hierarchical schema for side effects can also serve as the basis of future studies towards prediction of adverse reactions and identification of key features linked to specific organ systems. Our study provides a strategy for hierarchical classification of side effects rooted in the anatomy and can pave the way for calibrated expert systems for multi-level prediction of side effects. PMID:29494708

  15. Spatially aggregated multiclass pattern classification in functional MRI using optimally selected functional brain areas.

    PubMed

    Zheng, Weili; Ackley, Elena S; Martínez-Ramón, Manel; Posse, Stefan

    2013-02-01

    In previous works, boosting aggregation of classifier outputs from discrete brain areas has been demonstrated to reduce dimensionality and improve the robustness and accuracy of functional magnetic resonance imaging (fMRI) classification. However, dimensionality reduction and classification of mixed activation patterns of multiple classes remain challenging. In the present study, the goals were (a) to reduce dimensionality by combining feature reduction at the voxel level and backward elimination of optimally aggregated classifiers at the region level, (b) to compare region selection for spatially aggregated classification using boosting and partial least squares regression methods and (c) to resolve mixed activation patterns using probabilistic prediction of individual tasks. Brain activation maps from interleaved visual, motor, auditory and cognitive tasks were segmented into 144 functional regions. Feature selection reduced the number of feature voxels by more than 50%, leaving 95 regions. The two aggregation approaches further reduced the number of regions to 30, resulting in more than 75% reduction of classification time and misclassification rates of less than 3%. Boosting and partial least squares (PLS) were compared to select the most discriminative and the most task correlated regions, respectively. Successful task prediction in mixed activation patterns was feasible within the first block of task activation in real-time fMRI experiments. This methodology is suitable for sparsifying activation patterns in real-time fMRI and for neurofeedback from distributed networks of brain activation. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. A hierarchical anatomical classification schema for prediction of phenotypic side effects.

    PubMed

    Wadhwa, Somin; Gupta, Aishwarya; Dokania, Shubham; Kanji, Rakesh; Bagler, Ganesh

    2018-01-01

    Prediction of adverse drug reactions is an important problem in drug discovery endeavors which can be addressed with data-driven strategies. SIDER is one of the most reliable and frequently used datasets for identification of key features as well as building machine learning models for side effects prediction. The inherently unbalanced nature of this data presents a difficult multi-label multi-class problem for the prediction of drug side effects. We highlight the intrinsic issue with SIDER data and methodological flaws in relying on performance measures such as AUC while attempting to predict side effects. We argue for the use of metrics that are robust to class imbalance for evaluation of classifiers. Importantly, we present a 'hierarchical anatomical classification schema' which aggregates side effects into organs, sub-systems, and systems. With the help of a weighted performance measure, using 5-fold cross-validation we show that this strategy facilitates biologically meaningful side effects prediction at different levels of anatomical hierarchy. By implementing various machine learning classifiers we show that the Random Forest model yields the best classification accuracy at each level of coarse-graining. The manually curated, hierarchical schema for side effects can also serve as the basis of future studies towards prediction of adverse reactions and identification of key features linked to specific organ systems. Our study provides a strategy for hierarchical classification of side effects rooted in the anatomy and can pave the way for calibrated expert systems for multi-level prediction of side effects.

  17. Machine Learning Techniques for Global Sensitivity Analysis in Climate Models

    NASA Astrophysics Data System (ADS)

    Safta, C.; Sargsyan, K.; Ricciuto, D. M.

    2017-12-01

    Climate model studies are challenged not only by the compute-intensive nature of these models but also by the high dimensionality of the input parameter space. In our previous work with the land model components (Sargsyan et al., 2014) we identified subsets of 10 to 20 parameters relevant for each QoI via Bayesian compressive sensing and variance-based decomposition. Nevertheless, the algorithms were challenged by the nonlinear input-output dependencies for some of the relevant QoIs. In this work we will explore a combination of techniques to extract relevant parameters for each QoI and subsequently construct surrogate models with the quantified uncertainty necessary for future developments, e.g. model calibration and prediction studies. In the first step, we will compare the skill of machine-learning models (e.g. neural networks, support vector machines) to identify the optimal number of classes in selected QoIs and construct robust multi-class classifiers that will partition the parameter space into regions with smooth input-output dependencies. These classifiers will be coupled with techniques aimed at building sparse and/or low-rank surrogate models tailored to each class. Specifically, we will explore and compare sparse learning techniques with low-rank tensor decompositions. These models will be used to identify parameters that are important for each QoI. Surrogate accuracy requirements are higher for subsequent model calibration studies, and we will ascertain the performance of this workflow for multi-site ALM simulation ensembles.

  18. Multi-class segmentation of neuronal electron microscopy images using deep learning

    NASA Astrophysics Data System (ADS)

    Khobragade, Nivedita; Agarwal, Chirag

    2018-03-01

    Study of connectivity of neural circuits is an essential step towards a better understanding of functioning of the nervous system. With the recent improvement in imaging techniques, high-resolution and high-volume images are being generated requiring automated segmentation techniques. We present a pixel-wise classification method based on Bayesian SegNet architecture. We carried out multi-class segmentation on serial section Transmission Electron Microscopy (ssTEM) images of Drosophila third instar larva ventral nerve cord, labeling the four classes of neuron membranes, neuron intracellular space, mitochondria and glia / extracellular space. Bayesian SegNet was trained using 256 ssTEM images of 256 x 256 pixels and tested on 64 different ssTEM images of the same size, from the same serial stack. Due to high class imbalance, we used a class-balanced version of Bayesian SegNet by re-weighting each class based on their relative frequency. We achieved an overall accuracy of 93% and a mean class accuracy of 88% for pixel-wise segmentation using this encoder-decoder approach. On evaluating the segmentation results using similarity metrics like SSIM and Dice Coefficient, we obtained scores of 0.994 and 0.886 respectively. Additionally, we used the network trained using the 256 ssTEM images of Drosophila third instar larva for multi-class labeling of ISBI 2012 challenge ssTEM dataset.
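
    The abstract above re-weights classes by relative frequency and reports both overall accuracy and mean class accuracy. The sketch below shows why the two accuracies diverge under class imbalance and one way such weights could be computed. The exact weighting used by the authors is not stated, so inverse relative frequency normalised to mean 1 is an assumption here, and all function names are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by 1 / relative frequency, normalised to mean 1.

    Assumption: this is one common re-weighting scheme; the abstract
    does not specify the exact formula used.
    """
    counts = Counter(labels)
    total = len(labels)
    raw = {c: total / n for c, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {c: w / mean_raw for c, w in raw.items()}


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples (e.g. pixels) labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of per-class recall, so rare classes count equally."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    y_true = [0] * 8 + [1] * 2      # class 1 is rare
    y_pred = [0] * 8 + [0, 1]       # half of the rare class is missed
    print(inverse_frequency_weights(y_true))
    print(overall_accuracy(y_true, y_pred), mean_class_accuracy(y_true, y_pred))
```

    In this toy case the overall accuracy is 0.9 while the mean class accuracy is 0.75, mirroring the gap between the two figures quoted in the abstract.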

  19. A novel channel selection method for optimal classification in different motor imagery BCI paradigms.

    PubMed

    Shan, Haijun; Xu, Haojie; Zhu, Shanan; He, Bin

    2015-10-21

    For sensorimotor rhythm-based brain-computer interface (BCI) systems, classification of different motor imageries (MIs) remains a crucial problem. An important aspect is how many scalp electrodes (channels) should be used in order to reach optimal performance when classifying motor imaginations. While previous research on channel selection has mainly focused on MI task paradigms without feedback, the present work aims to investigate the optimal channel selection in MI task paradigms with real-time feedback (two-class control and four-class control paradigms). In the present study, three datasets recorded from an MI task experiment, a two-class control experiment, and a four-class control experiment, respectively, were analyzed offline. Multiple frequency-spatial synthesized features were comprehensively extracted from every channel, and a new enhanced method, IterRelCen, was proposed to perform channel selection. IterRelCen was constructed based on the Relief algorithm but was enhanced in two respects: a change of target sample selection strategy and the adoption of iterative computation; it thus performed more robustly in feature selection. Finally, a multiclass support vector machine was applied as the classifier. The smallest number of channels that yielded the best classification accuracy was considered the optimal set of channels. One-way ANOVA was employed to test the significance of performance improvement among using optimal channels, all the channels, and three typical MI channels (C3, C4, Cz). The results show that the proposed method outperformed other channel selection methods by achieving average classification accuracies of 85.2, 94.1, and 83.2 % for the three datasets, respectively. Moreover, the channel selection results reveal that the average numbers of optimal channels were significantly different among the three MI paradigms. It is demonstrated that IterRelCen has a strong ability for feature selection.
In addition, the results have shown that the numbers of optimal channels in the three different motor imagery BCI paradigms are distinct. From a MI task paradigm, to a two-class control paradigm, and to a four-class control paradigm, the number of required channels for optimizing the classification accuracy increased. These findings may provide useful information to optimize EEG based BCI systems, and further improve the performance of noninvasive BCI.
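
    For orientation, the basic Relief weighting that IterRelCen builds on can be sketched as below. This is the plain algorithm only, not IterRelCen: it includes neither the modified target sample selection nor the iterative computation described in the abstract. It also visits every sample once instead of drawing random samples, which is a simplifying assumption, and the function name `relief_weights` is hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for labelled data.

    For every sample, find its nearest neighbour of the same class
    (hit) and of a different class (miss). A feature gains weight
    when it differs on the miss and loses weight when it differs
    on the hit.
    """
    n, d = len(X), len(X[0])
    # Scale differences by each feature's range; guard against zero range.
    ranges = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0
              for f in range(d)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / ranges[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            weights[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return weights


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
    y = [0, 0, 1, 1]
    print(relief_weights(X, y))
```

    Ranking channels or features by these weights and keeping the top-scoring ones is the usual way such scores feed a downstream classifier.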

  20. Retrieving clinically relevant diabetic retinopathy images using a multi-class multiple-instance framework

    NASA Astrophysics Data System (ADS)

    Chandakkar, Parag S.; Venkatesan, Ragav; Li, Baoxin

    2013-02-01

    Diabetic retinopathy (DR) is a vision-threatening complication of diabetes mellitus, a medical condition that is rising globally. Unfortunately, many patients are unaware of this complication because of the absence of symptoms. Regular screening for DR is necessary to detect the condition for timely treatment. Content-based image retrieval using archived and diagnosed fundus (retinal) camera DR images can improve the screening efficiency of DR. This content-based image retrieval study focuses on two DR clinical findings, microaneurysm and neovascularization, which are clinical signs of non-proliferative and proliferative diabetic retinopathy. The authors propose a multi-class multiple-instance image retrieval framework which deploys a modified color correlogram and statistics of steerable Gaussian filter responses for retrieving clinically relevant images from a database of DR fundus images.

  1. A multiclass vehicular dynamic traffic flow model for main roads and dedicated lanes/roads of multimodal transport network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sossoe, K.S., E-mail: kwami.sossoe@irt-systemx.fr; Lebacque, J-P., E-mail: jean-patrick.lebacque@ifsttar.fr

    2015-03-10

    We present in this paper a model of vehicular traffic flow for a multimodal transportation road network. We introduce the notion of class of vehicles to refer to vehicles of different transport modes. Our model describes the traffic on highways (which may contain several lanes) and network transit for public transportation. The model is drafted with Eulerian and Lagrangian coordinates and uses a Logit model to describe the traffic assignment of our multiclass vehicular flow description on shared roads. The paper also discusses traffic streams on dedicated lanes for specific classes of vehicles with event-based traffic laws. An Euler-Lagrangian-remap scheme is introduced to numerically approximate the model’s flow equations.

  2. Distinguishing prostate cancer from benign confounders via a cascaded classifier on multi-parametric MRI

    NASA Astrophysics Data System (ADS)

    Litjens, G. J. S.; Elliott, R.; Shih, N.; Feldman, M.; Barentsz, J. O.; Hulsbergen-van de Kaa, C. A.; Kovacs, I.; Huisman, H. J.; Madabhushi, A.

    2014-03-01

    Learning how to separate benign confounders from prostate cancer is important because the imaging characteristics of these confounders are poorly understood. Furthermore, the typical representations of the MRI parameters might not be enough to allow discrimination. The diagnostic uncertainty this causes leads to a lower diagnostic accuracy. In this paper a new cascaded classifier is introduced to separate prostate cancer and benign confounders on MRI in conjunction with specific computer-extracted features to distinguish each of the benign classes (benign prostatic hyperplasia (BPH), inflammation, atrophy or prostatic intra-epithelial neoplasia (PIN)). In this study we tried to (1) calculate different mathematical representations of the MRI parameters which more clearly express subtle differences between different classes, (2) learn which of the MRI image features will allow specific benign confounders to be distinguished from prostate cancer, and (3) find the combination of computer-extracted MRI features to best discriminate cancer from the confounding classes using a cascaded classifier. One of the most important requirements for identifying MRI signatures for adenocarcinoma, BPH, atrophy, inflammation, and PIN is accurate mapping of the location and spatial extent of the confounder and cancer categories from ex vivo histopathology to MRI. Towards this end we employed an annotated prostatectomy data set of 31 patients, all of whom underwent a multi-parametric 3 Tesla MRI prior to radical prostatectomy. The prostatectomy slides were carefully co-registered to the corresponding MRI slices using an elastic registration technique. We extracted texture features from the T2-weighted imaging, pharmacokinetic features from the dynamic contrast enhanced imaging and diffusion features from the diffusion-weighted imaging for each of the confounder classes and prostate cancer. These features were selected because they form the mainstay of clinical diagnosis.
Relevant features for each of the classes were selected using maximum relevance minimum redundancy feature selection, allowing us to perform classifier independent feature selection. The selected features were then incorporated in a cascading classifier, which can focus on easier sub-tasks at each stage, leaving the more difficult classification tasks for later stages. Results show that distinct features are relevant for each of the benign classes, for example the fraction of extra-vascular, extra-cellular space in a voxel is a clear discriminator for inflammation. Furthermore, the cascaded classifier outperforms both multi-class and one-shot classifiers in overall accuracy for discriminating confounders from cancer: 0.76 versus 0.71 and 0.62.

  3. Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

    PubMed

    Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso

    2013-09-26

    Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.

  4. Multiclass fMRI data decoding and visualization using supervised self-organizing maps.

    PubMed

    Hausfeld, Lars; Valente, Giancarlo; Formisano, Elia

    2014-08-01

    When multivariate pattern decoding is applied to fMRI studies entailing more than two experimental conditions, a common approach is to transform the multiclass classification problem into a series of binary problems. Furthermore, for decoding analyses, classification accuracy is often the only outcome reported, although the topology of activation patterns in the high-dimensional feature space may provide additional insights into underlying brain representations. Here we propose to decode and visualize voxel patterns of fMRI datasets consisting of multiple conditions with a supervised variant of self-organizing maps (SSOMs). Using simulations and real fMRI data, we evaluated the performance of our SSOM-based approach. Specifically, the analysis of simulated fMRI data with varying signal-to-noise and contrast-to-noise ratio suggested that SSOMs perform better than a k-nearest-neighbor classifier for medium and large numbers of features (i.e. 250 to 1000 or more voxels) and similar to support vector machines (SVMs) for small and medium numbers of features (i.e. 100 to 600 voxels). However, for a larger number of features (>800 voxels), SSOMs performed worse than SVMs. When applied to a challenging 3-class fMRI classification problem with datasets collected to examine the neural representation of three human voices at individual speaker level, the SSOM-based algorithm was able to decode speaker identity from auditory cortical activation patterns. Classification performances were similar between SSOMs and other decoding algorithms; however, the ability to visualize decoding models and underlying data topology of SSOMs promotes a more comprehensive understanding of classification outcomes. We further illustrated this visualization ability of SSOMs with a re-analysis of a dataset examining the representation of visual categories in the ventral visual cortex (Haxby et al., 2001).
This analysis showed that SSOMs could retrieve and visualize topography and neighborhood relations of the brain representation of eight visual categories. We conclude that SSOMs are particularly suited for decoding datasets consisting of more than two classes and are optimally combined with approaches that reduce the number of voxels used for classification (e.g. region-of-interest or searchlight approaches). Copyright © 2014. Published by Elsevier Inc.

  5. Development and validation of a multiclass method for the quantification of veterinary drug residues in honey and royal jelly by liquid chromatography-tandem mass spectrometry.

    PubMed

    Jin, Yue; Zhang, Jinzhen; Zhao, Wen; Zhang, Wenwen; Wang, Lin; Zhou, Jinhui; Li, Yi

    2017-04-15

    The aim of this study was to develop an analytical method for the analysis of a wide range of veterinary drugs in honey and royal jelly. A modified sample preparation procedure based on the quick, easy, cheap, effective, rugged and safe (QuEChERS) method was developed, followed by liquid chromatography tandem mass spectrometry determination. Use of the single sample preparation method for analysis of 42 veterinary drugs becomes more valuable because honey and royal jelly belong to completely different complex matrices. Another main advantage of the proposed method is its ability to identify and quantify 42 veterinary drugs with higher sensitivity than reference methods of China. This work has shown that the reported method was demonstrated to be convenient and reliable for the quick monitoring of veterinary drugs in honey and royal jelly samples. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

    PubMed

    Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

    2017-06-01

    Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.

  7. Rapid screening and identification of multi-class substances of very high concern in textiles using liquid chromatography-hybrid linear ion trap orbitrap mass spectrometry.

    PubMed

    Zhang, Li; Luo, Xin; Niu, Zengyuan; Ye, Xiwen; Tang, Zhixu; Yao, Peng

    2015-03-20

    A new analytical method was established and validated for the analysis of 19 substances of very high concern (SVHCs) in textiles, including phthalic acid esters (PAEs), organotins (OTs), perfluorochemicals (PFCs) and flame retardants (FRs). After ultrasonic extraction in methanol, the textile samples were analyzed by high performance liquid chromatography-hybrid linear ion trap Orbitrap high resolution mass spectrometry (HPLC-LTQ/Orbitrap). The values of LOQ were in the range of 2-200mg/kg. Recoveries at two levels (at the LOQ and at half the limit of regulation) ranged from 68% to 120%, and the repeatability was lower than 13%. This method was successfully applied to the screening of SVHCs in commercial textile samples and is useful for the fast screening of various SVHCs. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Comparative evaluation of liquid-liquid extraction, solid-phase extraction and solid-phase microextraction for the gas chromatography-mass spectrometry determination of multiclass priority organic contaminants in wastewater.

    PubMed

    Robles-Molina, José; Gilbert-López, Bienvenida; García-Reyes, Juan F; Molina-Díaz, Antonio

    2013-12-15

    The European Water Framework Directive (WFD) 2000/60/EC establishes guidelines to control the pollution of surface water by sorting out a list of priority substances that involve a significant risk to or via the aquatic systems. In this article, the analytical performance of three different sample preparation methodologies for the GC-MS/MS determination of multiclass organic contaminants (including priority compounds from the WFD) in wastewater samples was evaluated. The methodologies tested were: (a) liquid-liquid extraction (LLE) with n-hexane; (b) solid-phase extraction (SPE) with C18 cartridges and elution with ethyl acetate:dichloromethane (1:1 (v/v)), and (c) headspace solid-phase microextraction (HS-SPME) using two different fibers: polyacrylate and polydimethylsiloxane/carboxen/divinylbenzene. Identification and confirmation of the selected 57 compounds included in the study (comprising polycyclic aromatic hydrocarbons (PAHs), pesticides and other contaminants) were accomplished using gas chromatography tandem mass spectrometry (GC-MS/MS) with a triple quadrupole instrument operated in the multiple reaction monitoring (MRM) mode. Three MS/MS transitions were selected for unambiguous confirmation of the target chemicals. The different advantages and pitfalls of each method were discussed. In the case of both LLE and SPE procedures, the method was validated at two different concentration levels (15 and 150 ng/L), obtaining recovery rates in the range 70-120% for most of the target compounds. In terms of analyte coverage, results with HS-SPME were not satisfactory, since 14 of the compounds tested were not properly recovered and the overall performance was worse than that of the other two methods tested. LLE, SPE and HS-SPME (using polyacrylate fiber) procedures also showed good linearity and precision. Using any of the three methodologies tested, limits of quantitation obtained for most of the detected compounds were in the low nanogram per liter range. © 2013 Elsevier B.V. All rights reserved.

  9. Contamination of Canadian private drinking water sources with antimicrobial resistant Escherichia coli.

    PubMed

    Coleman, Brenda L; Louie, Marie; Salvadori, Marina I; McEwen, Scott A; Neumann, Norman; Sibley, Kristen; Irwin, Rebecca J; Jamieson, Frances B; Daignault, Danielle; Majury, Anna; Braithwaite, Shannon; Crago, Bryanne; McGeer, Allison J

    2013-06-01

    Surface and ground water across the world, including North America, is contaminated with bacteria resistant to antibiotics. The consumption of water contaminated with antimicrobial resistant Escherichia coli (E. coli) has been associated with the carriage of resistant E. coli in people who drink it. The objectives were to describe the proportion of drinking water samples submitted from private sources for bacteriological testing that were contaminated with E. coli resistant to antibiotics and to determine risk factors for the contamination of these water sources with resistant and multi-class resistant E. coli. Water samples submitted for bacteriological testing in Ontario and Alberta, Canada, were tested for E. coli contamination, with a portion of the positive isolates tested for antimicrobial resistance. Households were invited to complete questionnaires to determine putative risk factors for well contamination. Using multinomial logistic regression, the risk of contamination with E. coli resistant to one or two classes of antibiotics compared to susceptible E. coli was higher for shore wells than drilled wells (odds ratio [OR] 2.8) and higher for farms housing chickens or turkeys (OR 3.0) than properties without poultry. The risk of contamination with multi-class resistant E. coli (3 or more classes) was higher if the properties housed swine (OR 5.5) or cattle (OR 2.2) than properties without these livestock and higher if the wells were located in gravel (OR 2.4) or clay (OR 2.1) than in loam. Housing livestock on the property, using a shore well, and having a well located in gravel or clay soil increase the risk of having antimicrobial resistant E. coli in E. coli contaminated wells. To reduce the incidence of water borne disease and the transmission of antimicrobial resistant bacteria, owners of private wells need to take measures to prevent contamination of their drinking water, routinely test their wells for contamination, and use treatments that eliminate bacteria. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. Ingenious Snake: An Adaptive Multi-Class Contours Extraction

    NASA Astrophysics Data System (ADS)

    Li, Baolin; Zhou, Shoujun

    2018-04-01

    Active contour models (ACMs) play an important role in computer vision and medical imaging applications. Traditional ACMs have been used to extract the contours of a single class of object, whereas the simultaneous extraction of multiple classes of contours of interest (i.e., various closed or open-ended contours) has not been solved so far. Therefore, a novel ACM named “Ingenious Snake” is proposed to adaptively extract these contours. First, ridge points are extracted based on the local phase measurement of the gradient vector flow field, so that the subsequent ridgeline initialization is automated and fast. Second, the deformation and evolution of the contours are implemented with the ingenious snake. In the experiments, the results of initialization, deformation and evolution are compared with those of existing methods. The quantitative evaluation of the structure extraction is satisfactory with respect to effectiveness and accuracy.

  11. Performance improvement of multi-class detection using greedy algorithm for Viola-Jones cascade selection

    NASA Astrophysics Data System (ADS)

    Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.

    2018-04-01

    This paper studies the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting a Viola-Jones cascade, based on a greedy choice strategy for solving the N-armed bandit problem, is proposed. The efficiency of the algorithm is shown on the problem of detecting and recognizing bank card logos in a video stream. The proposed algorithm can be effectively used in document localization and identification, recognition of road scene elements, localization and tracking of lengthy objects, and for solving other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
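
    The greedy bandit-style selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, the reward definition (1.0 when the chosen cascade detects an object in the frame, 0.0 otherwise) and the optional epsilon exploration parameter are all hypothetical.

```python
import random


class GreedyCascadeSelector:
    """Pick which cascade to run on the next frame (hypothetical sketch)."""

    def __init__(self, n_cascades, epsilon=0.0, seed=0):
        self.counts = [0] * n_cascades    # times each cascade was run
        self.values = [0.0] * n_cascades  # running mean reward per cascade
        self.epsilon = epsilon            # 0.0 gives a purely greedy choice
        self.rng = random.Random(seed)

    def select(self):
        # Run every cascade once before exploiting the estimates.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        # Optional exploration step (an assumption, off by default).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Greedy choice: the cascade with the best mean reward so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, i, reward):
        # Incremental update of the mean reward of cascade i.
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]
```

    In a detection loop, `select()` would be called once per frame and `update()` once the detector has reported whether the chosen cascade fired.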

  12. Evaluation of Multiclass Model Observers in PET LROC Studies

    NASA Astrophysics Data System (ADS)

    Gifford, H. C.; Kinahan, P. E.; Lartizien, C.; King, M. A.

    2007-02-01

    A localization ROC (LROC) study was conducted to evaluate nonprewhitening matched-filter (NPW) and channelized NPW (CNPW) versions of a multiclass model observer as predictors of human tumor-detection performance with PET images. Target localization is explicitly performed by these model observers. Tumors were placed in the liver, lungs, and background soft tissue of a mathematical phantom, and the data simulation modeled a full-3D acquisition mode. Reconstructions were performed with the FORE+AWOSEM algorithm. The LROC study measured observer performance with 2D images consisting of either coronal, sagittal, or transverse views of the same set of cases. Versions of the CNPW observer based on two previously published difference-of-Gaussian channel models demonstrated good quantitative agreement with human observers. One interpretation of these results treats the CNPW observer as a channelized Hotelling observer with implicit internal noise.

  13. Quantitative diffusion weighted imaging parameters in tumor and peritumoral stroma for prediction of molecular subtypes in breast cancer

    NASA Astrophysics Data System (ADS)

    He, Ting; Fan, Ming; Zhang, Peng; Li, Hui; Zhang, Juan; Shao, Guoliang; Li, Lihua

    2018-03-01

    Breast cancer can be classified into four molecular subtypes (Luminal A, Luminal B, HER2 and Basal-like), which differ significantly in treatment and survival outcomes. In this study, we aim to predict immunohistochemistry (IHC)-determined molecular subtypes of breast cancer using image features derived from the tumor and peritumoral stroma regions on diffusion weighted imaging (DWI). A dataset of 126 breast cancer patients who underwent preoperative breast MRI with a 3T scanner was collected. The apparent diffusion coefficients (ADCs) were recorded from DWI, and each breast image was segmented into regions comprising the tumor and the surrounding stroma. Statistical characteristics in various breast tumor and peritumoral regions were computed, including the mean, minimum, maximum, variance, interquartile range, range, skewness, and kurtosis of ADC values. Additionally, the differences in features between each pair of regions were calculated. A univariate logistic-based classifier was used to evaluate the performance of individual features in discriminating subtypes. For multi-class classification, a multivariate logistic regression model was trained and validated. The results showed that features derived from the tumor boundary and proximal peritumoral stroma regions performed better in classification than those from the other regions. Furthermore, the prediction models using statistical features, difference features and all features combined from these regions generated AUC values of 0.774, 0.796 and 0.811, respectively. The results of this study indicate that ADC features in the tumor and peritumoral stromal regions would be valuable for estimating the molecular subtype in breast cancer.
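
    The per-region statistics and between-region difference features listed above could be computed roughly as follows. This is a sketch only: the function names are hypothetical, the use of population moments and of the default quartile method of Python's `statistics.quantiles` are assumptions, and the abstract does not say which estimators the authors used.

```python
import statistics


def adc_summary(values):
    """Summary statistics of the ADC values in one region (hypothetical helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance
    std = var ** 0.5
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Third and fourth central moments for skewness and (non-excess) kurtosis.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "variance": var,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "skewness": m3 / std ** 3 if std else 0.0,
        "kurtosis": m4 / var ** 2 if var else 0.0,
    }


def difference_features(region_a, region_b):
    """Feature-wise difference between two regions, e.g. tumor minus stroma."""
    a, b = adc_summary(region_a), adc_summary(region_b)
    return {name: a[name] - b[name] for name in a}
```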

  14. An Anomalous Noise Events Detector for Dynamic Road Traffic Noise Mapping in Real-Life Urban and Suburban Environments.

    PubMed

    Socoró, Joan Claudi; Alías, Francesc; Alsina-Pagès, Rosa Ma

    2017-10-12

    One of the main aspects affecting the quality of life of people living in urban and suburban areas is their continued exposure to high Road Traffic Noise (RTN) levels. Until now, noise measurements in cities have been performed by professionals, recording data in certain locations to build a noise map afterwards. However, the deployment of Wireless Acoustic Sensor Networks (WASN) has enabled automatic noise mapping in smart cities. In order to obtain a reliable picture of the RTN levels affecting citizens, Anomalous Noise Events (ANE) unrelated to road traffic should be removed from the noise map computation. To this aim, this paper introduces an Anomalous Noise Event Detector (ANED) designed to differentiate between RTN and ANE in real time within a predefined interval running on the distributed low-cost acoustic sensors of a WASN. The proposed ANED follows a two-class audio event detection and classification approach, instead of multi-class or one-class classification schemes, taking advantage of the collection of representative acoustic data in real-life environments. The experiments conducted within the DYNAMAP project, implemented on ARM-based acoustic sensors, show the feasibility of the proposal both in terms of computational cost and classification performance using standard Mel cepstral coefficients and Gaussian Mixture Models (GMM). The two-class GMM core classifier relatively improves the baseline universal GMM one-class classifier F1 measure by 18.7% and 31.8% for suburban and urban environments, respectively, within the 1-s integration interval. Nevertheless, according to the results, the classification performance of the current ANED implementation still has room for improvement.

  15. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    NASA Astrophysics Data System (ADS)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.
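
    The difference between the two classification set-ups compared above can be illustrated with a short sketch. The score dictionary, the threshold of 0.5 and the function names are hypothetical; the abstract does not describe how the SVM outputs were turned into labels.

```python
def predict_single_multiclass(scores):
    """Single multi-class classifier: exactly one label, the top-scoring one."""
    return max(scores, key=scores.get)


def predict_binary_per_material(scores, threshold=0.5):
    """One binary classifier per material: every label that clears the threshold."""
    return sorted(m for m, s in scores.items() if s >= threshold)


# Hypothetical per-material scores for one image containing two materials.
scores = {"bricks": 0.8, "grass": 0.6, "sand": 0.1}
single_label = predict_single_multiclass(scores)
multi_labels = predict_binary_per_material(scores)
```

    With scores like these, the first function reports only the dominant material, while the second can report several, which is consistent with the abstract's remark that the second model gathered more information per image.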

  16. Empirical Wavelet Transform Based Features for Classification of Parkinson's Disease Severity.

    PubMed

    Oung, Qi Wei; Muthusamy, Hariharan; Basah, Shafriza Nisha; Lee, Hoileong; Vijean, Vikneswaran

    2017-12-29

    Parkinson's disease (PD) is a type of progressive neurodegenerative disorder that has affected a large part of the population till now. Several symptoms of PD include tremor, rigidity, slowness of movements and vocal impairments. In order to develop an effective diagnostic system, a number of algorithms were proposed mainly to distinguish healthy individuals from the ones with PD. However, most of the previous works were conducted based on a binary classification, with the early PD stage and the advanced ones being treated equally. Therefore, in this work, we propose a multiclass classification with three classes of PD severity level (mild, moderate, severe) and healthy control. The focus is to detect and classify PD using signals from wearable motion and audio sensors based on both empirical wavelet transform (EWT) and empirical wavelet packet transform (EWPT) respectively. The EWT/EWPT was applied to decompose both speech and motion data signals up to five levels. Next, several features are extracted after obtaining the instantaneous amplitudes and frequencies from the coefficients of the decomposed signals by applying the Hilbert transform. The performance of the algorithm was analysed using three classifiers - K-nearest neighbour (KNN), probabilistic neural network (PNN) and extreme learning machine (ELM). Experimental results demonstrated that our proposed approach had the ability to differentiate PD from non-PD subjects, including their severity level - with classification accuracies of more than 90% using EWT/EWPT-ELM based on signals from motion and audio sensors respectively. Additionally, classification accuracy of more than 95% was achieved when EWT/EWPT-ELM is applied to signals from integration of both signal's information.

  17. Differential diagnosis of neurodegenerative diseases using structural MRI data

    PubMed Central

    Koikkalainen, Juha; Rhodius-Meester, Hanneke; Tolonen, Antti; Barkhof, Frederik; Tijms, Betty; Lemstra, Afina W.; Tong, Tong; Guerrero, Ricardo; Schuh, Andreas; Ledig, Christian; Rueckert, Daniel; Soininen, Hilkka; Remes, Anne M.; Waldemar, Gunhild; Hasselbalch, Steen; Mecocci, Patrizia; van der Flier, Wiesje; Lötjönen, Jyrki

    2016-01-01

    Different neurodegenerative diseases can cause memory disorders and other cognitive impairments. The early detection and the stratification of patients according to the underlying disease are essential for an efficient approach to this healthcare challenge. This emphasizes the importance of differential diagnostics. Most studies compare patients and controls, or Alzheimer's disease with one other type of dementia. Such a bilateral comparison does not resemble clinical practice, where a clinician is faced with a number of different possible types of dementia. Here we studied which features in structural magnetic resonance imaging (MRI) scans could best distinguish four types of dementia (Alzheimer's disease, frontotemporal dementia, vascular dementia, and dementia with Lewy bodies) and control subjects. We extracted an extensive set of features quantifying volumetric and morphometric characteristics from T1 images, and vascular characteristics from FLAIR images. Classification was performed using a multi-class classifier based on Disease State Index methodology. The classifier provided continuous probability indices for each disease to support clinical decision making. A dataset of 504 individuals was used for evaluation. The cross-validated classification accuracy was 70.6% and balanced accuracy was 69.1% for the five disease groups using only automatically determined MRI features. Vascular dementia patients could be detected with high sensitivity (96%) using features from FLAIR images. Controls (sensitivity 82%) and Alzheimer's disease patients (sensitivity 74%) could be accurately classified using T1-based features, whereas the most difficult group was dementia with Lewy bodies (sensitivity 32%). These results were notably better than the classification accuracies obtained with visual MRI ratings (accuracy 44.6%, balanced accuracy 51.6%). Different quantification methods provided complementary information, and consequently, the best results were obtained by utilizing several quantification methods. The results prove that automatic quantification methods and computerized decision support methods are feasible for clinical practice and provide comprehensive information that may help clinicians in making a diagnosis. PMID:27104138
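
    The abstract reports both accuracy and balanced accuracy. As a reminder of how the two differ, the sketch below computes both from a confusion matrix whose rows are true classes and whose columns are predicted classes. The example matrix is invented for illustration and is not taken from the study.

```python
def accuracy(confusion):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def balanced_accuracy(confusion):
    """Mean of the per-class sensitivities (recalls).

    Assumes every class has at least one true sample (no empty rows).
    """
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return sum(recalls) / len(recalls)


# Hypothetical two-class example with a large and a small class.
example = [[90, 10],
           [5, 5]]
```

    For an imbalanced matrix such as this one, accuracy is dominated by the large class, while balanced accuracy weights each class equally.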

  18. Challenges and opportunities for translating medical microdevices: insights from the programmable bio-nano-chip

    PubMed Central

    McRae, Michael P; Simmons, Glennon; McDevitt, John T

    2016-01-01

    This perspective highlights the major challenges for the bioanalytical community, in particular the area of lab-on-a-chip sensors, as they relate to point-of-care diagnostics. There is a strong need for general-purpose and universal biosensing platforms that can perform multiplexed and multiclass assays on real-world clinical samples. However, the adoption of novel lab-on-a-chip/microfluidic devices has been slow as several key challenges remain for the translation of these new devices to clinical practice. A pipeline of promising medical microdevice technologies will be made possible by addressing the challenges of integration, failure to compete with cost and performance of existing technologies, requisite for new content, and regulatory approval and clinical adoption. PMID:27071710

  19. Antiretroviral Drugs Used in the Treatment of HIV Infection

    MedlinePlus

    ... on April 12, 2018. Multi-class Combination Products Brand Generic Name Pediatric Use Approval Date Time to ... can be found at Drugs@FDA or DailyMed . Brand Name Generic Name Pediatric Use Approval Date Time ...

  20. Network system effects of mileage fee.

    DOT National Transportation Integrated Search

    2015-08-01

    This project presents a comprehensive investigation about the network effects of MF to facilitate the : developments of proper MF policies. After a practice scan and a review of the recent literature on MF, a multi-class mathematical programming with...

  1. Multiclass feature selection for improved pediatric brain tumor segmentation

    NASA Astrophysics Data System (ADS)

    Ahmed, Shaheen; Iftekharuddin, Khan M.

    2012-03-01

    In our previous work, we showed that fractal-based texture features are effective in detection, segmentation and classification of posterior-fossa (PF) pediatric brain tumor in multimodality MRI. We exploited an information theoretic approach, Kullback-Leibler Divergence (KLD), for feature selection and for ranking different texture features. We further incorporated the feature selection technique with a segmentation method, Expectation Maximization (EM), for segmentation of tumor (T) and non tumor (NT) tissues. In this work, we extend the two-class KLD technique to the multiclass case for effectively selecting the best features for brain tumor (T), cyst (C) and non tumor (NT). We further obtain segmentation robustness for each tissue type by computing Bayes' posterior probabilities and the corresponding number of pixels for each tissue segment in MRI patient images. We evaluate improved tumor segmentation robustness using different similarity metrics for 5 patients in T1, T2 and FLAIR modalities.
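
    One plausible way to extend a two-class KLD feature score to several classes is to sum the symmetric divergence over all class pairs. The sketch below does this for per-class histograms of a single feature. It is an assumption made for illustration, not necessarily the extension used by the authors, and the function names and the `eps` smoothing constant are hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two normalized histograms."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def multiclass_kld_score(class_histograms):
    """Sum of symmetric KL divergences over every pair of classes."""
    score = 0.0
    for p, q in combinations(class_histograms, 2):
        score += kl_divergence(p, q) + kl_divergence(q, p)
    return score


def rank_features(histograms_by_feature):
    """Order feature names from most to least class-separating."""
    return sorted(histograms_by_feature,
                  key=lambda name: multiclass_kld_score(histograms_by_feature[name]),
                  reverse=True)
```

    Here each feature maps to a list of histograms, one per tissue class (for example T, C and NT), and a higher score means the class-conditional distributions of that feature are further apart.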

  2. A novel algorithm of super-resolution image reconstruction based on multi-class dictionaries for natural scene

    NASA Astrophysics Data System (ADS)

    Wu, Wei; Zhao, Dewei; Zhang, Huan

    2015-12-01

    Super-resolution image reconstruction is an effective method for improving image quality and has important research significance in the field of image processing. However, the choice of the dictionary directly affects the efficiency of image reconstruction. Sparse representation theory is introduced into the problem of nearest neighbor selection. Building on sparse-representation-based super-resolution image reconstruction, a super-resolution reconstruction algorithm based on multi-class dictionaries is analyzed. This method avoids the redundancy problem of training only a single overcomplete dictionary and makes each sub-dictionary more representative; it then replaces the traditional Euclidean distance computation to improve the quality of the whole image reconstruction. In addition, non-local self-similarity regularization is introduced to address the ill-posed problem. Experimental results show that the algorithm achieves much better results than state-of-the-art algorithms in terms of both PSNR and visual perception.

  3. Activity recognition from minimal distinguishing subsequence mining

    NASA Astrophysics Data System (ADS)

    Iqbal, Mohammad; Pao, Hsing-Kuo

    2017-08-01

    Human activity recognition is one of the most important research topics in the era of Internet of Things. To separate different activities given sensory data, we utilize a Minimal Distinguishing Subsequence (MDS) mining approach to efficiently find distinguishing patterns among different activities. We first transform the sensory data into a series of sensor triggering events and operate the MDS mining procedure afterwards. The gap constraints are also considered in the MDS mining. Given the multi-class nature of most activity recognition tasks, we modify the MDS mining approach from a binary case to a multi-class one to fit the need for multiple activity recognition. We also study how to select the best parameter set including the minimal and the maximal support thresholds in finding the MDSs for effective activity recognition. Overall, the prediction accuracy is 86.59% on the van Kasteren dataset which consists of four different activities for recognition.
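
    The notion of a distinguishing subsequence with minimal and maximal support thresholds can be sketched as follows. This is an illustrative simplification: gap constraints and the minimality check are omitted, the multi-class case is handled here as one activity versus all others (an assumption), and all names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the events of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def is_distinguishing(pattern, target_seqs, other_seqs, min_sup, max_sup):
    """Frequent in the target activity and infrequent in all other activities."""
    return (support(pattern, target_seqs) >= min_sup
            and support(pattern, other_seqs) <= max_sup)
```

    A pattern would then be kept for an activity when `is_distinguishing` holds with that activity's event sequences as `target_seqs` and the sequences of every other activity pooled into `other_seqs`.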

  4. Approximation-based common principal component for feature extraction in multi-class brain-computer interfaces.

    PubMed

    Hoang, Tuan; Tran, Dat; Huang, Xu

    2013-01-01

    Common Spatial Pattern (CSP) is a state-of-the-art method for feature extraction in Brain-Computer Interface (BCI) systems. However, it is designed for 2-class BCI classification problems. Current extensions of this method to multiple classes, based on subspace union and covariance matrix similarity, do not provide high performance. This paper presents a new approach to solving multi-class BCI classification problems by forming a subspace resembled from the original subspaces; the proposed method for this approach is called Approximation-based Common Principal Component (ACPC). We perform experiments on Dataset 2a used in BCI Competition IV to evaluate the proposed method. This dataset was designed for motor imagery classification with 4 classes. Preliminary experiments show that the proposed ACPC feature extraction method, when combined with Support Vector Machines, outperforms CSP-based feature extraction methods on the experimental dataset.

  5. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  6. Decoding of visual activity patterns from fMRI responses using multivariate pattern analyses and convolutional neural network.

    PubMed

    Zafar, Raheel; Kamel, Nidal; Naufal, Mohamad; Malik, Aamir Saeed; Dass, Sarat C; Ahmad, Rana Fayyaz; Abdullah, Jafri M; Reza, Faruque

    2017-01-01

    Decoding of human brain activity has always been a primary goal in neuroscience especially with functional magnetic resonance imaging (fMRI) data. In recent years, Convolutional neural network (CNN) has become a popular method for the extraction of features due to its higher accuracy, however it needs a lot of computation and training data. In this study, an algorithm is developed using Multivariate pattern analysis (MVPA) and modified CNN to decode the behavior of brain for different images with limited data set. Selection of significant features is an important part of fMRI data analysis, since it reduces the computational burden and improves the prediction performance; significant features are selected using t-test. MVPA uses machine learning algorithms to classify different brain states and helps in prediction during the task. General linear model (GLM) is used to find the unknown parameters of every individual voxel and the classification is done using multi-class support vector machine (SVM). MVPA-CNN based proposed algorithm is compared with region of interest (ROI) based method and MVPA based estimated values. The proposed method showed better overall accuracy (68.6%) compared to ROI (61.88%) and estimation values (64.17%).

  7. Accelerating the Original Profile Kernel.

    PubMed

    Hamp, Tobias; Goldberg, Tatyana; Rost, Burkhard

    2013-01-01

    One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster and render the kernel as possibly the top contender in a low ratio of speed/performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.

  8. An auditory multiclass brain-computer interface with natural stimuli: Usability evaluation with healthy participants and a motor impaired end user.

    PubMed

    Simon, Nadine; Käthner, Ivo; Ruf, Carolin A; Pasqualotto, Emanuele; Kübler, Andrea; Halder, Sebastian

    2014-01-01

    Brain-computer interfaces (BCIs) can serve as muscle independent communication aids. Persons, who are unable to control their eye muscles (e.g., in the completely locked-in state) or have severe visual impairments for other reasons, need BCI systems that do not rely on the visual modality. For this reason, BCIs that employ auditory stimuli were suggested. In this study, a multiclass BCI spelling system was implemented that uses animal voices with directional cues to code rows and columns of a letter matrix. To reveal possible training effects with the system, 11 healthy participants performed spelling tasks on 2 consecutive days. In a second step, the system was tested by a participant with amyotrophic lateral sclerosis (ALS) in two sessions. In the first session, healthy participants spelled with an average accuracy of 76% (3.29 bits/min) that increased to 90% (4.23 bits/min) on the second day. Spelling accuracy by the participant with ALS was 20% in the first and 47% in the second session. The results indicate a strong training effect for both the healthy participants and the participant with ALS. While healthy participants reached high accuracies in the first session and second session, accuracies for the participant with ALS were not sufficient for satisfactory communication in both sessions. More training sessions might be needed to improve spelling accuracies. The study demonstrated the feasibility of the auditory BCI with healthy users and stresses the importance of training with auditory multiclass BCIs, especially for potential end-users of BCI with disease.
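
    The abstract reports performance in bits/min but does not state the formula behind it. A measure commonly used for BCI spellers is the Wolpaw information transfer rate, sketched below as an assumption. The number of selectable classes and the time per selection are hypothetical parameters, so the sketch is not expected to reproduce the reported figures.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """Wolpaw information transfer per selection, in bits."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        # Errors are assumed to be spread evenly over the wrong classes.
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Information transfer rate given the time needed for one selection."""
    return bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection
```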

  9. High Prevalence of HIV Drug Resistance Among Newly Diagnosed Infants Aged <18 Months: Results From a Nationwide Surveillance in Nigeria.

    PubMed

    Inzaule, Seth C; Osi, Samuels J; Akinbiyi, Gbenga; Emeka, Asadu; Khamofu, Hadiza; Mpazanje, Rex; Ilesanmi, Oluwafunke; Ndembi, Nicaise; Odafe, Solomon; Sigaloff, Kim C E; Rinke de Wit, Tobias F; Akanmu, Sulaimon

    2018-01-01

    WHO recommends protease-inhibitor-based first-line regimen in infants because of risk of drug resistance from failed prophylaxis used in prevention of mother-to-child transmission (PMTCT). However, cost and logistics impede implementation in sub-Saharan Africa, and >75% of children still receive nonnucleoside reverse transcriptase inhibitor-based regimen (NNRTI) used in PMTCT. We assessed the national pretreatment drug resistance prevalence of HIV-infected children aged <18 months in Nigeria, using WHO-recommended HIV drug resistance surveillance protocol. We used remnant dried blood spots collected between June 2014 and July 2015 from 15 early infant diagnosis facilities spread across all the 6 geopolitical regions of Nigeria. Sampling was through a probability proportional-to-size approach. HIV drug resistance was determined by population-based sequencing. Overall, in 48% of infants (205 of 430) drug resistance mutations (DRM) were detected, conferring resistance to predominantly NNRTIs (45%). NRTI and multiclass NRTI/NNRTI resistance were present at 22% and 20%, respectively, while resistance to protease inhibitors was at 2%. Among 204 infants with exposure to drugs for PMTCT, 57% had DRMs, conferring NNRTI resistance in 54% and multiclass NRTI/NNRTI resistance in 29%. DRMs were also detected in 34% of 132 PMTCT unexposed infants. A high frequency of PDR, mainly NNRTI-associated, was observed in a nationwide surveillance among newly diagnosed HIV-infected children in Nigeria. PDR prevalence was equally high in PMTCT-unexposed infants. Our results support the use of protease inhibitor-based first-line regimens in HIV-infected young children regardless of PMTCT history and underscore the need to accelerate implementation of the newly disseminated guideline in Nigeria.
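
    Probability proportional-to-size (PPS) sampling can be carried out in several ways, and the abstract does not say which variant was used. The sketch below shows one common variant, systematic PPS, on hypothetical facility sizes. The function name, the sizes and the start value are assumptions for illustration only.

```python
from itertools import accumulate


def pps_systematic(sizes, n, start):
    """Select n units with probability proportional to size.

    sizes: positive size measure per unit (e.g. caseload per facility).
    n:     number of selections to make.
    start: offset in [0, total / n); normally drawn at random.
    Returns the indices of the selected units; a unit larger than the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    if not 0 <= start < interval:
        raise ValueError("start must lie in [0, total / n)")
    cumulative = list(accumulate(sizes))
    picks = []
    unit = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains this point.
        while unit < len(cumulative) - 1 and cumulative[unit] <= point:
            unit += 1
        picks.append(unit)
    return picks
```

    In practice the start offset would be drawn with a random number generator; it is passed in explicitly here so that the selection is reproducible.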

  10. QuEChERS GC-MS validation and monitoring of pesticide residues in different foods in the tomato classification group.

    PubMed

    Ramírez Restrepo, Andrés; Gallo Ortiz, Andrés Fernando; Hoyos Ossa, Duvan Esteban; Peñuela Mesa, Gustavo Antonio

    2014-09-01

    The objective of this study was to validate (SANCO/12495/2011 and NTC-ISO/IEC 17025) multi-residue multi-class methods using QuEChERS sample preparation and GC-MS for the analysis of regulated pesticides in tomatoes (Solanum lycopersicum), tamarillos (Solanum betaceum) and goldenberries (Physalis peruviana). These Latin American products are representative and widely produced in Antioquia (Colombia). Sample preparation followed the UNE-EN 15662 method (150 mg MgSO4, 25 mg primary secondary amines and 25 mg of octadecylsiloxane for cleanup; graphitized carbon black was added for tomatoes). Extracts were injected using a programmed temperature-vaporizing injector. The residues were validated over a range from 0.02 mg/kg to 0.20 mg/kg, with 24 analytes validated in tomatoes, 33 in tamarillos and 28 in goldenberries. An initial risk assessment was enabled by monitoring 24 samples in the municipalities of El Peñol, Marinilla and San Vicente Ferrer. Risks were found for tomatoes, but no significant risks were found for tamarillos or goldenberries. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Discriminative motif discovery via simulated evolution and random under-sampling.

    PubMed

    Song, Tao; Gu, Hong

    2014-01-01

    Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Model (HMM) training, a random under-sampling method is introduced to handle the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
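
    Random under-sampling, as mentioned above for balancing positive and negative datasets, can be sketched in a few lines. This is a generic illustration and not the authors' implementation; the function name and the choice to shrink every class to the size of the smallest one are assumptions.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop items so every class has as many as the smallest class."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n_min = min(len(items) for items in by_class.values())
    kept_samples, kept_labels = [], []
    for label, items in by_class.items():
        # Sample without replacement; the minority class is kept in full.
        for sample in rng.sample(items, n_min):
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels
```

    Under-sampling discards data from the majority class, so it is often repeated with different seeds and the resulting models are compared or combined.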

  12. Occurrence of antibiotics in mussels and clams from various FAO areas.

    PubMed

    Chiesa, Luca Maria; Nobile, Maria; Malandra, Renato; Panseri, Sara; Arioli, Francesco

    2018-02-01

    Filter feeders, like mussels and clams, are suitable bioindicators of environmental pollution. These shellfish, when destined for human consumption, undergo a depuration step that aims to nullify their pathogenic microorganism load and decrease chemical contamination. Nevertheless, the lack of contamination by drugs may not be guaranteed. Antimicrobials are a class of drugs of particular concern due to the increasing phenomenon of antibiotic resistance. Their use in breeding and aquaculture is a major cause of this. We developed a multiclass method for the HPLC-MS/MS analysis of 29 antimicrobials, validated according to the Commission Decision 2002/657/UE guidelines, and applied it to 50 mussel and 50 clam samples derived from various Food and Agricultural Organisation marine zones. The results obtained, indicate a negligible presence of antibiotics. Just one clam sample showed the presence of oxytetracycline at a concentration slightly higher than the European Union Maximum residue limit set for fish. Copyright © 2017. Published by Elsevier Ltd.

  13. Evaluation of an analytical methodology using QuEChERS and GC-SQ/MS for the investigation of the level of pesticide residues in Brazilian melons.

    PubMed

    da Silva Sousa, Jonas; de Castro, Rubens Carius; de Albuquerque Andrade, Gilliane; Lima, Cleidiane Gomes; Lima, Lucélia Kátia; Milhome, Maria Aparecida Liberato; do Nascimento, Ronaldo Ferreira

    2013-12-01

    A multiresidue method based on sample preparation by modified QuEChERS and detection by gas chromatography coupled to single quadrupole mass spectrometry (GC-SQ/MS) was used for the analysis of 35 multiclass pesticides in melons (Cucumis melo inodorus) produced in Ceará, Brazil. The recovery rates for the pesticides studied were satisfactory (except for etridiazole), ranging from 85% to 117% with a relative standard deviation (RSD) of less than 15%, at concentrations between 0.05 and 0.20 mg kg⁻¹. The limit of quantification (LOQ) for most compounds was below the MRLs established in Brazil. The combined relative uncertainty (Uc) and expanded uncertainty (Ue) were determined using repeatability, recovery and calibration curve data for each pesticide. Analysis of commercial melon samples revealed the presence of the pesticides bifenthrin and imazalil at levels below the MRLs established by ANVISA, EU and USEPA. Copyright © 2013 Elsevier Ltd. All rights reserved.
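
    Recovery and RSD, the two validation figures quoted above, follow from replicate measurements of a spiked sample. The sketch below uses the usual textbook definitions; the function name and the replicate values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def recovery_and_rsd(measured, spiked):
    """Return (recovery %, RSD %) for replicate results at one spiking level.

    measured: replicate concentrations found, e.g. in mg/kg.
    spiked:   concentration added to the blank sample, same unit.
    """
    average = mean(measured)
    recovery = 100.0 * average / spiked
    # RSD uses the sample standard deviation (n - 1 in the denominator).
    rsd = 100.0 * stdev(measured) / average
    return recovery, rsd


# Hypothetical replicates at a 0.10 mg/kg spiking level.
recovery, rsd = recovery_and_rsd([0.09, 0.10, 0.11], 0.10)
```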

  14. An Anomalous Noise Events Detector for Dynamic Road Traffic Noise Mapping in Real-Life Urban and Suburban Environments

    PubMed Central

    2017-01-01

    One of the main aspects affecting the quality of life of people living in urban and suburban areas is their continued exposure to high Road Traffic Noise (RTN) levels. Until now, noise measurements in cities have been performed by professionals, recording data in certain locations to build a noise map afterwards. However, the deployment of Wireless Acoustic Sensor Networks (WASN) has enabled automatic noise mapping in smart cities. In order to obtain a reliable picture of the RTN levels affecting citizens, Anomalous Noise Events (ANE) unrelated to road traffic should be removed from the noise map computation. To this aim, this paper introduces an Anomalous Noise Event Detector (ANED) designed to differentiate between RTN and ANE in real time within a predefined interval running on the distributed low-cost acoustic sensors of a WASN. The proposed ANED follows a two-class audio event detection and classification approach, instead of multi-class or one-class classification schemes, taking advantage of the collection of representative acoustic data in real-life environments. The experiments conducted within the DYNAMAP project, implemented on ARM-based acoustic sensors, show the feasibility of the proposal both in terms of computational cost and classification performance using standard Mel cepstral coefficients and Gaussian Mixture Models (GMM). The two-class GMM core classifier relatively improves the baseline universal GMM one-class classifier F1 measure by 18.7% and 31.8% for suburban and urban environments, respectively, within the 1-s integration interval. Nevertheless, according to the results, the classification performance of the current ANED implementation still has room for improvement. PMID:29023397

  15. Supervised learning methods for pathological arterial pulse wave differentiation: A SVM and neural networks approach.

    PubMed

    Paiva, Joana S; Cardoso, João; Pereira, Tânia

    2018-01-01

    The main goal of this study was to develop an automatic method, based on supervised learning, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic, time domain statistics, cross-correlation features, wavelet features. The Multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). SVM achieved a statistically significant better performance for this problem with an average accuracy of 0.9917±0.0024 and an F-Measure of 0.9925±0.0019, in comparison with ANN, which reached the values of 0.9847±0.0032 and 0.9852±0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using a different number of features from the original set available. The comparison between SVM and ANN reaffirmed the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. An approach for combining airborne LiDAR and high-resolution aerial color imagery using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Liu, Yansong; Monteiro, Sildomar T.; Saber, Eli

    2015-10-01

    Changes in vegetation cover, building construction, road network and traffic conditions caused by urban expansion affect the human habitat as well as the natural environment in rapidly developing cities. It is crucial to assess these changes and respond accordingly by identifying man-made and natural structures with accurate classification algorithms. With the increase in use of multi-sensor remote sensing systems, researchers are able to obtain a more complete description of the scene of interest. By utilizing multi-sensor data, the accuracy of classification algorithms can be improved. In this paper, we propose a method for combining 3D LiDAR point clouds and high-resolution color images to classify urban areas using Gaussian processes (GP). GP classification is a powerful non-parametric classification method that yields probabilistic classification results. It makes predictions in a way that addresses the uncertainty of the real world. In this paper, we attempt to identify man-made and natural objects in urban areas including buildings, roads, trees, grass, water and vehicles. LiDAR features are derived from the 3D point clouds and the spatial and color features are extracted from RGB images. For classification, we use the Laplace approximation for GP binary classification on the new combined feature space. The multiclass classification has been implemented by using a one-vs-all binary classification strategy. The results of applying support vector machine (SVM) and logistic regression (LR) classifiers are also provided for comparison. Our experiments show a clear improvement of classification results by using the two sensors combined instead of each sensor separately. We also found that the GP approach handles the uncertainty in the classification result without compromising accuracy compared to SVM, which is considered a state-of-the-art classification method.
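
    The one-vs-all strategy mentioned above trains one binary scorer per class and predicts the class whose scorer is most confident. The sketch below illustrates only that wrapper. The binary scorer is a toy nearest-centroid stand-in, not a Gaussian process with a Laplace approximation, and all names and data points are hypothetical.

```python
def fit_centroid_scorer(X, y):
    """Toy binary scorer (stand-in for a real binary classifier).

    Returns a function whose value is larger the closer a point lies to
    the positive-class centroid relative to the negative-class centroid.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == 0])
    return lambda x: dist2(x, neg) - dist2(x, pos)


def fit_one_vs_all(X, labels, fit_binary=fit_centroid_scorer):
    """Train one binary scorer per class: that class (1) vs. the rest (0)."""
    return {
        c: fit_binary(X, [1 if label == c else 0 for label in labels])
        for c in sorted(set(labels))
    }


def predict_one_vs_all(scorers, x):
    """Pick the class whose binary scorer gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))


# Hypothetical 2-D feature vectors for three land-cover classes.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["grass", "grass", "road", "road", "water", "water"]
scorers = fit_one_vs_all(X, labels)
```

    Any binary classifier that returns a comparable score or probability could be passed as `fit_binary`; probabilistic scorers make the per-class outputs easier to compare.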

  17. Graph Theory-Based Brain Connectivity for Automatic Classification of Multiple Sclerosis Clinical Courses.

    PubMed

    Kocevar, Gabriel; Stamile, Claudio; Hannoun, Salem; Cotton, François; Vukusic, Sandra; Durand-Dubief, Françoise; Sappey-Marinier, Dominique

    2016-01-01

    Purpose: In this work, we introduce a method to classify Multiple Sclerosis (MS) patients into four clinical profiles using structural connectivity information. For the first time, we try to solve this question in a fully automated way using a computer-based method. The main goal is to show how the combination of graph-derived metrics with machine learning techniques constitutes a powerful tool for a better characterization and classification of MS clinical profiles. Materials and Methods: Sixty-four MS patients [12 Clinically Isolated Syndrome (CIS), 24 Relapsing Remitting (RR), 24 Secondary Progressive (SP), and 17 Primary Progressive (PP)] along with 26 healthy controls (HC) underwent MR examination. T1 and diffusion tensor imaging (DTI) were used to obtain structural connectivity matrices for each subject. Global graph metrics, such as density and modularity, were estimated and compared between subjects' groups. These metrics were further used to classify patients using a tuned Support Vector Machine (SVM) combined with a Radial Basis Function (RBF) kernel. Results: When comparing MS patients to HC subjects, a greater assortativity, transitivity, and characteristic path length as well as a lower global efficiency were found. Using all graph metrics, the best F-Measures (91.8, 91.8, 75.6, and 70.6%) were obtained for binary (HC-CIS, CIS-RR, RR-PP) and multi-class (CIS-RR-SP) classification tasks, respectively. When using only one graph metric, the best F-Measures (83.6, 88.9, and 70.7%) were achieved for modularity with previous binary classification tasks. Conclusion: Based on a simple DTI acquisition associated with structural brain connectivity analysis, this automatic method allowed an accurate classification of different MS patients' clinical profiles.

  18. Psoriasis image representation using patch-based dictionary learning for erythema severity scoring.

    PubMed

    George, Yasmeen; Aldeen, Mohammad; Garnavi, Rahil

    2018-06-01

    Psoriasis is a chronic skin disease which can be life-threatening. Accurate severity scoring helps dermatologists to decide on the treatment. In this paper, we present a semi-supervised computer-aided system for automatic erythema severity scoring in psoriasis images. Firstly, the unsupervised stage includes a novel image representation method. We construct a dictionary, which is then used in the sparse representation for local feature extraction. To acquire the final image representation vector, an aggregation method is exploited over the local features. Secondly, the supervised phase is where various multi-class machine learning (ML) classifiers are trained for erythema severity scoring. Finally, we compare the proposed system with two popular unsupervised feature extractor methods, namely: bag of visual words model (BoVWs) and AlexNet pretrained model. Root mean square error (RMSE) and F1 score are used as performance measures for the learned dictionaries and the trained ML models, respectively. A psoriasis image set consisting of 676 images, is used in this study. Experimental results demonstrate that the use of the proposed procedure can provide a setup where erythema scoring is accurate and consistent. Also, it is revealed that dictionaries with large number of atoms and small patch sizes yield the best representative erythema severity features. Further, random forest (RF) outperforms other classifiers with F1 score 0.71, followed by support vector machine (SVM) and boosting with 0.66 and 0.64 scores, respectively. Furthermore, the conducted comparative studies confirm the effectiveness of the proposed approach with improvement of 9% and 12% over BoVWs and AlexNet based features, respectively. Crown Copyright © 2018. Published by Elsevier Ltd. All rights reserved.

  19. Progressively expanded neural network for automatic material identification in hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Paheding, Sidike

    The science of hyperspectral remote sensing focuses on the exploitation of the spectral signatures of various materials to enhance capabilities including object detection, recognition, and material characterization. Hyperspectral imagery (HSI) has been extensively used for object detection and identification applications since it provides plenty of spectral information to uniquely identify materials by their reflectance spectra. HSI-based object detection algorithms can be generally classified into stochastic and deterministic approaches. Deterministic approaches are comparatively simple to apply since they are usually based on direct spectral similarity such as spectral angles or spectral correlation. In contrast, stochastic algorithms require statistical modeling and estimation for target class and non-target class. Over the decades, many single class object detection methods have been proposed in the literature; however, deterministic multiclass object detection in HSI has not been explored. In this work, we propose a deterministic multiclass object detection scheme, named class-associative spectral fringe-adjusted joint transform correlation. The human brain is capable of simultaneously processing high volumes of multi-modal data received every second of the day. In contrast, a machine sees input data simply as random binary numbers. Although machines are computationally efficient, they are inferior when it comes to data abstraction and interpretation. Thus, mimicking the learning strength of the human brain has been a current trend in artificial intelligence. In this work, we present a biologically inspired neural network, named progressively expanded neural network (PEN Net), based on nonlinear transformation of input neurons to a feature space for better pattern differentiation. In PEN Net, discrete fixed excitations are disassembled and scattered in the feature space as a nonlinear line. Each disassembled element on the line corresponds to a pattern with similar features. Unlike the conventional neural network, where hidden neurons need to be iteratively adjusted to achieve better accuracy, our proposed PEN Net does not require hidden-neuron tuning, which yields better computational efficiency, and it has also shown superior performance in HSI classification tasks compared to the state of the art. Spectral-spatial feature based HSI classification frameworks have shown stronger performance than spectral-only methods. In our last proposed technique, PEN Net is combined with multiscale spatial features (i.e., multiscale complete local binary pattern) to perform a spectral-spatial classification of HSI. Several experiments demonstrate excellent performance of our proposed technique compared to more recently developed approaches.

  20. Learning machines and sleeping brains: Automatic sleep stage classification using decision-tree multi-class support vector machines.

    PubMed

    Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim

    2015-07-30

    Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
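
    The mean specificity, mean sensitivity and overall accuracy quoted above can be derived from true and predicted stage labels by treating each stage one-vs-rest. The sketch below uses a hypothetical function name and toy labels; the abstract does not state exactly how the authors averaged across stages, so unweighted averaging over classes is an assumption.

```python
def stage_metrics(true_labels, predicted_labels):
    """Return (mean sensitivity, mean specificity, overall accuracy).

    Sensitivity and specificity are computed per class (one-vs-rest) and
    then averaged without weighting; accuracy is the share of exact matches.
    Every class must appear at least once in true_labels.
    """
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels))
    sensitivities, specificities = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    mean_sens = sum(sensitivities) / len(classes)
    mean_spec = sum(specificities) / len(classes)
    return mean_sens, mean_spec, accuracy


# Toy example: six epochs scored by an expert and by a classifier.
expert = ["W", "W", "N2", "N2", "REM", "REM"]
auto = ["W", "N2", "N2", "N2", "REM", "W"]
sens, spec, acc = stage_metrics(expert, auto)
```

    Specificity tends to be high in multi-class settings because each class has many true negatives, which is one reason it is reported alongside sensitivity rather than alone.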

  1. Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of Chemicals.

    PubMed

    Braga, Rodolpho C; Alves, Vinicius M; Muratov, Eugene N; Strickland, Judy; Kleinstreuer, Nicole; Trospsha, Alexander; Andrade, Carolina Horta

    2017-05-22

    Chemically induced skin sensitization is a complex immunological disease with a profound impact on quality of life and working ability. Despite some progress in developing alternative methods for assessing the skin sensitization potential of chemical substances, there is no in vitro test that correlates well with human data. Computational QSAR models provide a rapid screening approach and contribute valuable information for the assessment of chemical toxicity. We describe the development of a freely accessible web-based and mobile application for the identification of potential skin sensitizers. The application is based on previously developed binary QSAR models of skin sensitization potential from human (109 compounds) and murine local lymph node assay (LLNA, 515 compounds) data with good external correct classification rate (0.70-0.81 and 0.72-0.84, respectively). We also included a multiclass skin sensitization potency model based on LLNA data (accuracy ranging between 0.73 and 0.76). When a user evaluates a compound in the web app, the outputs are (i) binary predictions of human and murine skin sensitization potential; (ii) multiclass prediction of murine skin sensitization; and (iii) probability maps illustrating the predicted contribution of chemical fragments. The app is the first tool available that incorporates quantitative structure-activity relationship (QSAR) models based on human data as well as multiclass models for LLNA. The Pred-Skin web app version 1.0 is freely available for the web, iOS, and Android (in development) at the LabMol web portal ( http://labmol.com.br/predskin/ ), in the Apple Store, and on Google Play, respectively. We will continuously update the app as new skin sensitization data and respective models become available.

  2. Enhanced HMAX model with feedforward feature learning for multiclass categorization.

    PubMed

    Li, Yinlin; Wu, Wei; Zhang, Bo; Li, Fengfu

    2015-01-01

    In recent years, the interdisciplinary research between neuroscience and computer vision has promoted the development in both fields. Many biologically inspired visual models are proposed, and among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model mimicking the structures and functions of V1 to posterior inferotemporal (PIT) layer of the primate visual cortex, which could generate a series of position- and scale- invariant features. However, it could be improved with attention modulation and memory processing, which are two important properties of the primate visual cortex. Thus, in this paper, based on recent biological research on the primate visual cortex, we still mimic the first 100-150 ms of visual cognition to enhance the HMAX model, which mainly focuses on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which can support the initial feature extraction for memory processing; (2) To mimic the learning, clustering and short-term memory to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters with multiscale middle level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position are encoded in different layers of the HMAX model progressively. By adding a softmax layer at the top of the model, multiclass categorization experiments can be conducted, and the results on Caltech101 show that the enhanced model with a smaller memory size exhibits higher accuracy than the original HMAX model, and could also achieve better accuracy than other unsupervised feature learning methods in multiclass categorization task.

  3. 24 CFR 330.35 - Investors.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 24 Housing and Urban Development 2 2010-04-01 2010-04-01 false Investors. 330.35 Section 330.35... SECURITIES § 330.35 Investors. Association guaranteed multiclass securities may not be suitable investments for all investors. No investor should purchase securities of any class unless the investor understands...

  4. 24 CFR 330.20 - Eligible participants.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ...'s policies regarding participation by minority and/or women-owned businesses and take appropriate... status as a minority and/or women-owned business. (iii) Trustees. A trustee is selected by the Sponsor... deliver, eligible collateral to a trust in exchange for a single Association guaranteed multiclass...

  5. WhichP450: a multi-class categorical model to predict the major metabolising CYP450 isoform for a compound

    NASA Astrophysics Data System (ADS)

    Hunt, Peter A.; Segall, Matthew D.; Tyzack, Jonathan D.

    2018-02-01

    In the development of novel pharmaceuticals, the knowledge of how many, and which, Cytochrome P450 isoforms are involved in the phase I metabolism of a compound is important. Potential problems can arise if a compound is metabolised predominantly by a single isoform in terms of drug-drug interactions or genetic polymorphisms that would lead to variations in exposure in the general population. Combined with models of regioselectivities of metabolism by each isoform, such a model would also aid in the prediction of the metabolites likely to be formed by P450-mediated metabolism. We describe the generation of a multi-class random forest model to predict which, out of a list of the seven leading Cytochrome P450 isoforms, would be the major metabolising isoforms for a novel compound. The model has a 76% success rate with a top-1 criterion and an 88% success rate for a top-2 criterion and shows significant enrichment over randomised models.
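
    The top-1 and top-2 criteria above count a prediction as a success when the true major isoform is among the one or two highest-ranked classes. A minimal sketch of that evaluation follows; the function name and the toy probabilities are hypothetical, and the three isoforms are used only to keep the example short.

```python
def top_k_success(probability_rows, true_labels, k):
    """Fraction of compounds whose true class is among the k top-ranked.

    probability_rows: one dict per compound, mapping class -> predicted score.
    true_labels:      the observed major class for each compound.
    """
    hits = 0
    for scores, truth in zip(probability_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy predictions for three compounds over three isoforms.
predictions = [
    {"3A4": 0.6, "2D6": 0.3, "2C9": 0.1},
    {"3A4": 0.5, "2D6": 0.4, "2C9": 0.1},
    {"3A4": 0.7, "2D6": 0.2, "2C9": 0.1},
]
observed = ["3A4", "2D6", "2C9"]
top1 = top_k_success(predictions, observed, k=1)
top2 = top_k_success(predictions, observed, k=2)
```

    The top-k rate can only stay the same or rise as k grows, so the gap between the top-1 and top-2 figures indicates how often the true class is ranked second.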

  6. Analytical studies on the instabilities of heterogeneous intelligent traffic flow

    NASA Astrophysics Data System (ADS)

    Ngoduy, D.

    2013-10-01

    It has been widely reported in literature that a small perturbation in traffic flow such as a sudden deceleration of a vehicle could lead to the formation of traffic jams without a clear bottleneck. These traffic jams are usually related to instabilities in traffic flow. The applications of intelligent traffic systems are a potential solution to reduce the amplitude or to eliminate the formation of such traffic instabilities. A lot of research has been conducted to theoretically study the effect of intelligent vehicles, for example adaptive cruise control vehicles, using either computer simulation or analytical method. However, most current analytical research has only applied to single class traffic flow. To this end, the main topic of this paper is to perform a linear stability analysis to find the stability threshold of heterogeneous traffic flow using microscopic models, particularly the effect of intelligent vehicles on heterogeneous (or multi-class) traffic flow instabilities. The analytical results will show how intelligent vehicle percentages affect the stability of multi-class traffic flow.

  7. Highly informative multiclass profiling of lipids by ultra-high performance liquid chromatography - Low resolution (quadrupole) mass spectrometry by using electrospray ionization and atmospheric pressure chemical ionization interfaces.

    PubMed

    Beccaria, Marco; Inferrera, Veronica; Rigano, Francesca; Gorynski, Krzysztof; Purcaro, Giorgia; Pawliszyn, Janusz; Dugo, Paola; Mondello, Luigi

    2017-08-04

    A simple, fast, and versatile method, using an ultra-high performance liquid chromatography system coupled with a low resolution (single quadrupole) mass spectrometer, was optimized to perform multiclass lipid profiling of human plasma. Particular attention was paid to developing a method suitable for both electrospray ionization and atmospheric pressure chemical ionization interfaces (sequentially in positive- and negative-ion mode), without any modification of the chromatographic conditions (mobile phase, flow-rate, gradient, etc.). Emphasis was given to the extrapolation of the structural information based on the fragmentation pattern obtained using the atmospheric pressure chemical ionization interface, under each different ionization condition, highlighting the complementary information obtained using the electrospray ionization interface, which supports the identification of related molecular ions. Furthermore, mass spectra of phosphatidylserine and phosphatidylinositol obtained using the atmospheric pressure chemical ionization interface are reported and discussed for the first time. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. A Multiagent-based Intrusion Detection System with the Support of Multi-Class Supervised Classification

    NASA Astrophysics Data System (ADS)

    Shyu, Mei-Ling; Sainani, Varsha

    The increasing number of network security related incidents has made it necessary for organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). IDSs are expected to analyze a large volume of data while not placing a significantly added load on the monitoring systems and networks. This requires good data mining strategies which take less time and give accurate results. In this study, a novel data mining assisted multiagent-based intrusion detection system (DMAS-IDS) is proposed, particularly with the support of multiclass supervised classification. These agents can detect and take predefined actions against malicious activities, and data mining techniques can help detect them. Our proposed DMAS-IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDSs with mobile agents that activate too many sniffers, causing bottlenecks in the network. This is one of the major motivations to use a distributed model based on a multiagent platform along with a supervised classification technique.

  9. Bayes estimation on parameters of the single-class classifier. [for remotely sensed crop data]

    NASA Technical Reports Server (NTRS)

    Lin, G. C.; Minter, T. C.

    1976-01-01

    Normal procedures used for designing a Bayes classifier to classify wheat as the major crop of interest require not only training samples of wheat but also those of nonwheat. Therefore, ground truth must be available for the class of interest plus all confusion classes. The single-class Bayes classifier classifies data into the class of interest or the class 'other' but requires training samples only from the class of interest. This paper will present a procedure for Bayes estimation on the mean vector, covariance matrix, and a priori probability of the single-class classifier using labeled samples from the class of interest and unlabeled samples drawn from the mixture density function.

  10. Development of multi-class, multi-criteria bicycle traffic assignment models and solution algorithms

    DOT National Transportation Integrated Search

    2015-08-31

    Cycling is gaining popularity both as a mode of travel in urban communities and as an alternative mode to private motorized vehicles due to its wide range of benefits (health, environmental, and economical). However, this change in modal share is not...

  11. How large a training set is needed to develop a classifier for microarray data?

    PubMed

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.

  12. Environmental Monitoring Networks Optimization Using Advanced Active Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Volpi, Michele; Copa, Loris

    2010-05-01

    The problem of environmental monitoring network optimization (MNO) is one of the basic and fundamental tasks in spatio-temporal data collection, analysis, and modeling. There are several approaches to this problem, which can be considered as the design or redesign of a monitoring network by applying some optimization criteria. The most developed and widespread methods are based on geostatistics (the family of kriging models, conditional stochastic simulations). In geostatistics the variance is mainly used as the optimization criterion, which has some advantages and drawbacks. In the present research we study an application of advanced techniques from statistical learning theory (SLT), namely support vector machines (SVM), and consider the optimization of monitoring networks when dealing with a classification problem (data are discrete values/classes: hydrogeological units, soil types, pollution decision levels, etc.). SVM is a universal nonlinear modeling tool for classification problems in high-dimensional spaces. The SVM solution maximizes the margin between classes and has a good generalization property for noisy data. The sparse solution of SVM is based on support vectors: data points which contribute to the solution with nonzero weights. Fundamentally, the MNO for classification problems can be considered as a task of selecting new measurement points which increase the quality of spatial classification and reduce the testing error (error on new independent measurements). In SLT this is a typical problem of active learning: the selection of new unlabelled points which efficiently reduce the testing error. A classical approach (margin sampling) to active learning is to sample the points closest to the classification boundary. This solution is suboptimal when points (or generally the dataset) are redundant for the same class.
In the present research we propose and study two new advanced methods of active learning adapted to the solution of the MNO problem: 1) hierarchical top-down clustering in the input space in order to remove redundancy when data are clustered, and 2) a general method (independent of the classifier) which gives posterior probabilities that can be used to define the classifier confidence and corresponding proposals for new measurement points. The basic ideas and procedures are explained using simulated data sets. The real case study deals with the analysis and mapping of soil types, which is a multi-class classification problem. Maps of soil types are important for the analysis and 3D modeling of heavy metals migration in soil and prediction risk mapping. The results obtained demonstrate the high quality of SVM mapping and the efficiency of monitoring network optimization using active learning approaches. The research was partly supported by SNSF projects No. 200021-126505 and 200020-121835.
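
    The classical margin sampling rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the `n_queries` parameter, and the decision values are hypothetical, and the scores are assumed to be signed distances to the boundary of a binary classifier such as an SVM.

```python
def margin_sampling(decision_values, n_queries=1):
    """Return indices of the unlabelled points closest to the decision boundary.

    decision_values: signed classifier outputs f(x), one per candidate point
    (assumed input; e.g. the output of an SVM decision function).
    """
    # Rank candidates by |f(x)|: a small magnitude means low confidence.
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:n_queries]


# Hypothetical decision values for five candidate measurement sites.
scores = [2.1, -0.05, 0.8, -1.7, 0.3]
picked = margin_sampling(scores, n_queries=2)
print(picked)  # [1, 4]
```

    Because the rule looks only at the distance to the boundary, two nearly identical candidate sites would both be selected, which is the redundancy that the clustering step proposed in the abstract is meant to remove.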

  13. The application of improved NeuroEvolution of Augmenting Topologies neural network in Marcellus Shale lithofacies prediction

    NASA Astrophysics Data System (ADS)

    Wang, Guochang; Cheng, Guojian; Carr, Timothy R.

    2013-04-01

    The organic-rich Marcellus Shale was deposited in a foreland basin during the Middle Devonian. In terms of mineral composition and organic matter richness, we define seven mudrock lithofacies: three organic-rich lithofacies and four organic-poor lithofacies. The 3D lithofacies model is very helpful for determining geologic and engineering sweet spots, and consequently useful for designing horizontal well trajectories and stimulation strategies. NeuroEvolution of Augmenting Topologies (NEAT) is a relatively new idea in the design of neural networks, and sheds light on classification (i.e., Marcellus Shale lithofacies prediction). We have successfully enhanced the capability and efficiency of NEAT in three aspects. First, we introduced two new attributes of the node gene, the node location and recurrent connection (RCC), to increase the calculation efficiency. Second, we grew the population size from a small initial value to a large one, instead of using a constant value, which saves time and computer memory, especially for complex learning tasks. Third, in multiclass pattern recognition problems, we combined feature selection of input variables and modular neural networks to automatically select input variables and optimize network topology for each binary classifier. These improvements were tested and verified by exclusive-or (XOR) experiments, and were powerful for classification.

  14. Structural knowledge learning from maps for supervised land cover/use classification: Application to the monitoring of land cover/use maps in French Guiana

    NASA Astrophysics Data System (ADS)

    Bayoudh, Meriam; Roux, Emmanuel; Richard, Gilles; Nock, Richard

    2015-03-01

    The number of satellites and sensors devoted to Earth observation has grown considerably, delivering extensive data, especially images. At the same time, access to such data and to the tools needed to process them has considerably improved. Given such a data flow, we need automatic image interpretation methods, especially when it comes to the monitoring and prediction of environmental and societal changes in highly dynamic socio-environmental contexts. This could be accomplished via artificial intelligence. The concept described here relies on the induction of classification rules that explicitly take into account structural knowledge, using Aleph, an Inductive Logic Programming (ILP) system, combined with a multi-class classification procedure. This methodology was used to monitor changes in land cover/use of the French Guiana coastline. One hundred and fifty-eight classification rules were induced from 3 diachronic land cover/use maps including 38 classes. These rules were expressed in first-order logic language, which makes them easily understandable by non-experts. A 10-fold cross-validation gave significant average values of 84.62%, 99.57% and 77.22% for classification accuracy, specificity and sensitivity, respectively. Our methodology could be beneficial to automatically classify new objects and to facilitate object-based classification procedures.

  15. Audio-based bolt-loosening detection technique of bolt joint

    NASA Astrophysics Data System (ADS)

    Zhang, Yang; Zhao, Xuefeng; Su, Wensheng; Xue, Zhigang

    2018-03-01

    The bolt joint, as the most common coupling structure, is widely used in electro-mechanical systems. However, it is the weakest part of the whole system. Increasing the preload tension force can raise the reliability and strength of the bolt joint. Therefore, the pretension force is one of the most important factors in ensuring the stability of a bolt joint. Depending on how the pretension force is generated, it can be monitored through bolt torque, turning angle (degrees), and elongation. The existing bolt-loosening monitoring methods all require expensive equipment, which greatly restricts their practicality. In this paper, a new audio-based bolt-loosening detection technique is proposed. The sound of a bolt being hit by a hammer is recorded on a smartphone, and the collected audio signal is classified and identified by a support vector machine algorithm. First, a verification test was designed, and the results show that this new method can identify bolt looseness damage accurately. Second, a variety of bolt-loosening states was identified. The results indicate that this method has high accuracy in multiclass classification of bolt looseness. This audio-based bolt-loosening detection technique not only reduces the requirements for technical and professional experience, but also makes bolt-loosening monitoring simpler and easier.

  16. A study on quantifying COPD severity by combining pulmonary function tests and CT image analysis

    NASA Astrophysics Data System (ADS)

    Nimura, Yukitaka; Kitasaka, Takayuki; Honma, Hirotoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi; Mori, Kensaku

    2011-03-01

    This paper describes a novel method that can evaluate chronic obstructive pulmonary disease (COPD) severity by combining measurements of pulmonary function tests and measurements obtained from CT image analysis. There is no cure for COPD. However, with regular medical care and consistent patient compliance with treatments and lifestyle changes, the symptoms of COPD can be minimized and progression of the disease can be slowed. Therefore, many diagnosis methods based on CT image analysis have been proposed for quantifying COPD. Most diagnosis methods for COPD extract the lesions as low-attenuation areas (LAA) by thresholding and evaluate the COPD severity by calculating the percentage of LAA in the lung (LAA%). However, COPD is usually the result of a combination of two conditions, emphysema and chronic obstructive bronchitis. Therefore, the previous methods based only on LAA% do not work well. The proposed method utilizes both kinds of information, the measurements of pulmonary function tests and the results of chest CT image analysis, to evaluate the COPD severity. In this paper, we utilize a multi-class AdaBoost to combine both kinds of information and classify the COPD severity into five stages automatically. The experimental results revealed that the accuracy rate of the proposed method was 88.9% (resubstitution scheme) and 64.4% (leave-one-out scheme).
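
    The gap between the resubstitution and leave-one-out accuracies reported in this abstract reflects how the two schemes are computed. The sketch below illustrates both under stated assumptions: it swaps in a one-dimensional 1-nearest-neighbour rule for the paper's multi-class AdaBoost, and the data, labels, and function names are hypothetical.

```python
def nearest_label(x, samples):
    """Label of the training sample closest to x (1-nearest-neighbour rule)."""
    return min(samples, key=lambda s: abs(s[0] - x))[1]


def resubstitution_accuracy(samples):
    # Train and test on the same samples: optimistic by construction.
    hits = sum(nearest_label(x, samples) == y for x, y in samples)
    return hits / len(samples)


def leave_one_out_accuracy(samples):
    # Hold each sample out in turn and predict it from the remaining ones.
    hits = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(x, rest) == y
    return hits / len(samples)


# Hypothetical (feature, stage) pairs; the values are illustrative only.
data = [(1.0, "mild"), (1.2, "mild"), (2.0, "severe"),
        (2.1, "mild"), (3.0, "severe"), (3.2, "severe")]

print(resubstitution_accuracy(data))  # 1.0
print(leave_one_out_accuracy(data))   # 4 of 6 held-out samples correct
```

    In this toy case every sample is its own nearest neighbour under resubstitution, so the score is perfect, while leave-one-out misclassifies the two overlapping samples. The leave-one-out figure is generally the more realistic estimate of performance on new patients.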

  17. A Wearable Home BCI system: preliminary results with SSVEP protocol.

    PubMed

    Piccini, Luca; Parini, Sergio; Maggi, Luca; Andreoni, Giuseppe

    2005-01-01

    This paper presents and discusses the realization and performance of a wearable system for EEG-based BCI applications. The system (called Kimera) consists of a two-layer hardware architecture (a wireless acquisition and transmission board based on a Bluetooth® ARM chip, and a low-power miniaturized biosignal acquisition analog front end) together with a software suite (called Bellerophonte) for graphical user interface management, protocol execution, and data recording, transmission and processing. The implemented BCI system was based on the SSVEP protocol, applied to a two-state selection using a standard display/monitor with a pair of high-efficiency LEDs. The frequency features of the signal were computed and used in the intention detection. The BCI algorithm is based on a supervised classifier implemented through a multi-class Canonical Discriminant Analysis (CDA) with continuous real-time feedback based on the Mahalanobis distance parameter. Five healthy subjects participated in the first phase for a preliminary device validation. The obtained results are promising, being in line with the most recent performance reported in the literature, with a significant improvement in both system and classification capabilities. The user-friendliness and low cost of the Kimera & Bellerophonte platform make it suitable for the development of home BCI applications.

  18. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    NASA Technical Reports Server (NTRS)

    Kumar, Uttam; Nemani, Ramakrishna R.; Ganguly, Sangram; Kalia, Subodh; Michaelis, Andrew

    2017-01-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes, namely forest, farmland, water and urban areas (with NPP-VIIRS, the national polar orbiting partnership visible infrared imaging radiometer suite, nighttime lights data) over California, USA using a Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91 percent was achieved, which is a 6 percent improvement in unmixing-based classification relative to per-pixel-based classification. As such, abundance maps continue to offer a useful alternative to classification maps derived from high-spatial-resolution data for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  19. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2017-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes, namely forest, farmland, water and urban areas (with NPP-VIIRS, the national polar orbiting partnership visible infrared imaging radiometer suite, nighttime lights data) over California, USA using a Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing-based classification relative to per-pixel-based classification. As such, abundance maps continue to offer a useful alternative to classification maps derived from high-spatial-resolution data for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  20. Evaluating the Effectiveness of Flood Control Strategies in Contrasting Urban Watersheds and Implications for Houston's Future Flood Vulnerability

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2016-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes, namely forest, farmland, water and urban areas (with NPP-VIIRS, the national polar orbiting partnership visible infrared imaging radiometer suite, nighttime lights data) over California, USA using a Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing-based classification relative to per-pixel-based classification. As such, abundance maps continue to offer a useful alternative to classification maps derived from high-spatial-resolution data for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  1. Network-centric decision architecture for financial or 1/f data models

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger M.; Handley, James W.; Massey, Stoney; Case, Carl T.; Songy, Claude G.

    2002-12-01

    This paper presents a decision architecture algorithm for training neural equation based networks to make autonomous multi-goal oriented, multi-class decisions. These architectures make decisions based on their individual goals and draw from the same network centric feature set. Traditionally, these architectures are comprised of neural networks that offer marginal performance due to lack of convergence of the training set. We present an approach for autonomously extracting sample points as I/O exemplars for generation of multi-branch, multi-node decision architectures populated by adaptively derived neural equations. To test the robustness of this architecture, open source data sets in the form of financial time series were used, requiring a three-class decision space analogous to the lethal, non-lethal, and clutter discrimination problem. This algorithm and the results of its application are presented here.

  2. Multicategory reclassification statistics for assessing improvements in diagnostic accuracy

    PubMed Central

    Li, Jialiang; Jiang, Binyan; Fine, Jason P.

    2013-01-01

    In this paper, we extend the definitions of the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) in the context of multicategory classification. Both measures were proposed in Pencina and others (2008. Evaluating the added predictive ability of a new marker: from area under the receiver operating characteristic (ROC) curve to reclassification and beyond. Statistics in Medicine 27, 157–172) as numeric characterizations of accuracy improvement for binary diagnostic tests and were shown to have certain advantage over analyses based on ROC curves or other regression approaches. Estimation and inference procedures for the multiclass NRI and IDI are provided in this paper along with necessary asymptotic distributional results. Simulations are conducted to study the finite-sample properties of the proposed estimators. Two medical examples are considered to illustrate our methodology. PMID:23197381

  3. Multi-Class Classification for Identifying JPEG Steganography Embedding Methods

    DTIC Science & Technology

    2008-09-01

    B.H. (2000). STEGANOGRAPHY: Hidden Images, A New Challenge in the Fight Against Child Porn . UPDATE, Volume 13, Number 2, pp. 1-4, Retrieved June 3...Other crimes involving the use of steganography include child pornography where the stego files are used to hide a predator’s location when posting

  4. 24 CFR 330.25 - Fees.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 24 Housing and Urban Development 2 2010-04-01 2010-04-01 false Fees. 330.25 Section 330.25 Housing... SECURITIES § 330.25 Fees. The Association, in its discretion, through publication in the Multiclass Guide or on the GNMA electronic bulletin board, may impose fees for application, guaranty, transfer, change...

  5. Participant Analysis of a Multi-Class, Multi-State, On-Line, Discussion List.

    ERIC Educational Resources Information Center

    Knupfer, Nancy Nelson; And Others

    As interest in distance education increases, many university instructors are experimenting with listserv discussion within their courses. Six educational technology professors at four universities initiated a listserv discussion group within their various classes for 52 graduate students. Discussion topics were four general themes within…

  6. Multiclass analysis of antibiotic residues in honey by ultraperformance liquid chromatography-tandem mass spectrometry.

    PubMed

    Vidal, Jose Luis Martínez; Aguilera-Luiz, María Del Mar; Romero-González, Roberto; Frenich, Antonia Garrido

    2009-03-11

    A method has been developed and validated for the simultaneous analysis of different veterinary drug residues (macrolides, tetracyclines, quinolones, and sulfonamides) in honey. Honey samples were dissolved with Na₂EDTA, and veterinary residues were extracted from the supernatant by solid-phase extraction (SPE), using OASIS HLB cartridges. The separation and determination were carried out by ultraperformance liquid chromatography coupled to tandem mass spectrometry (UPLC-MS/MS), using an electrospray ionization (ESI) source in positive mode. Data acquisition under MS/MS was achieved by applying multiple reaction monitoring (MRM) of two ion transitions per compound to provide a high degree of sensitivity and specificity. The method was validated, and mean recoveries were evaluated at three concentration levels (10, 50, and 100 μg/kg), ranging from 70 to 120% except for doxycycline, erythromycin, and tylmicosin, whose recoveries were higher than 50% at the three levels assayed. Relative standard deviations (RSDs) of the recoveries were less than 20% for intraday precision and less than 25% for interday precision. The limits of quantification (LOQs) were always lower than 4 μg/kg. The developed procedure was applied to 16 honey samples, and erythromycin, sarafloxacin, and tylosin were found in a few samples.
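
    The recovery and RSD figures quoted in this abstract follow from simple arithmetic on replicate measurements. The sketch below shows that calculation under stated assumptions: the replicate values and function names are hypothetical, and the acceptance limits are the 70 to 120% recovery range and the 20% intraday RSD limit mentioned in the abstract.

```python
from statistics import mean, stdev


def recovery_percent(measured, spiked):
    """Recovery of one replicate, as a percentage of the spiked amount."""
    return 100.0 * measured / spiked


def rsd_percent(values):
    """Relative standard deviation (sample standard deviation / mean), in %."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicates at the 10 ug/kg spiking level (illustrative only).
spiked = 10.0
replicates = [9.0, 10.0, 11.0]

recoveries = [recovery_percent(m, spiked) for m in replicates]
mean_recovery = mean(recoveries)
rsd = rsd_percent(recoveries)

# Acceptance check against the limits quoted in the abstract.
acceptable = 70.0 <= mean_recovery <= 120.0 and rsd < 20.0
print(mean_recovery, rsd, acceptable)
```

    The same two functions apply unchanged to the 50 and 100 μg/kg levels; only the `spiked` value and the replicate list change.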

  7. Multiclass method for the quantification of 92 veterinary antimicrobial drugs in livestock excreta, wastewater, and surface water by liquid chromatography with tandem mass spectrometry.

    PubMed

    Gao, Jinfang; Cui, Yonghui; Tao, Yanfei; Huang, Lingli; Peng, Dapeng; Xie, Shuyu; Wang, Xu; Liu, Zhenli; Chen, Dongmei; Yuan, Zonghui

    2016-11-01

    A simple multiresidue method was developed for detecting and quantifying 92 veterinary antimicrobial drugs from eight classes (β-lactams, quinolones, sulfonamides, tetracyclines, lincomycins, macrolides, chloramphenicols, and pleuromutilin) in livestock excreta and water by liquid chromatography with tandem mass spectrometry. The feces samples were extracted by ultrasound-assisted extraction with a mixture of acetonitrile/water (80:20, v/v) and edetate disodium, followed by a cleanup using solid-phase extraction with an amino cartridge. Water samples were purified with a hydrophilic-lipophilic balance solid-phase extraction column. Urine samples were extracted with acetonitrile and edetate disodium. Detection of veterinary antimicrobial drugs was achieved by liquid chromatography with tandem mass spectrometry using both positive and negative electrospray ionization modes. The recovery values of veterinary antimicrobial drugs in feces, urine, and water samples were 75-99, 85-110, and 85-101%, and the associated relative standard deviations were less than 15, 10, and 8%, respectively. The limits of quantification in feces, urine, and water samples were 0.5-1, 0.5-1, and 0.01-0.05 μg/L, respectively. This method was applied to analyze real samples obtained from local farms and provides reliable quantification and identification results for 92 veterinary antimicrobial drugs in livestock excreta and water. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Comparison of Hybrid Classifiers for Crop Classification Using Normalized Difference Vegetation Index Time Series: A Case Study for Major Crops in North Xinjiang, China

    PubMed Central

    Hao, Pengyu; Wang, Li; Niu, Zheng

    2015-01-01

    A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multiple single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series was tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as the study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See 5 (C 5.0). The results indicated that classification performance improved substantially as the number of training samples grew (mean overall accuracy increased by 5-10%, and the standard deviation of overall accuracy fell by around 1%), and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers, with higher mean overall accuracy (by 1-2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern. PMID:26360597
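
    The two hybrid strategies named in this abstract can be illustrated with a short sketch. The abstract does not define the fusion rules in detail, so this assumes M-voting is a plain majority vote over predicted labels and P-fusion averages the per-class probabilities before taking the most probable class; the function names, class labels, and probabilities are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """M-voting (assumed form): the label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def probabilistic_fusion(prob_dicts):
    """P-fusion (assumed form): average class probabilities, then take the max."""
    classes = prob_dicts[0].keys()
    fused = {c: sum(p[c] for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(fused, key=fused.get)


# Hypothetical per-class probabilities from three single classifiers.
rf = {"cotton": 0.55, "maize": 0.45}
svm = {"cotton": 0.52, "maize": 0.48}
c50 = {"cotton": 0.10, "maize": 0.90}

# Each classifier's hard label is its most probable class.
votes = [max(p, key=p.get) for p in (rf, svm, c50)]

print(majority_vote(votes))                  # cotton
print(probabilistic_fusion([rf, svm, c50]))  # maize
```

    The example is chosen so that the two rules disagree: two classifiers weakly prefer cotton, while the third strongly prefers maize, so voting returns cotton and the averaged probabilities return maize. This shows why a fusion rule that uses confidence can behave differently from one that uses only labels.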

  9. The Mighty Ganges and its Journey Through the Silk City: A Case Study of Water Quality and its Impact on Health in Bhagalpur, Bihar, India, using Machine Learning, GIS & Remote Sensing

    NASA Astrophysics Data System (ADS)

    Zaman, B.; Kumar, N.

    2016-12-01

    The River Ganges, with an approximate length of 2525 km, serves about 40% of India's population across 11 states, one of which is Bihar. The district of Bhagalpur is located in the eastern part of Bihar and extends between north latitudes 25°03'40" and 25°30'00" and east longitudes 86°30'00" and 87°29'45", encompassing an approximately 66 km stretch of the Ganges. It forms a part of the mid-Gangetic alluvial plain, covering an area of 2570 km². The total population of the district stands at 3.03 million, with a population density of 743 per km². The Ganges is a lifeline for millions of people and has utmost religious significance, but its banks have become a dumping ground for untreated urban sewage, industrial waste, disposal of solid corpses, etc., which has led to severe environmental issues; as reported by the Central Ground Water Board, the southern part of the city is affected by arsenic contamination in ground water (> 50 mg/L as per WHO norm). The municipal corporation is trying to cope. This study aims at a comprehensive analysis of water quality along the entire 66 km stretch of the river. The methodology would involve dividing the stretch into 1 km sub-study areas and collecting 10 water samples from each. Samples will also be collected at disposal points from industries, especially the silk manufacturing units, and at sewage disposal points, cremation grounds, and pesticide disposal points. High-resolution remotely sensed imagery of the city would be used, and the multi-class relevance vector machine (MCRVM) would be used to broadly classify the landuse/landcover; this synoptic view of the city would facilitate the understanding of the urban environment. In conjunction, a standard questionnaire on health along with GPS locations would be collected from a sample population inhabiting the demarcated stretches. Analysis would include physical, chemical and bacteriological tests on water samples.
The results would establish the water quality relative to permissible limits, its effect on the state of human health, the compliance of waste disposal with specified standards, and a landcover map of the city for understanding its spatial configuration. The research would be of immense use to the agencies responsible for restoring and preserving the water quality of the River Ganges, and thus for improving quality of life in this silk city.

  10. 75 FR 10853 - Self-Regulatory Organizations; Chicago Board Options Exchange, Incorporated; Notice of Filing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-03-09

    ...-Regulatory Organizations; Chicago Board Options Exchange, Incorporated; Notice of Filing and Immediate Effectiveness of Proposed Rule Change Related to Multi-Class Broad Based Index Option Spread Orders March 2... thereunder,\\2\\ notice is hereby given that on February 18, 2010, the Chicago Board Options Exchange...

  11. A Different Mirror: The Position of Immigrant Writers in British Society.

    ERIC Educational Resources Information Center

    Williams, Bronwyn T.

    In teaching international students in Britain and students in the United States in multicultural and multiclass classrooms, a common resistance was found to the consideration of how culture and society shape identity. Even international students from collectivist cultures, who see their identities as inextricable from their communities in their…

  12. Global Citizens Are Made, Not Born: Multiclass Role-Playing Simulation of Global Decision Making

    ERIC Educational Resources Information Center

    Levintova, Ekaterina; Johnson, Terri; Scheberle, Denise; Vonck, Kevin

    2011-01-01

    Globalization, global citizenship, and political engagement have become such buzzwords and cliches that we often lose the sense of their meaning. Global citizenship in particular is an elusive concept to operationalize. This article proposes to look at three dimensions of global citizenship: legal (rights and obligations), psychological…

  13. 78 FR 23281 - Notice of Proposed Information Collection; Comment Request: Ginnie Mae Multiclass Securities...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-18

    ... Statements Accountant for 15 8 120 32 3840 (attached to closing Sponsor. letter). Accountants' Closing Accountant...... 15 8 120 8 960 Letter. Accountants' OCS Letter... Accountant...... 15 8 120 8 960 Structuring Data Accountant...... 15 8 120 8 960 Financial Statements...... Accountant...... 15 8 120 1 120...

  14. Clifford support vector machines for classification, regression, and recurrence.

    PubMed

    Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy

    2010-11-01

    This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.

  15. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support vector sufficiency. Experimental results on a wide variety of real-world applications with small and large instance sizes, in the context of binary classification, multi-class problems and regression, are then reported to show that RKELM can achieve generalization performance competitive with SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Validated multiclass targeted determination of antibiotics in fish with high performance liquid chromatography-benchtop quadrupole orbitrap hybrid mass spectrometry.

    PubMed

    Chiesa, Luca; Panseri, Sara; Pasquale, Elisa; Malandra, Renato; Pavlovic, Radmila; Arioli, Francesco

    2018-08-30

    High performance liquid chromatography, coupled with a benchtop Q-Exactive Orbitrap high-resolution mass spectrometer, was successfully applied for the determination of 24 target antibiotics (selected beta-lactams, tetracyclines, fluoroquinolones, sulfonamides, phenicols, macrolides, cephalosporins, lincosamides, diaminopyrimidine) in fish matrices. The Q-Exactive parameters were carefully studied to accomplish the best compromise between a suitable scan speed and selectivity, considering the restrictions associated with generic sample preparation methodology. Retention time, an exact mass with tolerance of 2 ppm and data-dependent MS2 spectra were the main identifiers. The method was validated through specificity, linearity, recovery, intra- and inter-day repeatability, decision limit (CCα) and detection capability (CCβ), according to 2002/657/EC. The values of CCα and CCβ ranged from 29.2 to 36.8 and 32.5 to 48.9, respectively, while overall recovery ranged from 91.1 to 105.6%. Fifty fish samples were analysed, showing the sporadic incidence of enrofloxacin, chlortetracycline, oxytetracycline, amoxicillin and trimethoprim, albeit below the maximum residual levels. Copyright © 2018 Elsevier Ltd. All rights reserved.
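
    The 2 ppm exact-mass tolerance mentioned in the abstract can be illustrated with a short sketch. It assumes the usual definition of mass error in parts per million (measured minus theoretical, divided by theoretical, times 10^6); the function names and the m/z values are hypothetical and are not taken from the study.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (assumed standard definition)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 2.0) -> bool:
    """True when the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical m/z values chosen only for illustration (not from the study).
theoretical = 400.0000
print(within_tolerance(400.0006, theoretical))  # error of about 1.5 ppm
print(within_tolerance(400.0012, theoretical))  # error of about 3.0 ppm
```

    With a tolerance this tight, a deviation of roughly one thousandth of a mass unit at m/z 400 is already enough to reject a candidate identification.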

  17. Use of Multi-class Empirical Orthogonal Function for Identification of Hydrogeological Parameters and Spatiotemporal Pattern of Multiple Recharges in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Huang, C. L.; Hsu, N. S.; Yeh, W. W. G.; Hsieh, I. H.

    2017-12-01

    This study develops an innovative calibration method for regional groundwater modeling by using multi-class empirical orthogonal functions (EOFs). The developed method is an iterative approach. Prior to carrying out the iterative procedures, the groundwater storage hydrographs associated with the observation wells are calculated. The combined multi-class EOF amplitudes and EOF expansion coefficients of the storage hydrographs are then used to compute the initial guess of the temporal and spatial pattern of multiple recharges. The initial guess of the hydrogeological parameters is also assigned according to an in-situ pumping experiment. The recharges include net rainfall recharge and boundary recharge, and the hydrogeological parameters are riverbed leakage conductivity, horizontal hydraulic conductivity, vertical hydraulic conductivity, storage coefficient, and specific yield. The first step of the iterative algorithm is to run the numerical model (i.e., MODFLOW) with the initial guess or adjusted values of the recharges and parameters. Second, in order to determine the best EOF combination of the error storage hydrographs for determining the correction vectors, the objective function is devised as minimizing the root mean square error (RMSE) of the simulated storage hydrographs. The error storage hydrographs are the differences between the storage hydrographs computed from observed and simulated groundwater level fluctuations. Third, the values of recharges and parameters are adjusted and the iterative procedures are repeated until the stopping criterion is reached. The established methodology was applied to the groundwater system of Ming-Chu Basin, Taiwan. The study period is from January 1st to December 2nd in 2012. Results showed that the optimal EOF combination for the multiple recharges and hydrogeological parameters can decrease the RMSE of the simulated storage hydrographs dramatically within three calibration iterations.
This indicates that the iterative approach using EOF techniques can capture the groundwater flow tendency and detect the correction vector of the simulated error sources. Hence, the established EOF-based methodology can effectively and accurately identify the multiple recharges and hydrogeological parameters.

  18. The challenges of developing a generic extraction procedure to analyze multi-class veterinary drug residues in milk and honey using ultra-high pressure liquid chromatography quadrupole time-of-flight mass spectrometry.

    PubMed

    Wang, Jian; Leung, Daniel

    2012-08-01

    This paper discusses the analytical challenges to develop a generic extraction procedure to analyze or screen multi-class veterinary drugs in milk and honey using ultra-high pressure liquid chromatography quadrupole time-of-flight mass spectrometry (UHPLC QqTOF MS). The veterinary drugs in this study included aminoglycosides, endectocides, fluoroquinolones, ionophores, β-lactams or penicillins, macrolides, NSAIDs, phenicols, sulfonamides and tetracyclines. Veterinary drugs were extracted using a QuEChERS (quick, easy, cheap, effective, rugged, and safe) method, which entailed the use of acetonitrile containing 1% acetic acid, sodium acetate, ethylenediaminetetraacetic acid disodium (EDTA) and magnesium sulfate, and no clean-up was performed. Chromatographic separation was achieved on a reversed-phase Acquity UPLC BEH C18, 100 × 2.1 mm, 1.7 µm column with 0.1% formic acid and 10 mM ammonium formate in water, and acetonitrile as mobile phases. Due to poor chromatographic retention, aminoglycosides were first dropped from the list, and because of poor extractability, β-lactams and tetracyclines were also excluded from the method. The method was able to quantify 31 or screen up to 54 drugs (unbound) in honey, and to quantify 34 or screen up to 59 drugs in milk. UHPLC QqTOF data were acquired in TOF MS full-scan mode that allowed both quantification and confirmation of veterinary drugs and identification of their degradation products in samples. The method could achieve detection limits as low as 1 µg/kg with analytical range from 1 to 100 µg/kg. The developed method was intended to be used for screening of as many analytes as possible in one single analysis, or unequivocal confirmation of positive findings and degradation product identification based on accurate mass measurement and isotopic patterns. © Her Majesty the Queen in Right of Canada 2012. Reproduced with the permission of the Minister of Agriculture.

  19. Mapping the genetic and tissular diversity of 64 phenolic compounds in Citrus species using a UPLC–MS approach

    PubMed Central

    Durand-Hulak, Marie; Dugrand, Audray; Duval, Thibault; Bidel, Luc P. R.; Jay-Allemand, Christian; Froelicher, Yann; Bourgaud, Frédéric; Fanciullino, Anne-Laure

    2015-01-01

    Background and Aims Phenolic compounds contribute to food quality and have potential health benefits. Consequently, they are an important target of selection for Citrus species. Numerous studies on this subject have revealed new molecules, potential biosynthetic pathways and linkage between species. Although polyphenol profiles are correlated with gene expression, which is responsive to developmental and environmental cues, these factors are not monitored in most studies. A better understanding of the biosynthetic pathway and its regulation requires more information about environmental conditions, tissue specificity and connections between competing sub-pathways. This study proposes a rapid method, from sampling to analysis, that allows the quantitation of multiclass phenolic compounds across contrasting tissues and cultivars. Methods Leaves and fruits of 11 cultivated citrus of commercial interest were collected from adult trees grown in an experimental orchard. Sixty-four phenolic compounds were simultaneously quantified by ultra-high-performance liquid chromatography coupled with mass spectrometry. Key Results Combining data from vegetative tissues with data from fruit tissues improved cultivar classification based on polyphenols. The analysis of metabolite distribution highlighted the massive accumulation of specific phenolic compounds in leaves and the external part of the fruit pericarp, which reflects their involvement in plant defence. The overview of the biosynthetic pathway obtained confirmed some regulatory steps, for example those catalysed by rhamnosyltransferases. The results suggest that three other steps are responsible for the different metabolite profiles in ‘Clementine’ and ‘Star Ruby’ grapefruit. 
Conclusions The method described provides a high-throughput method to study the distribution of phenolic compounds across contrasting tissues and cultivars in Citrus, and offers the opportunity to investigate their regulation and physiological roles. The method was validated in four different tissues and allowed the identification and quantitation of 64 phenolic compounds in 20 min, which represents an improvement over existing methods of analysing multiclass polyphenols. PMID:25757470

  20. Soy sauce classification by geographic region and fermentation based on artificial neural network and genetic algorithm.

    PubMed

    Xu, Libin; Li, Yang; Xu, Ning; Hu, Yong; Wang, Chao; He, Jianjun; Cao, Yueze; Chen, Shigui; Li, Dongsheng

    2014-12-24

    This work demonstrated the possibility of using artificial neural networks to classify soy sauce from China. The aroma profiles of different soy sauce samples were differentiated using headspace solid-phase microextraction. The soy sauce samples were analyzed by gas chromatography-mass spectrometry, and 22 and 15 volatile aroma compounds were selected for sensitivity analysis to classify the samples by fermentation and geographic region, respectively. The 15 selected samples can be classified by fermentation and geographic region with a prediction success rate of 100%. Furans and phenols represented the variables with the greatest contribution in classifying soy sauce samples by fermentation and geographic region, respectively.

  1. Autonomous Segmentation of Outcrop Images Using Computer Vision and Machine Learning

    NASA Astrophysics Data System (ADS)

    Francis, R.; McIsaac, K.; Osinski, G. R.; Thompson, D. R.

    2013-12-01

    As planetary exploration missions become increasingly complex and capable, the motivation grows for improved autonomous science. New capabilities for onboard science data analysis may relieve radio-link data limits and provide greater throughput of scientific information. Adaptive data acquisition, storage and downlink may ultimately hold implications for mission design and operations. For surface missions, geology remains an essential focus, and the investigation of in place, exposed geological materials provides the greatest scientific insight and context for the formation and history of planetary materials and processes. The goal of this research program is to develop techniques for autonomous segmentation of images of rock outcrops. Recognition of the relationships between different geological units is the first step in mapping and interpreting a geological setting. Applications of automatic segmentation include instrument placement and targeting and data triage for downlink. Here, we report on the development of a new technique in which a photograph of a rock outcrop is processed by several elementary image processing techniques, generating a feature space which can be interrogated and classified. A distance metric learning technique (Multiclass Discriminant Analysis, or MDA) is tested as a means of finding the best numerical representation of the feature space. MDA produces a linear transformation that maximizes the separation between data points from different geological units. This 'training step' is completed on one or more images from a given locality. Then we apply the same transformation to improve the segmentation of new scenes containing similar materials to those used for training. The technique was tested using imagery from Mars analogue settings at the Cima volcanic flows in the Mojave Desert, California; impact breccias from the Sudbury impact structure in Ontario, Canada; and an outcrop showing embedded mineral veins in Gale Crater on Mars.
These initial results show promising performance in segmenting images, including multi-class scenes with complex boundaries. In particular, the system was able to learn to distinguish between successive layers of volcanic deposits, including massive basalts overlaying lahar materials. It was also able to separate clasts from ground mass in outcrops of impact breccia, and to find veins of hydrated material within a clay-bearing host rock. The tests also reveal initial details about the types of visual information relevant to segmentation of these types of scenes, providing guidance for further development of the technique. Funding for this work was provided in part by the Canadian Astrobiology Training Program. A portion of this research was performed at the Jet Propulsion Laboratory, California Institute of Technology. Copyright 2013 The University of Western Ontario. All Rights Reserved.

  2. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest is analysed. Among the various decision tree classifiers, Random forest performs well in less time with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
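
    The random undersampling idea behind RUS boost can be sketched as follows. This is only the resampling step, under the assumption that every class is reduced to the size of the rarest class; it is not the boosting procedure itself and not the authors' implementation. The function name and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    # Size of the smallest class decides how many samples each class keeps.
    n_keep = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, n_keep):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: class "a" has 6 samples, class "b" only 2.
samples = list(range(8))
labels = ["a"] * 6 + ["b"] * 2
balanced = random_undersample(samples, labels)
print(len(balanced))  # 4 pairs, 2 per class
```

    A boosting variant would typically repeat such a resampling before fitting each weak learner, so that the minority class is not swamped in any round.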

  3. Ruggedness testing and validation of a practical analytical method for > 100 veterinary drug residues in bovine muscle by ultrahigh performance liquid chromatography – tandem mass spectrometry

    USDA-ARS's Scientific Manuscript database

    In this study, optimization, extension, and validation of a streamlined, qualitative and quantitative multiclass, multiresidue method was conducted to monitor greater than 100 veterinary drug residues in meat using ultrahigh-performance liquid chromatography – tandem mass spectrometry (UHPLC-MS/MS). I...

  4. Working Together: An Empirical Analysis of a Multiclass Legislative-Executive Branch Simulation

    ERIC Educational Resources Information Center

    Kalaf-Hughes, Nicole; Mills, Russell W.

    2016-01-01

    Much of the research on the use of simulations in the political science classroom focuses on how simulations model different events in the real world, including political campaigns, international diplomacy, and legislative bargaining. In the case of American Politics, many simulations focus on the behavior of Congress and the legislative process,…

  5. 77 FR 71433 - Notice of Proposed Information Collection: Comment Request; Ginnie Mae Multiclass Securities...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-11-30

    ... Sponsor........ 15 8 120 0.33 39.6 Issuance Statement Attorney for Sponsor........ 15 8 120 0.5 60 Tax... Statements (attached to closing Accountant for Sponsor...... 15 8 120 32 3840 letter). Accountants' Closing Letter Accountant 15 8 120 8 960 Accountants' OCS Letter Accountant 15 8 120 8 960 Structuring Data...

  6. 75 FR 5336 - Notice of Proposed Information Collection: Comment Request; Ginnie Mae Multiclass Securities...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-02-02

    ... 8 120 0.5 60 Tax Opinion Attorney for Sponsor........ 15 8 120 4 480 Transfer Affidavit Attorney for... Final Data Statements (attached to closing Accountant for Sponsor...... 15 8 120 32 3,840 letter). Accountants' Closing Letter Accountant 15 8 120 8 960 Accountants' OCS Letter Accountant 15 8 120 8 960...

  7. Designing an Engaged Swarm: Toward a "Techne" for Multi-Class, Interdisciplinary Collaborations with Nonprofit Partners

    ERIC Educational Resources Information Center

    McCarthy, Seán

    2016-01-01

    This essay proposes a model of university-community partnership called "an engaged swarm" that mobilizes networks of students from across classes and disciplines to work with off-campus partners such as nonprofits. Based on theories that translate the distributed, adaptive, and flexible activity of actors in biological systems to…

  8. 78 FR 21393 - Notice of Submission of Proposed Information Collection to OMB Ginnie Mae Multiclass Securities...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-10

    ..., allowing the private sector to combine and restructure cash flows from Ginnie Mae Single Class MBS into... program, Ginnie Mae guarantees, with the full faith and credit of the United States, the timely payment of... combine and restructure cash flows from Ginnie Mae Single Class MBS into securities that meet unique...

  9. Multi-Class Analytical Models of the DECsystem-10 Job-Swapping Behavior.

    DTIC Science & Technology

    1981-12-01

    Mark of the Unicorn for developing Scribble, the best text formatter for a CP/M home computer I’ve ever seen, and for their patient help when I...requests for service will tend to decrease as the queue length grows. The finite population model, also known as the machine interference model, was

  10. Consensus Classification Using Non-Optimized Classifiers.

    PubMed

    Brownfield, Brett; Lemos, Tony; Kalivas, John H

    2018-04-03

    Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available, some complex, such as neural networks, and others simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample is then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments in three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured at 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
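
    The conventional majority-vote fusion that the abstract attributes to past ensemble work can be sketched in a few lines. This illustrates only that baseline fusion step, not the consensus method proposed in the article; the function names and the labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def fuse(per_classifier_labels):
    """per_classifier_labels[i][j] is classifier i's label for sample j."""
    return [majority_vote(votes) for votes in zip(*per_classifier_labels)]


# Hypothetical labels from three classifiers for three samples.
votes = [
    ["lager", "ale", "ale"],
    ["lager", "lager", "ale"],
    ["stout", "lager", "ale"],
]
print(fuse(votes))  # ['lager', 'lager', 'ale']
```

    The tie-breaking rule here (first label encountered wins) is an arbitrary choice for the sketch; a real system would need an explicit policy for ties.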

  11. A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data.

    PubMed

    Manzi, Alessandro; Dario, Paolo; Cavallo, Filippo

    2017-05-11

    Human activity recognition is an important area in computer vision, with its wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system makes use of machine learning techniques to classify the actions that are described with a set of a few basic postures. The training phase creates several models related to the number of clustered postures by means of a multiclass Support Vector Machine (SVM), trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; the second is to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four elements. The proposed approach achieves excellent performance using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for testing in real contexts.

  12. A study on automated anatomical labeling to arteries concerning with colon from 3D abdominal CT images

    NASA Astrophysics Data System (ADS)

    Hoang, Bui Huy; Oda, Masahiro; Jiang, Zhengang; Kitasaka, Takayuki; Misawa, Kazunari; Fujiwara, Michitaka; Mori, Kensaku

    2011-03-01

    This paper presents an automated anatomical labeling method of arteries extracted from contrast-enhanced 3D CT images based on multi-class AdaBoost. In abdominal surgery, understanding of vasculature related to a target organ such as the colon is very important. Therefore, the anatomical structure of blood vessels needs to be understood by computers in a system supporting abdominal surgery. Several studies have addressed automated anatomical labeling, but none has addressed automated anatomical labeling of arteries related to the colon. The proposed method obtains a tree structure of arteries from the artery region and calculates feature values of each branch. These feature values are the thickness, curvature, direction, and running vectors of the branch. Then, candidate arterial names are computed by classifiers that are trained to output artery names. Finally, a global optimization process is applied to the candidate arterial names to determine final names. Target arteries of this paper are nine lower abdominal arteries (AO, LCIA, RCIA, LEIA, REIA, SMA, IMA, LIIA, RIIA). We applied the proposed method to 14 cases of contrast-enhanced 3D abdominal CT images, and evaluated the results using a leave-one-out scheme. The average precision and recall rates of the proposed method were 87.9% and 93.3%, respectively. The results of this method are applicable to anatomical name display in surgical simulation and computer-aided surgery.

  13. Five-class differential diagnostics of neurodegenerative diseases using random undersampling boosting.

    PubMed

    Tong, Tong; Ledig, Christian; Guerrero, Ricardo; Schuh, Andreas; Koikkalainen, Juha; Tolonen, Antti; Rhodius, Hanneke; Barkhof, Frederik; Tijms, Betty; Lemstra, Afina W; Soininen, Hilkka; Remes, Anne M; Waldemar, Gunhild; Hasselbalch, Steen; Mecocci, Patrizia; Baroni, Marta; Lötjönen, Jyrki; Flier, Wiesje van der; Rueckert, Daniel

    2017-01-01

    Differentiating between different types of neurodegenerative diseases is not only crucial in clinical practice when treatment decisions have to be made, but also has a significant potential for the enrichment of clinical trials. The purpose of this study is to develop a classification framework for distinguishing the four most common neurodegenerative diseases, including Alzheimer's disease, frontotemporal lobe degeneration, Dementia with Lewy bodies and vascular dementia, as well as patients with subjective memory complaints. Different biomarkers including features from images (volume features, region-wise grading features) and non-imaging features (CSF measures) were extracted for each subject. In clinical practice, the prevalence of different dementia types is imbalanced, posing challenges for learning an effective classification model. Therefore, we propose the use of the RUSBoost algorithm in order to train classifiers and to handle the class imbalance training problem. Furthermore, a multi-class feature selection method based on sparsity is integrated into the proposed framework to improve the classification performance. It also provides a way for investigating the importance of different features and regions. Using a dataset of 500 subjects, the proposed framework achieved a high accuracy of 75.2% with a balanced accuracy of 69.3% for the five-class classification using ten-fold cross validation, which is significantly better than the results using support vector machine or random forest, demonstrating the feasibility of the proposed framework to support clinical decision making.
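
    The gap between the reported accuracy and balanced accuracy can be made concrete with a small sketch. It assumes the common definition of balanced accuracy as the mean of per-class recall; the abstract does not spell out its definition, and the labels below are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical labels: 8 samples of class "AD", 2 of class "VaD".
y_true = ["AD"] * 8 + ["VaD"] * 2
y_pred = ["AD"] * 8 + ["AD", "VaD"]
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

    On imbalanced data the plain accuracy is dominated by the majority class, which is why a lower balanced figure alongside a higher overall accuracy is the expected pattern.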

  14. Near-infrared Raman spectroscopy for estimating biochemical changes associated with different pathological conditions of cervix

    NASA Astrophysics Data System (ADS)

    Daniel, Amuthachelvi; Prakasarao, Aruna; Ganesan, Singaravelu

    2018-02-01

    The molecular level changes associated with oncogenesis precede the morphological changes in cells and tissues. Hence molecular level diagnosis would promote early diagnosis of the disease. Raman spectroscopy is capable of providing specific spectral signature of various biomolecules present in the cells and tissues under various pathological conditions. The aim of this work is to develop a non-linear multi-class statistical methodology for discrimination of normal, neoplastic and malignant cells/tissues. The tissues were classified as normal, pre-malignant and malignant by employing Principal Component Analysis followed by Artificial Neural Network (PC-ANN). The overall accuracy achieved was 99%. Further, to get an insight into the quantitative biochemical composition of the normal, neoplastic and malignant tissues, a linear combination of the major biochemicals by non-negative least squares technique was fit to the measured Raman spectra of the tissues. This technique confirms the changes in the major biomolecules such as lipids, nucleic acids, actin, glycogen and collagen associated with the different pathological conditions. To study the efficacy of this technique in comparison with histopathology, we have utilized Principal Component Analysis followed by Linear Discriminant Analysis (PC-LDA) to discriminate the well differentiated, moderately differentiated and poorly differentiated squamous cell carcinoma with an accuracy of 94.0%. The results demonstrated that Raman spectroscopy has the potential to complement the well-established technique of histopathology.

  15. Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers.

    PubMed

    Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna

    2017-03-01

    Large collections of data in studies on cancer such as leukaemia provoke the necessity of applying tailored analysis algorithms to ensure supreme information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed for juxtaposing the main leukaemia types among each other. In this case by means of the Dice coefficient similarity measure the general relations are pointed out. Moreover, lists of candidate main leukaemia group biomarkers are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class enhanced DEG signature obtained on the basis of novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments.
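
    The Dice coefficient used for the pairwise comparison can be sketched for two gene lists. This assumes the standard set-based form, 2|A∩B| / (|A| + |B|); how the authors applied it to their signatures is not stated in the abstract, and the gene identifiers below are hypothetical.

```python
def dice_coefficient(a, b):
    """Dice similarity of two collections treated as sets: 2|A&B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical differentially expressed gene lists for two leukaemia types.
degs_type_1 = {"G1", "G2", "G3", "G4"}
degs_type_2 = {"G3", "G4", "G5"}
print(dice_coefficient(degs_type_1, degs_type_2))  # 2 * 2 / (4 + 3)
```

    A value of 1 means the two lists are identical and 0 means they share no genes, so the measure gives a quick overview of how closely related two disease groups appear.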

  16. Liquid-Based Medium Used to Prepare Cytological Breast Nipple Fluid Improves the Quality of Cellular Samples Automatic Collection

    PubMed Central

    Zonta, Marco Antonio; Velame, Fernanda; Gema, Samara; Filassi, Jose Roberto; Longatto-Filho, Adhemar

    2014-01-01

    Background Breast cancer is the second cause of death in women worldwide. The spontaneous breast nipple discharge may contain cells that can be analyzed for malignancy. Halo® Mamo Cyto Test (HMCT) was recently developed as an automated system indicated to aspirate cells from the breast ducts. The objective of this study was to standardize the methodology of sampling and sample preparation of nipple discharge obtained by the automated method Halo breast test and perform cytological evaluation in samples preserved in liquid medium (SurePath™). Methods We analyzed 564 nipple fluid samples, from women between 20 and 85 years old, without history of breast disease and neoplasia, no pregnancy, and without gynecologic medical history, collected by HMCT method and preserved in two different vials with solutions for transport. Results From 306 nipple fluid samples from method 1, 199 (65%) were classified as unsatisfactory (class 0), 104 (34%) samples were classified as benign findings (class II), and three (1%) were classified as undetermined to neoplastic cells (class III). From 258 samples analyzed in method 2, 127 (49%) were classified as class 0, 124 (48%) were classified as class II, and seven (2%) were classified as class III. Conclusion Our study suggests an improvement in the quality and quantity of cellular samples when the association of the two methodologies is performed, Halo breast test and the method in liquid medium. PMID:29147397

  17. Development of a Support Vector Machine - Based Image Analysis System for Focal Liver Lesions Classification in Magnetic Resonance Images

    NASA Astrophysics Data System (ADS)

    Gatos, I.; Tsantis, S.; Karamesini, M.; Skouroliakou, A.; Kagadis, G.

    2015-09-01

    Purpose: The design and implementation of a computer-based image analysis system employing the support vector machine (SVM) classifier system for the classification of Focal Liver Lesions (FLLs) on routine non-enhanced, T2-weighted Magnetic Resonance (MR) images. Materials and Methods: The study comprised 92 patients, each of whom underwent MRI performed on a Magnetom Concerto (Siemens). Typical signs on dynamic contrast-enhanced MRI and biopsies were employed towards a three-class categorization of the 92 cases: 40 benign FLLs, 25 Hepatocellular Carcinomas (HCC) within Cirrhotic liver parenchyma and 27 liver metastases from Non-Cirrhotic liver. Prior to FLL classification, an automated lesion segmentation algorithm based on Markov Random Fields was employed in order to acquire each FLL Region of Interest. 42 texture features derived from the gray-level histogram, co-occurrence and run-length matrices and 12 morphological features were obtained from each lesion. Stepwise multi-linear regression analysis was utilized to avoid feature redundancy, leading to a feature subset that fed the multiclass SVM classifier designed for lesion classification. SVM System evaluation was performed by means of the leave-one-out method and ROC analysis. Results: Maximum accuracy for all three classes (90.0%) was obtained by means of the Radial Basis Kernel Function and three textural features (Inverse-Difference-Moment, Sum-Variance and Long-Run-Emphasis) that describe the lesion's contrast, variability and shape complexity. Sensitivity values for the three classes were 92.5%, 81.5% and 96.2% respectively, whereas specificity values were 94.2%, 95.3% and 95.5%. The AUC value achieved for the selected subset was 0.89 with a 0.81 - 0.94 confidence interval. Conclusion: The proposed SVM system exhibits promising results that could be utilized as a second opinion tool for the radiologist in order to decrease the time/cost of diagnosis and the need for patients to undergo invasive examination.
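
    The per-class sensitivity and specificity values quoted in this record are the kind of figures that can be read off a multiclass confusion matrix in a one-vs-rest fashion. The following is a minimal sketch of that calculation only. The function name and the example counts are hypothetical: the row totals (40, 25, 27) follow the class sizes given in the abstract, but the individual cells are invented for illustration and are not the study's results.

```python
def per_class_sensitivity_specificity(confusion):
    """Return [(sensitivity, specificity), ...] for each class.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j (one-vs-rest evaluation).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n_classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                               # class k missed
        fp = sum(confusion[i][k] for i in range(n_classes)) - tp  # others called k
        tn = total - tp - fn - fp
        results.append((tp / (tp + fn), tn / (tn + fp)))
    return results


# Hypothetical counts (rows: benign, HCC, metastasis); NOT the published data.
example = [
    [36, 2, 2],
    [1, 22, 2],
    [1, 1, 25],
]
stats = per_class_sensitivity_specificity(example)
overall_accuracy = sum(example[k][k] for k in range(3)) / sum(map(sum, example))
```

    In a leave-one-out evaluation, each case would contribute exactly one count to such a matrix, predicted by a model trained on the remaining cases.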

  18. Using textons to rank crystallization droplets by the likely presence of crystals

    PubMed Central

    Ng, Jia Tsing; Dekker, Carien; Kroemer, Markus; Osborne, Michael; von Delft, Frank

    2014-01-01

    The visual inspection of crystallization experiments is an important yet time-consuming and subjective step in X-ray crystallography. Previously published studies have focused on automatically classifying crystallization droplets into distinct but ultimately arbitrary experiment outcomes; here, a method is described that instead ranks droplets by their likelihood of containing crystals or microcrystals, thereby prioritizing for visual inspection those images that are most likely to contain useful information. The use of textons is introduced to describe crystallization droplets objectively, allowing them to be scored with the posterior probability of a random forest classifier trained against droplets manually annotated for the presence or absence of crystals or microcrystals. Unlike multi-class classification, this two-class system lends itself naturally to unidirectional ranking, which is most useful for assisting sequential viewing because images can be arranged simply by using these scores: this places droplets with probable crystalline behaviour early in the viewing order. Using this approach, the top ten wells included at least one human-annotated crystal or microcrystal for 94% of the plates in a data set of 196 plates imaged with a Minstrel HT system. The algorithm is robustly transferable to at least one other imaging system: when the parameters trained from Minstrel HT images are applied to a data set imaged by the Rock Imager system, human-annotated crystals ranked in the top ten wells for 90% of the plates. Because rearranging images is fundamental to the approach, a custom viewer was written to seamlessly support such ranked viewing, along with another important output of the algorithm, namely the shape of the curve of scores, which is itself a useful overview of the behaviour of the plate; additional features with known usefulness were adopted from existing viewers. Evidence is presented that such ranked viewing of images allows faster but more accurate evaluation of drops, in particular for the identification of microcrystals. PMID:25286854
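
    The "top ten wells" figure in this record can be read as a per-plate hit rate: a plate counts as a hit if at least one annotated crystal appears among its k highest-scoring wells. A minimal sketch of that metric follows. The function names and the toy scores are hypothetical, and the scores stand in for whatever posterior probability the classifier produces.

```python
def top_k_hit(scores, has_crystal, k=10):
    """True if any of the k highest-scoring wells is annotated as a crystal."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(has_crystal[i] for i in ranked[:k])


def plate_hit_rate(plates, k=10):
    """Fraction of plates with at least one annotated crystal in the top k wells.

    plates is a list of (scores, has_crystal) pairs, one pair per plate.
    """
    hits = sum(top_k_hit(scores, labels, k) for scores, labels in plates)
    return hits / len(plates)


# Toy data: two plates with four wells each (illustrative values only).
plates = [
    ([0.9, 0.2, 0.1, 0.7], [True, False, False, False]),
    ([0.3, 0.8, 0.6, 0.1], [False, False, False, True]),
]
rate = plate_hit_rate(plates, k=2)
```

    With k=2, the first plate is a hit and the second is a miss, since its only crystal has the lowest score.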

  19. A Novel Feature Optimization for Wearable Human-Computer Interfaces Using Surface Electromyography Sensors

    PubMed Central

    Zhang, Xiong; Zhao, Yacong; Zhang, Yu; Zhong, Xuefei; Fan, Zhaowen

    2018-01-01

    The novel human-computer interface (HCI) using bioelectrical signals as input is a valuable tool to improve the lives of people with disabilities. In this paper, surface electromyography (sEMG) signals induced by four classes of wrist movements were acquired from four sites on the lower arm with our designed system. Forty-two features were extracted from the time, frequency and time-frequency domains. Optimal channels were determined from the single-channel classification performance rank. The optimal features were selected according to a modified entropy criterion (EC) and a Fisher discrimination (FD) criterion. The feature selection results were evaluated by four different classifiers, and compared with other conventional feature subsets. In online tests, the wearable system acquired real-time sEMG signals. The selected features and trained classifier model were used to control a telecar through four different paradigms in a designed environment with simple obstacles. Performance was evaluated based on travel time (TT) and recognition rate (RR). The results of hardware evaluation verified the feasibility of our acquisition systems, and ensured signal quality. Single-channel analysis results indicated that the channel located on the extensor carpi ulnaris (ECU) performed best, with a mean classification accuracy of 97.45% for all movement pairs. Channels placed on the ECU and the extensor carpi radialis (ECR) were selected according to the accuracy rank. Experimental results showed that the proposed FD method was better than other feature selection methods and single-type features. The combination of FD and random forest (RF) performed best in offline analysis, with 96.77% multi-class RR. Online results illustrated that the state-machine paradigm with a 125 ms window had the highest maneuverability and was closest to real-life control. Subjects could accomplish online sessions by three sEMG-based paradigms, with average times of 46.02, 49.06 and 48.08 s, respectively. These experiments validate the feasibility of the proposed real-time wearable HCI system and algorithms, providing a potential assistive device interface for persons with disabilities. PMID:29543737

  20. Beyond maximum speed—a novel two-stimulus paradigm for brain-computer interfaces based on event-related potentials (P300-BCI)

    NASA Astrophysics Data System (ADS)

    Kaufmann, Tobias; Kübler, Andrea

    2014-10-01

    Objective. The speed of brain-computer interfaces (BCI), based on event-related potentials (ERP), is inherently limited by the commonly used one-stimulus paradigm. In this paper, we introduce a novel paradigm that can increase the spelling speed by a factor of 2, thereby extending the one-stimulus paradigm to a two-stimulus paradigm. Two different stimuli (a face and a symbol) are presented at the same time, superimposed on different characters and ERPs are classified using a multi-class classifier. Here, we present the proof-of-principle that is achieved with healthy participants. Approach. Eight participants were confronted with the novel two-stimulus paradigm and, for comparison, with two one-stimulus paradigms that used either one of the stimuli. Classification accuracies (percentage of correctly predicted letters) and elicited ERPs from the three paradigms were compared in a comprehensive offline analysis. Main results. The accuracies slightly decreased with the novel system compared to the established one-stimulus face paradigm. However, the use of two stimuli allowed for spelling at twice the maximum speed of the one-stimulus paradigms, and participants still achieved an average accuracy of 81.25%. This study introduced an alternative way of increasing the spelling speed in ERP-BCIs and illustrated that ERP-BCIs may not yet have reached their speed limit. Future research is needed in order to improve the reliability of the novel approach, as some participants displayed reduced accuracies. Furthermore, a comparison to the most recent BCI systems with individually adjusted, rapid stimulus timing is needed to draw conclusions about the practical relevance of the proposed paradigm. Significance. We introduced a novel two-stimulus paradigm that might be of high value for users who have reached the speed limit with the current one-stimulus ERP-BCI systems.

  1. Focal liver lesions segmentation and classification in nonenhanced T2-weighted MRI.

    PubMed

    Gatos, Ilias; Tsantis, Stavros; Karamesini, Maria; Spiliopoulos, Stavros; Karnabatidis, Dimitris; Hazle, John D; Kagadis, George C

    2017-07-01

    To automatically segment and classify focal liver lesions (FLLs) on nonenhanced T2-weighted magnetic resonance imaging (MRI) scans using a computer-aided diagnosis (CAD) algorithm. 71 FLLs (30 benign lesions, 19 hepatocellular carcinomas, and 22 metastases) on T2-weighted MRI scans were delineated by the proposed CAD scheme. The FLL segmentation procedure involved wavelet multiscale analysis to extract accurate edge information and mean intensity values for consecutive edges computed using horizontal and vertical analysis that were fed into the subsequent fuzzy C-means algorithm for final FLL border extraction. Texture information for each extracted lesion was derived using 42 first- and second-order textural features from grayscale value histogram, co-occurrence, and run-length matrices. Twelve morphological features were also extracted to capture any shape differentiation between classes. Feature selection was performed with stepwise multilinear regression analysis that led to a reduced feature subset. A multiclass Probabilistic Neural Network (PNN) classifier was then designed and used for lesion classification. PNN model evaluation was performed using the leave-one-out (LOO) method and receiver operating characteristic (ROC) curve analysis. The mean overlap between the automatically segmented FLLs and the manual segmentations performed by radiologists was 0.91 ± 0.12. The highest classification accuracies in the PNN model for the benign, hepatocellular carcinoma, and metastatic FLLs were 94.1%, 91.4%, and 94.1%, respectively, with sensitivity/specificity values of 90%/97.3%, 89.5%/92.2%, and 90.9%/95.6% respectively. The overall classification accuracy for the proposed system was 90.1%. Our diagnostic system using sophisticated FLL segmentation and classification algorithms is a powerful tool for routine clinical MRI-based liver evaluation and can be a supplement to contrast-enhanced MRI to prevent unnecessary invasive procedures. 
    © 2017 American Association of Physicists in Medicine.

  2. High-performance thin-layer chromatography screening of multi class antibiotics in animal food by bioluminescent bioautography and electrospray ionization mass spectrometry.

    PubMed

    Chen, Yisheng; Schwack, Wolfgang

    2014-08-22

    The world-wide usage and partly abuse of veterinary antibiotics resulted in a pressing need to control residues in animal-derived foods. Large-scale screening for residues of antibiotics is typically performed by microbial agar diffusion tests. This work employing high-performance thin-layer chromatography (HPTLC) combined with bioautography and electrospray ionization mass spectrometry introduces a rapid and efficient method for a multi-class screening of antibiotic residues. The viability of the bioluminescent bacterium Aliivibrio fischeri to the studied antibiotics (16 species of 5 groups) was optimized on amino plates, enabling detection sensitivity down to the strictest maximum residue limits. The HPTLC method was developed not to separate the individual antibiotics, but for cleanup of sample extracts. The studied antibiotics either remained at the start zones (tetracyclines, aminoglycosides, fluoroquinolones, and macrolides) or migrated into the front (amphenicols), while interfering co-extracted matrix compounds were dispersed at hRf 20-80. Only after a few hours, the multi-sample plate image clearly revealed the presence or absence of antibiotic residues. Moreover, molecular information as to the suspected findings was rapidly achieved by HPTLC-mass spectrometry. Showing remarkable sensitivity and matrix-tolerance, the established method was successfully applied to milk and kidney samples. Copyright © 2014 Elsevier B.V. All rights reserved.

  3. High-Throughput Analytical Techniques for the Determination of the Residues of 653 Multiclass Pesticides and Chemical Pollutants in Tea, Part VII: A GC-MS, GC-MS/MS, and LC-MS/MS Study of the Degradation Profiles of Pesticide Residues in Green Tea.

    PubMed

    Chang, Qiao-Ying; Pang, Guo-Fang; Fan, Chun-Lin; Chen, Hui; Yang, Fang; Li, Jie; Wen, Bi-Fang

    2016-11-01

    GC-MS, GC-tandem MS (MS/MS), and LC-MS/MS were used to mathematically define the degradation profiles of pesticide residues in two field trials. Nineteen pesticides were studied in the first field trial and 11 in the second. The results of the field trials demonstrated that the degradation profiles of pesticide residues in green tea can be described with power functions to successfully estimate the amount of time, following pesticide application, pesticide residues appearing in tea in concentrations at and/or above the maximum residue limit (MRL) decrease to concentrations below the MRL. Stability tests on green tea samples stored at room temperature were conducted to determine whether pesticide-incurred green tea samples prepared according to the method used in the field trials would be suitable for the preparation of reference standards for laboratory-proficiency testing trials. This paper reports the results of a GC-MS, GC-MS/MS, and LC-MS/MS study, as well as the suitability of the samples prepared under these conditions for use as pesticide reference standards in tea analysis.
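
    The record states that residue degradation can be described with power functions and used to estimate when concentrations fall below the MRL. A minimal sketch of that estimate follows, assuming the form C(t) = a * t^(-b) with t > 0 in days. The parameterization, the function names and the numbers are assumptions for illustration and are not taken from the paper.

```python
def residue(t_days, a, b):
    """Assumed power-function model: concentration a * t^(-b) for t > 0."""
    return a * t_days ** (-b)


def days_to_reach_mrl(a, b, mrl):
    """Solve a * t^(-b) = mrl for t, i.e. t = (a / mrl)^(1 / b).

    Requires a > 0, b > 0 and mrl > 0; the residue is below the MRL after this time.
    """
    if a <= 0 or b <= 0 or mrl <= 0:
        raise ValueError("a, b and mrl must all be positive")
    return (a / mrl) ** (1.0 / b)


# Illustrative parameters only (concentration in mg/kg, time in days).
t_mrl = days_to_reach_mrl(a=8.0, b=1.5, mrl=1.0)
```

    In practice a and b would be fitted per pesticide from the field-trial measurements, for example by linear regression of log C on log t.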

  4. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  5. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  6. An expert support system for breast cancer diagnosis using color wavelet features.

    PubMed

    Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J

    2012-10-01

    Breast cancer can be diagnosed through pathologic assessment of breast tissue samples obtained by techniques such as core needle biopsy. The pathologist's analysis of the sample is crucial for the breast cancer patient. In this paper, the nuclei of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet in the HSV color domain, and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis with a Support Vector Machine (SVM) classifier. The ability of a properly trained SVM to correctly classify patterns makes it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as the Naïve Bayes classifier and an Artificial Neural Network. The overall accuracy of the proposed method using the SVM classifier will be further useful for automation in cancer diagnosis.

  7. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

    PubMed

    Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe

    2011-06-22

    Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.

  8. Natural stimuli improve auditory BCIs with respect to ergonomics and performance

    NASA Astrophysics Data System (ADS)

    Höhne, Johannes; Krenzlin, Konrad; Dähne, Sven; Tangermann, Michael

    2012-08-01

    Moving from well-controlled, brisk artificial stimuli to natural and less-controlled stimuli seems counter-intuitive for event-related potential (ERP) studies. As natural stimuli typically contain a richer internal structure, they might introduce higher levels of variance and jitter in the ERP responses. Both characteristics are unfavorable for a good single-trial classification of ERPs in the context of a multi-class brain-computer interface (BCI) system, where the class-discriminant information between target stimuli and non-target stimuli must be maximized. For the application in an auditory BCI system, however, the transition from simple artificial tones to natural syllables can be useful despite the variance introduced. In the presented study, healthy users (N = 9) participated in an offline auditory nine-class BCI experiment with artificial and natural stimuli. It is shown that the use of syllables as natural stimuli does not only improve the users’ ergonomic ratings; also the classification performance is increased. Moreover, natural stimuli obtain a better balance in multi-class decisions, such that the number of systematic confusions between the nine classes is reduced. Hopefully, our findings may contribute to make auditory BCI paradigms more user friendly and applicable for patients.

  9. Urine cell-based DNA methylation classifier for monitoring bladder cancer.

    PubMed

    van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio

    2018-01-01

    Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples (N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modified, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients (N = 399). In the discovery phase, seven selected genes from the literature (CDH13, CFTR, NID2, SALL3, TMEFF2, TWIST1, and VIM2) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression and was validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR, SALL3, and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved an AUC of 0.696, whereas the classifier in this subset of patients reached an AUC of 0.768. Combining the methylation classifier with cytology results achieved an AUC of 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and positive and negative predictive values of 56% and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and can decrease the number of cystoscopies in the follow-up of BC patients. If only patients with a positive combined classifier result underwent cystoscopy, 36% of all cystoscopies could be prevented.
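
    The AUC values reported in this record summarize how well a continuous classifier score separates recurrent from non-recurrent samples. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the scores below are hypothetical and are not the study's data or its software.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only: two recurrences, two non-recurrences.
example_auc = auc([0.9, 0.4], [0.5, 0.1])
```

    The quadratic loop is fine for a few hundred samples; a rank-based formulation would be preferable for large cohorts.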

  10. Diverse expected gradient active learning for relative attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-07-01

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in large amount of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called diverse expected gradient active learning. This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis forces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multiclass distributions. Empirical evaluations of three different databases demonstrate the effectiveness and efficiency of the proposed approach.

  11. Diverse Expected Gradient Active Learning for Relative Attributes.

    PubMed

    You, Xinge; Wang, Ruxin; Tao, Dacheng

    2014-06-02

    The use of relative attributes for semantic understanding of images and videos is a promising way to improve communication between humans and machines. However, it is extremely labor- and time-consuming to define multiple attributes for each instance in large amount of data. One option is to incorporate active learning, so that the informative samples can be actively discovered and then labeled. However, most existing active-learning methods select samples one at a time (serial mode), and may therefore lose efficiency when learning multiple attributes. In this paper, we propose a batch-mode active-learning method, called Diverse Expected Gradient Active Learning (DEGAL). This method integrates an informativeness analysis and a diversity analysis to form a diverse batch of queries. Specifically, the informativeness analysis employs the expected pairwise gradient length as a measure of informativeness, while the diversity analysis forces a constraint on the proposed diverse gradient angle. Since simultaneous optimization of these two parts is intractable, we utilize a two-step procedure to obtain the diverse batch of queries. A heuristic method is also introduced to suppress imbalanced multi-class distributions. Empirical evaluations of three different databases demonstrate the effectiveness and efficiency of the proposed approach.

  12. Confidence Preserving Machine for Facial Action Unit Detection

    PubMed Central

    Zeng, Jiabei; Chu, Wen-Sheng; De la Torre, Fernando; Cohn, Jeffrey F.; Xiong, Zhang

    2016-01-01

    Facial action unit (AU) detection from video has been a long-standing problem in automated facial expression analysis. While progress has been made, accurate detection of facial AUs remains challenging due to ubiquitous sources of errors, such as inter-personal variability, pose, and low-intensity AUs. In this paper, we refer to samples causing such errors as hard samples, and the remaining as easy samples. To address learning with the hard samples, we propose the Confidence Preserving Machine (CPM), a novel two-stage learning framework that combines multiple classifiers following an “easy-to-hard” strategy. During the training stage, CPM learns two confident classifiers. Each classifier focuses on separating easy samples of one class from all else, and thus preserves confidence on predicting each class. During the testing stage, the confident classifiers provide “virtual labels” for easy test samples. Given the virtual labels, we propose a quasi-semi-supervised (QSS) learning strategy to learn a person-specific (PS) classifier. The QSS strategy employs a spatio-temporal smoothness that encourages similar predictions for samples within a spatio-temporal neighborhood. In addition, to further improve detection performance, we introduce two CPM extensions: iCPM that iteratively augments training samples to train the confident classifiers, and kCPM that kernelizes the original CPM model to promote nonlinearity. Experiments on four spontaneous datasets GFT [15], BP4D [56], DISFA [42], and RU-FACS [3] illustrate the benefits of the proposed CPM models over baseline methods and state-of-the-art semisupervised learning and transfer learning methods. PMID:27479964

  13. Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

    DTIC Science & Technology

    2014-01-01

    37] that performs interactive image segmentation using the solution to a combinatorial Dirichlet problem. Elmoataz et al. have developed general...izations of the graph Laplacian [25] for image denoising and manifold smoothing. Couprie et al. in [18] define a conveniently parameterized graph...continuous setting carry over to the discrete graph representation. For general data segmentation, Bresson et al. in [8], present rigorous convergence

  14. Divergence and Necessary Conditions for Extremums

    NASA Technical Reports Server (NTRS)

    Quirein, J. A.

    1973-01-01

    The problem is considered of finding a dimension-reducing transformation matrix B that maximizes the divergence in the reduced dimension for multi-class cases. A comparatively simple expression for the gradient of the average divergence with respect to B is developed. The developed expression for the gradient contains no eigenvectors or eigenvalues; also, all matrix inversions necessary to evaluate the gradient are available from computing the average divergence.
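
    For orientation, the quantity being maximized can be written out under the common assumption that each class i is Gaussian with mean \mu_i and covariance \Sigma_i; the abstract does not state the class model, so this form and the symbols \Lambda_i, \delta_{ij} and m are assumptions for illustration. With a k x n matrix B, the transformed covariances are \Lambda_i = B \Sigma_i B^T and the transformed mean differences are \delta_{ij} = B(\mu_i - \mu_j):

```latex
\[
D_{ij}(B) = \tfrac{1}{2}\operatorname{tr}\!\left[(\Lambda_i - \Lambda_j)\left(\Lambda_j^{-1} - \Lambda_i^{-1}\right)\right]
          + \tfrac{1}{2}\operatorname{tr}\!\left[\left(\Lambda_i^{-1} + \Lambda_j^{-1}\right)\delta_{ij}\,\delta_{ij}^{\mathsf{T}}\right],
\qquad
\bar{D}(B) = \frac{2}{m(m-1)} \sum_{i<j} D_{ij}(B).
\]
```

    Under this form the only inverses that appear are the \Lambda_i^{-1}, which is consistent with the remark that the inversions needed for the gradient are already available from evaluating the average divergence.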

  15. AI User Support System for SAP ERP

    NASA Astrophysics Data System (ADS)

    Vlasov, Vladimir; Chebotareva, Victoria; Rakhimov, Marat; Kruglikov, Sergey

    2017-10-01

    An intelligent system for SAP ERP user support is proposed in this paper. It enables automatic replies to users’ requests for support, saving time for problem analysis and resolution and improving responsiveness for end users. The system is based on an ensemble of machine learning algorithms for multiclass text classification, providing efficient question understanding, and a special framework for evidence retrieval, providing the best answer derivation.

  16. Evaluation of a new carbon/zirconia-based sorbent for the cleanup of food extracts in multiclass analysis of pesticides and environmental contaminants

    USDA-ARS's Scientific Manuscript database

    A novel carbon/zirconia based material, Supel™ QuE Verde (Verde), was evaluated in a filter-vial dispersive solid phase extraction (d-SPE) cleanup of QuEChERS extracts of pork, salmon, kale, and avocado for residual analysis of pesticides and environmental contaminants. Low pressure (LP) GC-MS/MS w...

  17. Enhancing Performance and Bit Rates in a Brain-Computer Interface System With Phase-to-Amplitude Cross-Frequency Coupling: Evidences From Traditional c-VEP, Fast c-VEP, and SSVEP Designs.

    PubMed

    Dimitriadis, Stavros I; Marimpis, Avraam D

    2018-01-01

    A brain-computer interface (BCI) is a channel of communication that transforms brain activity into specific commands for manipulating a personal computer or other home or electrical devices. In other words, a BCI is an alternative way of interacting with the environment by using brain activity instead of muscles and nerves. For that reason, BCI systems are of high clinical value for targeted populations suffering from neurological disorders. In this paper, we present a new processing approach in three publicly available BCI data sets: (a) a well-known multi-class (N = 6) code-modulated visual evoked potential (c-VEP)-based BCI system for able-bodied and disabled subjects; (b) a multi-class (N = 32) c-VEP with slow and fast stimulus representation; and (c) a steady-state visual evoked potential (SSVEP) multi-class (N = 5) flickering BCI system. By estimating cross-frequency coupling (CFC), namely δ-θ [δ: (0.5-4 Hz), θ: (4-8 Hz)] phase-to-amplitude coupling (PAC), within sensor and across experimental time, we succeeded in achieving high classification accuracy and information transfer rates (ITR) in the three data sets. Our approach outperformed the originally presented ITR on the three data sets. The bit rates obtained for both the disabled and able-bodied subjects reached the fastest reported level of 324 bits/min with the PAC estimator. Additionally, our approach outperformed alternative signal features such as the relative power (29.73 bits/min) and raw time series analysis (24.93 bits/min) and also the originally reported bit rates of 10-25 bits/min. In the second data set, we succeeded in achieving an average ITR of 124.40 ± 11.68 for the slow 60 Hz and an average ITR of 233.99 ± 15.75 for the fast 120 Hz. In the third data set, we succeeded in achieving an average ITR of 106.44 ± 8.94. The current methodology outperforms any previous methodologies applied to each of the three freely available BCI datasets.
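
    The abstract above reports bit rates but does not say how ITR was computed. As a rough sketch only, the snippet below uses the widely cited Wolpaw definition, which assumes N equally likely classes and errors spread evenly over the wrong classes. The function names and the example values are hypothetical and are not taken from the paper.

```python
import math


def bits_per_selection(n_classes: int, accuracy: float) -> float:
    """Wolpaw-style information per selection, in bits.

    Assumes equally likely classes and uniformly distributed errors.
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    # The p*log2(p) terms are taken as 0 when p is 0 (their limiting value).
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes: int, accuracy: float,
                        seconds_per_selection: float) -> float:
    """Scale bits per selection by the number of selections per minute."""
    if seconds_per_selection <= 0.0:
        raise ValueError("seconds_per_selection must be positive")
    return bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection


# Hypothetical example: 32 targets, 90% accuracy, one selection per second.
example_itr = itr_bits_per_minute(32, 0.90, 1.0)
```

    Under this definition the rate drops to zero at chance-level accuracy (1/N) and grows with both the number of classes and the selection speed, which is why large-N, fast-stimulus designs can report high bit rates.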

  18. Centre-based restricted nearest feature plane with angle classifier for face recognition

    NASA Astrophysics Data System (ADS)

    Tang, Linlin; Lu, Huifen; Zhao, Liang; Li, Zuohua

    2017-10-01

    An improved classifier based on the nearest feature plane (NFP), called the centre-based restricted nearest feature plane with the angle (RNFPA) classifier, is proposed here for face recognition problems. The well-known NFP uses the geometrical information of samples to increase the number of training samples, but it increases the computation complexity and also has an inaccuracy problem caused by the extended feature plane. To solve the above problems, RNFPA exploits a centre-based feature plane and utilizes a threshold of angle to restrict the extended feature space. By choosing the appropriate angle threshold, RNFPA can improve the performance and decrease computation complexity. Experiments on the AT&T face database, AR face database and FERET face database are used to evaluate the proposed classifier. Compared with the original NFP classifier, the nearest feature line (NFL) classifier, the nearest neighbour (NN) classifier and some other improved NFP classifiers, the proposed one achieves competitive performance.

  19. Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease.

    PubMed

    Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki

    2015-04-30

    Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than in the two-class problem. Developing sample classification independent unsupervised methods would solve many of these problems. Two principal component analysis (PCA)-based FE methods were tested as sample classification independent unsupervised FE methods: variational Bayes PCA (VBPCA), which was extended to perform unsupervised FE, and conventional PCA (CPCA)-based unsupervised FE. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that shows aberrant expression between treatment and control samples, and significant, negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data, and outperformed two conventional supervised FE methods on a real data set. Thus, these two methods have suggested equivalence for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.

  20. Development, validation and determination of multiclass pesticide residues in cocoa beans using gas chromatography and liquid chromatography tandem mass spectrometry.

    PubMed

    Zainudin, Badrul Hisyam; Salleh, Salsazali; Mohamed, Rahmat; Yap, Ken Choy; Muhamad, Halimah

    2015-04-01

    An efficient and rapid method for the analysis of pesticide residues in cocoa beans using gas and liquid chromatography-tandem mass spectrometry was developed, validated and applied to imported and domestic cocoa bean samples collected over 2 years from smallholders and Malaysian ports. The method was based on a solvent extraction method and covers 26 pesticides (insecticides, fungicides, and herbicides) of different chemical classes. The recoveries for all pesticides at 10 and 50 μg/kg were in the range of 70-120% with relative standard deviations of less than 20%. Good selectivity and sensitivity were obtained with a method limit of quantification of 10 μg/kg. The expanded uncertainty measurements were in the range of 4-25%. Finally, the proposed method was successfully applied for the routine analysis of pesticide residues in cocoa beans via a monitoring study where 10% of the samples were found positive for chlorpyrifos, ametryn and metalaxyl. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Online adaptive decision trees: pattern classification and function approximation.

    PubMed

    Basak, Jayanta

    2006-09-01

    Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to a better generalization score. OADTs were limited by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.

  2. Validation Study on a Rapid Method for Simultaneous Determination of Pesticide Residues in Vegetables and Fruits by LC-MS/MS.

    PubMed

    Sato, Tamaki; Miyamoto, Iori; Uemura, Masako; Nakatani, Tadashi; Kakutani, Naoya; Yamano, Tetsuo

    2016-01-01

    A validation study was carried out on a rapid method for the simultaneous determination of pesticide residues in vegetables and fruits by LC-MS/MS. Preparation of the test solution was performed by a solid-phase extraction technique with QuEChERS (STQ method). Pesticide residues were extracted with acetonitrile using a homogenizer, followed by salting-out and dehydration at the same time. The acetonitrile layer was purified with C18 and PSA mini-columns. The method was assessed for 130 pesticide residues in 14 kinds of vegetables and fruits at the concentration level of 0.01 μg/g according to the method validation guideline of the Ministry of Health, Labour and Welfare of Japan. As a result 75 to 120 pesticide residues were determined satisfactorily in the tested samples. Thus, this method could be useful for a rapid and simultaneous determination of multi-class pesticide residues in various vegetables and fruits.

  3. Training echo state networks for rotation-invariant bone marrow cell classification.

    PubMed

    Kainz, Philipp; Burgsteiner, Harald; Asslaber, Martin; Ahammer, Helmut

    2017-01-01

    The main principle of diagnostic pathology is the reliable interpretation of individual cells in the context of the tissue architecture. In particular, a confident examination of bone marrow specimens depends on a valid classification of myeloid cells. In this work, we propose a novel rotation-invariant learning scheme for multi-class echo state networks (ESNs), which achieves very high performance in automated bone marrow cell classification. Based on representing static images as a temporal sequence of rotations, we show how ESNs robustly recognize cells of arbitrary rotations by taking advantage of their short-term memory capacity. The performance of our approach is compared to a classification random forest that learns rotation-invariance in a conventional way by exhaustively training on multiple rotations of individual samples. The methods were evaluated on a human bone marrow image database consisting of granulopoietic and erythropoietic cells in different maturation stages. Our ESN approach to cell classification does not rely on segmentation of cells or manual feature extraction and can therefore directly be applied to image data.

  4. Wheat signature modeling and analysis for improved training statistics

    NASA Technical Reports Server (NTRS)

    Nalepka, R. F. (Principal Investigator); Malila, W. A.; Cicone, R. C.; Gleason, J. M.

    1976-01-01

    The author has identified the following significant results. The spectral, spatial, and temporal characteristics of wheat and other signatures in LANDSAT multispectral scanner data were examined through empirical analysis and simulation. Irrigation patterns varied widely within Kansas; 88 percent of wheat acreage in Finney was irrigated and 24 percent in Morton, as opposed to less than 3 percent for the western two-thirds of the state. The irrigation practice was definitely correlated with the observed spectral response; wheat variety differences produced observable spectral differences due to leaf coloration and different dates of maturation. Between-field differences were generally greater than within-field differences, and boundary pixels produced spectral features distinct from those within field centers. Multiclass boundary pixels contributed much of the observed bias in proportion estimates. The variability between signatures obtained by different draws of training data decreased as the sample size became larger; also, the resulting signatures became more robust and the particular decision threshold value became less important.

  5. A machine learning approach to the potential-field method for implicit modeling of geological structures

    NASA Astrophysics Data System (ADS)

    Gonçalves, Ítalo Gomes; Kumaira, Sissa; Guadagnin, Felipe

    2017-06-01

    Implicit modeling has experienced a rise in popularity over the last decade due to its advantages in terms of speed and reproducibility in comparison with manual digitization of geological structures. The potential-field method consists in interpolating a scalar function that indicates to which side of a geological boundary a given point belongs, based on cokriging of point data and structural orientations. This work proposes a vector potential-field solution from a machine learning perspective, recasting the problem as multi-class classification, which alleviates some of the original method's assumptions. The potentials related to each geological class are interpreted in a compositional data framework. Variogram modeling is avoided through the use of maximum likelihood to train the model, and an uncertainty measure is introduced. The methodology was applied to the modeling of a sample dataset provided with the software Move™. The calculations were implemented in the R language and 3D visualizations were prepared with the rgl package.

  6. Effect of separate sampling on classification accuracy.

    PubMed

    Shahrokh Esfahani, Mohammad; Dougherty, Edward R

    2014-01-15

    Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions related to the effect on the expected classifier error owing to a sampling ratio different from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.

  7. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  8. A two-dimensional matrix image based feature extraction method for classification of sEMG: A comparative analysis based on SVM, KNN and RBF-NN.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen

    2017-01-01

    The computer mouse is an important human-computer interaction device, but patients with physical finger disability are unable to operate it. Surface EMG (sEMG) can be monitored by electrodes on the skin surface and is a reflection of the neuromuscular activities. Therefore, we can control limb auxiliary equipment by utilizing sEMG classification in order to help physically disabled patients operate the mouse. The aim was to develop a new method to extract sEMG generated by finger motion and apply novel features to classify sEMG. A window-based data acquisition method was presented to extract signal samples from sEMG electrodes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from the classical methods based on time domain or frequency domain, was employed to transform signal samples to feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were separately acquired. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, the machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively. In particular, the accuracy of the SVM classifier reached up to 100%. The signal separation method is a convenient, efficient and quick method, which can effectively extract the sEMG samples produced by fingers. In addition, unlike the classical methods, the new method makes it possible to extract features by enlarging sample signals' energy appropriately. The classical machine learning classifiers all performed well by using these features.

  9. Transferring genomics to the clinic: distinguishing Burkitt and diffuse large B cell lymphomas.

    PubMed

    Sha, Chulin; Barrans, Sharon; Care, Matthew A; Cunningham, David; Tooze, Reuben M; Jack, Andrew; Westhead, David R

    2015-01-01

    Classifiers based on molecular criteria such as gene expression signatures have been developed to distinguish Burkitt lymphoma and diffuse large B cell lymphoma, which help to explore the intermediate cases where traditional diagnosis is difficult. Transfer of these research classifiers into a clinical setting is challenging because there are competing classifiers in the literature based on different methodology and gene sets with no clear best choice; classifiers based on one expression measurement platform may not transfer effectively to another; and, classifiers developed using fresh frozen samples may not work effectively with the commonly used and more convenient formalin fixed paraffin-embedded samples used in routine diagnosis. Here we thoroughly compared two published high profile classifiers developed on data from different Affymetrix array platforms and fresh-frozen tissue, examining their transferability and concordance. Based on this analysis, a new Burkitt and diffuse large B cell lymphoma classifier (BDC) was developed and employed on Illumina DASL data from our own paraffin-embedded samples, allowing comparison with the diagnosis made in a central haematopathology laboratory and evaluation of clinical relevance. We show that both previous classifiers can be recapitulated using very much smaller gene sets than originally employed, and that the classification result is closely dependent on the Burkitt lymphoma criteria applied in the training set. The BDC classification on our data exhibits high agreement (~95 %) with the original diagnosis. A simple outcome comparison in the patients presenting intermediate features on conventional criteria suggests that the cases classified as Burkitt lymphoma by BDC have worse response to standard diffuse large B cell lymphoma treatment than those classified as diffuse large B cell lymphoma. 
    In this study, we comprehensively investigate two previous Burkitt lymphoma molecular classifiers, and implement a new gene expression classifier, BDC, that works effectively on paraffin-embedded samples and provides useful information for treatment decisions. The classifier is available as a free software package under the GNU public licence within the R statistical software environment through the link http://www.bioinformatics.leeds.ac.uk/labpages/softwares/ or on github https://github.com/Sharlene/BDC.

  10. Multiclass Data Segmentation Using Diffuse Interface Methods on Graphs

    DTIC Science & Technology

    2014-01-01

    interactive image segmentation using the solution to a combinatorial Dirichlet problem. Elmoataz et al. have developed generalizations of the graph...Laplacian [25] for image denoising and manifold smoothing. Couprie et al. in [18] define a conveniently parameterized graph-based energy function that...over to the discrete graph representation. For general data segmentation, Bresson et al. in [8], present rigorous convergence results for two algorithms

  11. Constraint Drive Generation of Vision Algorithms on an Elastic Infrastructure

    DTIC Science & Technology

    2014-10-01

    DIRECTOR: / S / / S / PATRICK K. McCABE MICHAEL J. WESSING Work Unit Manager Deputy Chief...partitioned into training and validation slices and we annotate images in descending order from the beginning of the key space. A separate contiguous...and S. Nowozin. On feature combination for multiclass object classification. In ICCV, pages 221–228, 2009. [4] J.-P. Heo, Y. Lee, J. He, S.-F. Chang

  12. 37 CFR 2.161 - Requirements for a complete affidavit or declaration of continued use or excusable nonuse.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... the registration number; (d)(1) Include the fee required by § 2.6 for each class of goods or services... period under section 8(a)(3) of the Act, include the grace period surcharge per class required by § 2.6; (3) If at least one fee is submitted for a multi-class registration, but the class(es) to which the...

  13. Training the max-margin sequence model with the relaxed slack variables.

    PubMed

    Niu, Lingfeng; Wu, Jianmin; Shi, Yong

    2012-09-01

    Sequence models are widely used in many applications such as natural language processing, information extraction and optical character recognition. In this paper, we propose a new approach to train the max-margin based sequence model by relaxing the slack variables. With the canonical feature mapping definition, the relaxed problem is solved by training a multiclass Support Vector Machine (SVM). Compared with the state-of-the-art solutions for sequence learning, the new method has the following advantages: firstly, the sequence training problem is transformed into a multiclassification problem, which is more widely studied and already has quite a few off-the-shelf training packages; secondly, this new approach reduces the complexity of training significantly and achieves comparable prediction performance compared with the existing sequence models; thirdly, when the size of training data is limited, by assigning different slack variables to different microlabel pairs, the new method can use the discriminative information more frugally and produce a more reliable model; last but not least, by employing kernels in the intermediate multiclass SVM, nonlinear feature space can be easily explored. Experimental results on the tasks of named entity recognition, information extraction and handwritten letter recognition with the public datasets illustrate the efficiency and effectiveness of our method. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Decoding of top-down cognitive processing for SSVEP-controlled BMI

    PubMed Central

    Min, Byoung-Kyong; Dähne, Sven; Ahn, Min-Hee; Noh, Yung-Kyun; Müller, Klaus-Robert

    2016-01-01

    We present a fast and accurate non-invasive brain-machine interface (BMI) based on demodulating steady-state visual evoked potentials (SSVEPs) in electroencephalography (EEG). Our study reports an SSVEP-BMI that, for the first time, decodes primarily based on top-down and not bottom-up visual information processing. The experimental setup presents a grid-shaped flickering line array that the participants observe while intentionally attending to a subset of flickering lines representing the shape of a letter. While the flickering pixels stimulate the participant’s visual cortex uniformly with equal probability, the participant’s intention groups the strokes and thus perceives a ‘letter Gestalt’. We observed decoding accuracy of 35.81% (up to 65.83%) with a regularized linear discriminant analysis; on average 2.05-fold, and up to 3.77-fold greater than chance levels in multi-class classification. Compared to the EEG signals, an electrooculogram (EOG) did not significantly contribute to decoding accuracies. Further analysis reveals that the top-down SSVEP paradigm shows the most focalised activation pattern around occipital visual areas; Granger causality analysis consistently revealed prefrontal top-down control over early visual processing. Taken together, the present paradigm provides the first neurophysiological evidence for the top-down SSVEP BMI paradigm, which potentially enables multi-class intentional control of EEG-BMIs without using gaze-shifting. PMID:27808125

  15. Endogenous Sensory Discrimination and Selection by a Fast Brain Switch for a High Transfer Rate Brain-Computer Interface.

    PubMed

    Xu, Ren; Jiang, Ning; Dosen, Strahinja; Lin, Chuang; Mrachacz-Kersting, Natalie; Dremstrup, Kim; Farina, Dario

    2016-08-01

    In this study, we present a novel multi-class brain-computer interface (BCI) for communication and control. In this system, the information processing is shared by the algorithm (computer) and the user (human). Specifically, an electro-tactile cycle was presented to the user, providing the choice (class) by delivering timely sensory input. The user discriminated these choices by his/her endogenous sensory ability and selected the desired choice with an intuitive motor task. This selection was detected by a fast brain switch based on real-time detection of movement-related cortical potentials from scalp EEG. We demonstrated the feasibility of such a system with a four-class BCI, yielding a true positive rate of ~80% and ~70%, and an information transfer rate of ~7 bits/min and ~5 bits/min, for the movement and imagination selection command, respectively. Furthermore, when the system was extended to eight classes, the throughput of the system was improved, demonstrating the capability of accommodating a large number of classes. Combining the endogenous sensory discrimination with the fast brain switch, the proposed system could be an effective, multi-class, gaze-independent BCI system for communication and control applications.

  16. Design and analysis of compound flexible skin based on deformable honeycomb

    NASA Astrophysics Data System (ADS)

    Zou, Tingting; Zhou, Li

    2017-04-01

    In this study, we focused on the development and verification of a robust framework for surface crack detection in steel pipes using measured vibration responses, in the presence of multiple progressive damage occurring in different locations within the structure. Feature selection, dimensionality reduction, and a multi-class support vector machine were established for this purpose. Nine damage cases, at different locations, orientations and lengths, were introduced into the pipe structure. The pipe was impacted 300 times using an impact hammer after each damage case, and the vibration data were collected using 3 PZT wafers installed on the outer surface of the pipe. First, damage-sensitive features were extracted using the frequency response function approach, followed by recursive feature elimination for dimensionality reduction. Then, a multi-class support vector machine learning algorithm was employed to train on the data and generate a statistical model. Once the model was established, decision values and distances from the hyper-plane were generated for the newly collected data using the trained model. This process was repeated on the data collected from each sensor. Overall, using a single sensor for training and testing led to a very high accuracy, reaching 98% in the assessment of the 9 damage cases used in this study.

  17. Decoding of top-down cognitive processing for SSVEP-controlled BMI

    NASA Astrophysics Data System (ADS)

    Min, Byoung-Kyong; Dähne, Sven; Ahn, Min-Hee; Noh, Yung-Kyun; Müller, Klaus-Robert

    2016-11-01

    We present a fast and accurate non-invasive brain-machine interface (BMI) based on demodulating steady-state visual evoked potentials (SSVEPs) in electroencephalography (EEG). Our study reports an SSVEP-BMI that, for the first time, decodes primarily based on top-down and not bottom-up visual information processing. The experimental setup presents a grid-shaped flickering line array that the participants observe while intentionally attending to a subset of flickering lines representing the shape of a letter. While the flickering pixels stimulate the participant’s visual cortex uniformly with equal probability, the participant’s intention groups the strokes and thus perceives a ‘letter Gestalt’. We observed decoding accuracy of 35.81% (up to 65.83%) with a regularized linear discriminant analysis; on average 2.05-fold, and up to 3.77-fold greater than chance levels in multi-class classification. Compared to the EEG signals, an electrooculogram (EOG) did not significantly contribute to decoding accuracies. Further analysis reveals that the top-down SSVEP paradigm shows the most focalised activation pattern around occipital visual areas; Granger causality analysis consistently revealed prefrontal top-down control over early visual processing. Taken together, the present paradigm provides the first neurophysiological evidence for the top-down SSVEP BMI paradigm, which potentially enables multi-class intentional control of EEG-BMIs without using gaze-shifting.

  18. Experiments on Supervised Learning Algorithms for Text Categorization

    NASA Technical Reports Server (NTRS)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volumes of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner has become a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: the Reuters-21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.
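
    The voting scheme mentioned at the end of this abstract can be illustrated with a small sketch. It assumes the usual one-vs-one arrangement, in which K classes need K(K-1)/2 pairwise binary classifiers and the class with the most pairwise wins is returned. The toy deciders, class labels and centre values below are hypothetical stand-ins for trained binary SVMs and are not taken from the report.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
# A pairwise decider returns True if the first class of its pair wins.
PairDecider = Callable[[Features], bool]


def one_vs_one_predict(x: Features,
                       classes: Sequence[str],
                       deciders: Dict[Tuple[str, str], PairDecider]) -> str:
    """Combine pairwise binary decisions by majority vote."""
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    votes: Counter = Counter()
    for first, second in combinations(classes, 2):
        winner = first if deciders[(first, second)](x) else second
        votes[winner] += 1
    top = max(votes.values())
    # Ties are broken by the order of `classes` (an arbitrary choice here).
    return next(label for label in classes if votes[label] == top)


def make_decider(center_a: float, center_b: float) -> PairDecider:
    """Toy stand-in for a trained binary SVM: nearest centre on one feature."""
    return lambda x: abs(x[0] - center_a) <= abs(x[0] - center_b)


# Hypothetical three-class setup with one-dimensional features.
CLASSES = ["topic_a", "topic_b", "topic_c"]
CENTERS = {"topic_a": 0.0, "topic_b": 5.0, "topic_c": 10.0}
DECIDERS = {(a, b): make_decider(CENTERS[a], CENTERS[b])
            for a, b in combinations(CLASSES, 2)}

predicted = one_vs_one_predict([4.0], CLASSES, DECIDERS)
```

    The number of pairwise models grows quadratically with the number of classes, which is the cost the abstract refers to when it contrasts SVM with PLS.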

  19. TU-FG-209-12: Treatment Site and View Recognition in X-Ray Images with Hierarchical Multiclass Recognition Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, X; Mazur, T; Yang, D

    Purpose: To investigate an approach of automatically recognizing anatomical sites and imaging views (the orientation of the image acquisition) in 2D X-ray images. Methods: A hierarchical (binary tree) multiclass recognition model was developed to recognize the treatment sites and views in x-ray images. From top to bottom of the tree, the treatment sites are grouped hierarchically from more general to more specific. Each node in the hierarchical model was designed to assign images to one of two categories of anatomical sites. The binary image classification function of each node in the hierarchical model is implemented by using a PCA transformation and a support vector machine (SVM) model. The optimal PCA transformation matrices and SVM models are obtained by learning from a set of sample images. Alternatives of the hierarchical model were developed to support three scenarios of site recognition that may happen in radiotherapy clinics, including two or one X-ray images with or without view information. The performance of the approach was tested with images of 120 patients from six treatment sites – brain, head-neck, breast, lung, abdomen and pelvis – with 20 patients per site and two views (AP and RT) per patient. Results: Given two images in known orthogonal views (AP and RT), the hierarchical model achieved a 99% average F1 score to recognize the six sites. Site specific view recognition models have 100 percent accuracy. The computation time to process a new patient case (preprocessing, site and view recognition) is 0.02 seconds. Conclusion: The proposed hierarchical model of site and view recognition is effective and computationally efficient. It could be useful to automatically and independently confirm the treatment sites and views in daily setup x-ray 2D images. It could also be applied to guide subsequent image processing tasks, e.g. site and view dependent contrast enhancement and image registration. The senior author received research grants from ViewRay Inc. and Varian Medical System.

  20. Towards multilevel mental stress assessment using SVM with ECOC: an EEG approach.

    PubMed

    Al-Shargie, Fares; Tang, Tong Boon; Badruddin, Nasreen; Kiguchi, Masashi

    2018-01-01

    Mental stress has been identified as one of the major contributing factors that leads to various diseases such as heart attack, depression, and stroke. To avoid this, stress quantification is important for clinical intervention and disease prevention. This study aims to investigate the feasibility of exploiting electroencephalography (EEG) signals to discriminate between different stress levels. We propose a new assessment protocol whereby the stress level is represented by the complexity of a mental arithmetic (MA) task, for example at three levels of difficulty, and the stressors are time pressure and negative feedback. Using 18 male subjects, the experimental results showed that there were significant differences in EEG response between the control and stress conditions at different levels of MA task with p values < 0.001. Furthermore, we found a significant reduction in alpha rhythm power from one stress level to another level, p values < 0.05. In comparison, results from the NASA-TLX self-reporting questionnaire showed no significant differences between stress levels. In addition, we developed a discriminant analysis method based on multiclass support vector machine (SVM) with error-correcting output code (ECOC). Different stress levels were detected with an average classification accuracy of 94.79%. The lateral index (LI) results further showed right prefrontal cortex (PFC) dominance in response to mental stress (reduced alpha rhythm). The study demonstrated the feasibility of using EEG in classifying multilevel mental stress and reported alpha rhythm power at right prefrontal cortex as a suitable index.
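
    The error-correcting output code (ECOC) idea named in this abstract can be sketched as follows. Each class gets a binary codeword, one binary classifier is trained per bit, and a sample is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. The codebook, the class labels and the function names below are hypothetical and chosen only for illustration; the abstract does not specify the code matrix used in the study.

```python
# Hypothetical 6-bit codebook for three stress levels. Every pair of
# codewords differs in 4 bits, so one wrong binary output is tolerated.
CODEBOOK = {
    "low":    (1, 1, 1, 0, 0, 0),
    "medium": (0, 0, 1, 1, 1, 0),
    "high":   (1, 0, 0, 1, 0, 1),
}


def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def ecoc_decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))


if __name__ == "__main__":
    # Outputs of the six binary classifiers, with the third bit flipped
    # relative to the "low" codeword.
    print(ecoc_decode((1, 1, 0, 0, 0, 0)))
```

    With a minimum pairwise distance of 4, a single erroneous binary classifier output still decodes to the intended class, which is the motivation for using ECOC over a plain one-classifier-per-class scheme.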

  1. A machine learning pipeline for automated registration and classification of 3D lidar data

    NASA Astrophysics Data System (ADS)

    Rajagopal, Abhejit; Chellappan, Karthik; Chandrasekaran, Shivkumar; Brown, Andrew P.

    2017-05-01

    Despite the large availability of geospatial data, registration and exploitation of these datasets remains a persistent challenge in geoinformatics. Popular signal processing and machine learning algorithms, such as non-linear SVMs and neural networks, rely on well-formatted input models as well as reliable output labels, which are not always immediately available. In this paper we outline a pipeline for gathering, registering, and classifying initially unlabeled wide-area geospatial data. As an illustrative example, we demonstrate the training and testing of a convolutional neural network to recognize 3D models in the OGRIP 2007 LiDAR dataset using fuzzy labels derived from OpenStreetMap as well as other datasets available on OpenTopography.org. When auxiliary label information is required, various text and natural language processing filters are used to extract and cluster keywords useful for identifying potential target classes. A subset of these keywords is subsequently used to form multi-class labels, with no assumption of independence. Finally, we employ class-dependent geometry extraction routines to identify candidates from both training and testing datasets. Our regression networks are able to identify the presence of 6 structural classes, including roads, walls, and buildings, in volumes as big as 8000 m³ in as little as 1.2 seconds on a commodity 4-core Intel CPU. The presented framework is neither dataset nor sensor-modality limited due to the registration process, and is capable of multi-sensor data-fusion.

  2. Detecting Seismic Events Using a Supervised Hidden Markov Model

    NASA Astrophysics Data System (ADS)

    Burks, L.; Forrest, R.; Ray, J.; Young, C.

    2017-12-01

    We explore the use of supervised hidden Markov models (HMMs) to detect seismic events in streaming seismogram data. Current methods for seismic event detection include simple triggering algorithms, such as STA/LTA and the Z-statistic, which can lead to large numbers of false positives that must be investigated by an analyst. The hypothesis of this study is that more advanced detection methods, such as HMMs, may decrease false positives while maintaining accuracy similar to current methods. We train a binary HMM classifier using 2 weeks of 3-component waveform data from the International Monitoring System (IMS) that was carefully reviewed by an expert analyst to pick all seismic events. Using an ensemble of simple and discrete features, such as the triggering of STA/LTA, the HMM predicts the time at which transition occurs from noise to signal. Compared to the STA/LTA detection algorithm, the HMM detects more true events, but the false positive rate remains unacceptably high. Future work to potentially decrease the false positive rate may include using continuous features, a Gaussian HMM, and multi-class HMMs to distinguish between types of seismic waves (e.g., P-waves and S-waves). Acknowledgement: Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND No: SAND2017-8154 A
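
    For readers unfamiliar with the STA/LTA trigger used here as a baseline and as an HMM feature, the following is a minimal sketch of the general technique: the ratio of a short-term average to a long-term average of signal energy, with a detection declared when the ratio crosses a threshold. The window lengths, the threshold and the synthetic trace are hypothetical values chosen for illustration, not parameters from the study.

```python
def sta_lta(signal, n_sta, n_lta):
    """STA/LTA energy ratio per sample, using trailing windows.

    Samples before the first full long-term window get a ratio of 0.0.
    """
    energy = [s * s for s in signal]
    ratios = []
    for i in range(len(energy)):
        if i + 1 < n_lta:
            ratios.append(0.0)
            continue
        sta = sum(energy[i + 1 - n_sta:i + 1]) / n_sta
        lta = sum(energy[i + 1 - n_lta:i + 1]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def trigger_onsets(ratios, threshold):
    """Indices where the ratio rises from below the threshold to at or above it."""
    return [i for i in range(1, len(ratios))
            if ratios[i] >= threshold > ratios[i - 1]]


if __name__ == "__main__":
    # Synthetic trace: low-amplitude noise with a burst at samples 50-59.
    trace = [0.1] * 50 + [2.0] * 10 + [0.1] * 40
    print(trigger_onsets(sta_lta(trace, n_sta=5, n_lta=30), threshold=3.0))
```

    On real seismograms, noise bursts also push the ratio over the threshold, which is one source of the false positives the abstract describes.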

  3. Multiclass Classification of Agro-Ecological Zones for Arabica Coffee: An Improved Understanding of the Impacts of Climate Change.

    PubMed

    Bunn, Christian; Läderach, Peter; Pérez Jimenez, Juan Guillermo; Montagnon, Christophe; Schilling, Timothy

    2015-01-01

    Cultivation of Coffea arabica is highly sensitive to and has been shown to be negatively impacted by progressive climatic changes. Previous research contributed little to support forward-looking adaptation. Agro-ecological zoning is a common tool to identify homologous environments and prioritize research. We demonstrate here a pragmatic approach to describe spatial changes in agro-climatic zones suitable for coffee under current and future climates. We defined agro-ecological zones suitable to produce arabica coffee by clustering geo-referenced coffee occurrence locations based on bio-climatic variables. We used random forest classification of climate data layers to model the spatial distribution of these agro-ecological zones. We used these zones to identify spatially explicit impact scenarios and to choose locations for the long-term evaluation of adaptation measures as climate changes. We found that in zones currently classified as hot and dry, climate change will impact arabica more than those that are better suited to it. Research in these zones should therefore focus on expanding arabica's environmental limits. Zones that currently have climates better suited for arabica will migrate upwards by about 500m in elevation. In these zones the up-slope migration will be gradual, but will likely have negative ecosystem impacts. Additionally, we identified locations that with high probability will not change their climatic characteristics and are suitable to evaluate C. arabica germplasm in the face of climate change. These locations should be used to investigate long term adaptation strategies to production systems.

  4. Local classifier weighting by quadratic programming.

    PubMed

    Cevikalp, Hakan; Polikar, Robi

    2008-10-01

    It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures -- many of which are heuristic in nature -- have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
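
    The local-accuracy estimate described in this abstract can be sketched as below. Each classifier is scored on the validation samples nearest to the query, and the scores are turned into nonnegative weights. Note the substitution: the paper obtains the weights by solving a convex quadratic program, whereas this sketch simply normalizes the local accuracies so they sum to one. The function name, the default `k` and the toy classifiers are hypothetical.

```python
import math


def local_accuracy_weights(query, val_points, val_labels, classifiers, k=3):
    """Weight each classifier by its accuracy on the k validation points
    nearest to the query (simple normalization, not a QP solution)."""
    order = sorted(range(len(val_points)),
                   key=lambda i: math.dist(query, val_points[i]))
    nearest = order[:k]
    acc = [sum(clf(val_points[i]) == val_labels[i] for i in nearest) / len(nearest)
           for clf in classifiers]
    total = sum(acc)
    if total == 0:
        # No classifier is locally correct: fall back to uniform weights.
        return [1.0 / len(classifiers)] * len(classifiers)
    return [a / total for a in acc]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = [0, 0, 0, 1, 1, 1]
    always_zero = lambda p: 0   # expert only near the origin
    always_one = lambda p: 1    # expert only near (5, 5)
    print(local_accuracy_weights((0.2, 0.2), points, labels,
                                 [always_zero, always_one]))
```

    The weights change with the query location, so a classifier that is reliable in one region of the input space dominates the vote only there.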

  5. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifier. We referenced a database of disease outbreak reports to confirm that the evaluation data set resulting from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784

  6. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    NASA Technical Reports Server (NTRS)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.

  7. A BRDF statistical model applying to space target materials modeling

    NASA Astrophysics Data System (ADS)

    Liu, Chenghao; Li, Zhi; Xu, Can; Tian, Qichen

    2017-10-01

    In order to solve the problem of poor effect in modeling the large density BRDF measured data with the five-parameter semi-empirical model, a refined statistical model of BRDF which is suitable for multi-class space target material modeling was proposed. The refined model improved the Torrance-Sparrow model while having the modeling advantages of the five-parameter model. Compared with the existing empirical model, the model contains six simple parameters, which can approximate the roughness distribution of the material surface, the intensity of the Fresnel reflectance phenomenon, and the attenuation of the reflected light's brightness as the azimuth angle changes. The model is able to achieve parameter inversion quickly with no extra loss of accuracy. The genetic algorithm was used to invert the parameters of 11 different samples of commonly used space target materials, and the fitting errors of all materials were below 6%, which were much lower than those of the five-parameter model. The effect of the refined model is verified by comparing the fitting results of the three samples at different incident zenith angles in 0° azimuth angle. Finally, the three-dimensional modeling visualizations of these samples in the upper hemisphere space were given, in which the strength of the optical scattering of different materials could be clearly shown. This also demonstrated the refined model's good descriptive ability for material characterization.

  8. Development and validation of a rapid multi-class method for the confirmation of fourteen prohibited medicinal additives in pig and poultry compound feed by liquid chromatography-tandem mass spectrometry.

    PubMed

    Cronly, Mark; Behan, P; Foley, B; Malone, E; Earley, S; Gallagher, M; Shearan, P; Regan, L

    2010-12-01

    A confirmatory method has been developed to allow for the analysis of fourteen prohibited medicinal additives in pig and poultry compound feed. These compounds are prohibited for use as feed additives although some are still authorised for use in medicated feed. Feed samples are extracted by acetonitrile with addition of sodium sulfate. The extracts undergo a hexane wash to aid with sample purification. The extracts are then evaporated to dryness and reconstituted in initial mobile phase. The samples undergo an ultracentrifugation step prior to injection onto the LC-MS/MS system and are analysed in a run time of 26 min. The LC-MS/MS system is run in MRM mode with both positive and negative electrospray ionisation. The method was validated over three days and is capable of quantitatively analysing for metronidazole, dimetridazole, ronidazole, ipronidazole, chloramphenicol, sulfamethazine, dinitolimide, ethopabate, carbadox and clopidol. The method is also capable of qualitatively analysing for sulfadiazine, tylosin, virginiamycin and avilamycin. A level of 100 µg kg⁻¹ was used for validation purposes and the method is capable of analysing to this level for all the compounds. Validation criteria of trueness, precision, repeatability and reproducibility along with measurement uncertainty are calculated for all analytes. Copyright (c) 2010 Elsevier B.V. All rights reserved.

  9. Occurrences of pharmaceuticals in drinking water sources of major river watersheds, China.

    PubMed

    Sun, Jing; Luo, Qian; Wang, Donghong; Wang, Zijian

    2015-07-01

    Pharmaceuticals in drinking water sources (DWSs) have raised significant concerns for their persistent input and potential human health risks. Currently, little is known about the occurrence of pharmaceuticals in DWSs in China. In this study, a survey for multi-class pharmaceuticals in DWSs of five major river watersheds in China was conducted from 2012 to 2013. Samples were collected from 25 sampling sites in rivers and reservoirs. 135 pharmaceuticals were analyzed using solid-phase extraction and ultra-performance liquid chromatography tandem mass spectrometry. The results showed that a total of 70 pharmaceuticals were present in the samples, and the most frequently detected ones included sulfonamides, macrolides, antiepileptic drugs, anti-inflammatory drugs, and β-blockers, etc. Amongst these, maximum concentrations of lincomycin, sulfamethoxazole, acetaminophen and paraxanthine were between 44 ng/L and 134 ng/L, and those of metoprolol, diphenhydramine, venlafaxine, nalidixic acid and androstenedione were less than 1 ng/L. Concentrations of the two that were most persistent, DEET and carbamazepine, were 0.8-10.2 ng/L and 0.01-3.5 ng/L, respectively. Higher concentrations of cotinine were observed in warm season than in cold season, while concentrations of lincomycin were the opposite. In a causality analysis, the occurrence of pharmaceuticals in DWSs depends mainly on the detection limits of the methods, their usage and the persistence in the aquatic environment. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. A consensus prognostic gene expression classifier for ER positive breast cancer

    PubMed Central

    Teschendorff, Andrew E; Naderi, Ali; Barbosa-Morais, Nuno L; Pinder, Sarah E; Ellis, Ian O; Aparicio, Sam; Brenton, James D; Caldas, Carlos

    2006-01-01

    Background A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer. Results Here we perform a combined analysis of three major breast cancer microarray data sets to hone in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation. Conclusion The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors. PMID:17076897

  11. Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    DTIC Science & Technology

    2013-05-28

    those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms . When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new

  12. Age determination of bottled Chinese rice wine by VIS-NIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Yu, Haiyan; Lin, Tao; Ying, Yibin; Pan, Xingxiang

    2006-10-01

    The feasibility of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining wine age (1, 2, 3, 4, and 5 years) of Chinese rice wine was investigated. Samples of Chinese rice wine were analyzed in 600 mL square brown glass bottles with side length of approximately 64 mm at room temperature. VIS-NIR spectra of 100 bottled Chinese rice wine samples were collected in transmission mode in the wavelength range of 350-1200 nm by a fiber spectrometer system. Discriminant models were developed based on discriminant analysis (DA) together with raw, first and second derivative spectra. The concentration of alcoholic degree, total acid, and °Brix was determined to validate the NIR results. The calibration result for raw spectra was better than that for first and second derivative spectra. The percentage of samples correctly classified for raw spectra was 98%. For the 1-, 2-, and 3-year-old sample groups, the samples were all correctly classified, and for the 4- and 5-year-old sample groups, the percentage of samples correctly classified was 92.9% in each group. In validation analysis, the percentage of samples correctly classified was 100%. The results demonstrated that the VIS-NIR spectroscopic technique could be used as a non-invasive, rapid and reliable method for predicting wine age of bottled Chinese rice wine.

  13. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers

    PubMed Central

    2018-01-01

    Hyperspectral image classification with a limited number of training samples without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Those samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data, as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of our proposed method, which compares favorably with the state-of-the-art methods. PMID:29304512

  14. Ensemble stump classifiers and gene expression signatures in lung cancer.

    PubMed

    Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn

    2007-01-01

    Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.

  15. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    PubMed Central

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464

  16. Determination of pharmaceutical residues and assessment of their removal efficiency at the Daugavgriva municipal wastewater treatment plant in Riga, Latvia.

    PubMed

    Reinholds, I; Muter, O; Pugajeva, I; Rusko, J; Perkons, I; Bartkevics, V

    2017-01-01

    Pharmaceutical products (PPs) belong to emerging contaminants that may accumulate along with other chemical pollutants in wastewaters (WWs) entering industrial and/or urban wastewater treatment plants (WWTPs). In the present study, the technique of ultra-high-performance liquid chromatography coupled to Orbitrap high-resolution mass spectrometry (Orbitrap-HRMS) was applied for the analysis of 24 multi-class PPs in WW samples collected at different technological stages of Daugavgriva WWTP located in Riga, Latvia. Caffeine and acetaminophen levels in the range of 7,570-11,403 ng/L and 810-1,883 ng/L, respectively, were the predominant compounds among 19 PPs determined in the WW. The results indicate that aerobic digestion in biological ponds was insufficiently effective to degrade most of the PPs (reduction efficiency <0-50.0%) with the exception of four PPs that showed degradation efficiency varying from 55.0 to 99.9%. Tests of short-term chemical and enzymatic hydrolysis for PP degradation in WW samples were performed, and the results reflected the complexity of different degradation mechanisms and physicochemical transformations of PPs. The toxicological studies of WW impact on Daphnia magna indicated gradual reduction of the total toxicity through the treatment stages at the WWTP.

  17. Development and validation of rapid multiresidue and multi-class analysis for antibiotics and anthelmintics in feed by ultra-high-performance liquid chromatography coupled to tandem mass spectrometry.

    PubMed

    Robert, Christelle; Brasseur, Pierre-Yves; Dubois, Michel; Delahaut, Philippe; Gillard, Nathalie

    2016-08-01

    A new multi-residue method for the analysis of veterinary drugs, namely amoxicillin, chlortetracycline, colistins A and B, doxycycline, fenbendazole, flubendazole, ivermectin, lincomycin, oxytetracycline, sulfadiazine, tiamulin, tilmicosin and trimethoprim, was developed and validated for feed. After acidic extraction, the samples were centrifuged, purified by SPE and analysed by ultra-high-performance liquid chromatography coupled to tandem mass spectrometry. Quantitative validation was done in accordance with the guidelines laid down in European Commission Decision 2002/657/CE. Matrix-matched calibration with internal standards was used to reduce matrix effects. The target level was set at the authorised carryover level (1%) and validation levels were set at 0.5%, 1% and 1.5%. Method performances were evaluated by the following parameters: linearity (0.986 < R² < 0.999), precision (repeatability < 12.4% and reproducibility < 14.0%), accuracy (89% < recovery < 107%), sensitivity, decision limit (CCα), detection capability (CCβ), selectivity and expanded measurement uncertainty (k = 2). This method has been used successfully for three years for routine monitoring of antibiotic residues in feeds during which period 20% of samples were found to exceed the 1% authorised carryover limit and were deemed non-compliant.

  18. Pesticide residues in Portuguese strawberries grown in 2009-2010 using integrated pest management and organic farming.

    PubMed

    Fernandes, Virgínia C; Domingues, Valentina F; Mateus, Nuno; Delerue-Matos, Cristina

    2012-11-01

    Pesticides are among the most widely used chemicals in the world. Because of the widespread use of agricultural chemicals in food production, people are exposed to low levels of pesticide residues through their diets. Scientists do not yet have a total understanding of the health effects of these pesticide residues. This work aims to determine differences in terms of pesticide residue content in Portuguese strawberries grown using different agriculture practices. The Quick, Easy, Cheap, Effective, Rugged, and Safe sample preparation method was conducted and shown to have good performance for multiclass pesticide extraction in strawberries. The screening of 25 pesticide residues was performed by gas chromatography-tandem mass spectrometry. In quantitative validation, acceptable performances were achieved with recoveries of 70-120% and <12% residual standard deviation for 25 pesticides. Good linearity was obtained for all the target compounds, with highly satisfactory repeatability. The limits of detection were in the range of 0.1-28 μg/kg. The method was applied to analyze strawberry samples from organic and integrated pest management (IPM) practices harvested in 2009-2010. The results showed the presence of fludioxonil, bifenthrin, mepanipyrim, tolylfluanid, cyprodinil, tetraconazole, and malathion when using IPM below the maximum residue levels.

  19. Determination of mycotoxins in different food commodities by ultra-high-pressure liquid chromatography coupled to triple quadrupole mass spectrometry.

    PubMed

    Beltrán, Eduardo; Ibáñez, María; Sancho, Juan Vicente; Hernández, Félix

    2009-06-01

    A rapid multianalyte-multiclass method with little sample manipulation has been developed for the simultaneous determination of eleven mycotoxins in different food commodities by using ultra-high-pressure liquid chromatography coupled to triple quadrupole mass spectrometry (UHPLC/MS/MS). Toxins were extracted from the samples with acetonitrile/water (80:20, v/v) 0.1% HCOOH and, after a two-fold dilution with water, directly injected into the system. Thanks to the fast high-resolution separation of UHPLC, the eleven mycotoxins were separated by gradient elution in only 4 min. The method has been validated in three food matrices (maize kernels, dry pasta (wheat), and eight-multicereal babyfood (wheat, maize, rice, oat, barley, rye, sorghum, millet)) at four different concentration levels. Satisfactory recoveries were obtained (70-110%) and precision (expressed as relative standard deviation) was typically below 15% with very few exceptions. Quantification of samples was carried out with matrix-matched standards calibration. The lowest concentration successfully validated in sample was as low as 0.5 μg/kg for aflatoxins and ochratoxin A in babyfood, and 20 μg/kg for the rest of the selected mycotoxins in all matrices tested. Deoxynivalenol could only be validated at 200 μg/kg, due to the poor sensitivity for this mycotoxin. With only two exceptions (HT-2 and deoxynivalenol), the limits of detection (LODs), estimated for a signal-to-noise ratio of 3 from the chromatograms of samples spiked at the lowest level validated, varied between 0.1 and 1 μg/kg in the three food matrices tested. The method was applied to the analysis of different kinds of samples. Positive findings were confirmed by acquiring two transitions (Q quantification, q confirmation) and evaluating the Q/q ratio. Copyright (c) 2009 John Wiley & Sons, Ltd.
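
    The recovery and precision figures quoted in validation abstracts such as the one above follow from simple arithmetic on spiked replicates. The sketch below is only an illustration: the replicate values, the 20 μg/kg spiking level and the function names are hypothetical, and the acceptance window simply reuses the 70-110% recovery and 15% RSD figures mentioned in the abstract.

```python
from statistics import mean, stdev


def recovery_percent(measured, spiked):
    """Recovery (%) of one spiked sample: measured / spiked concentration * 100."""
    return 100.0 * measured / spiked


def rsd_percent(values):
    """Relative standard deviation (%): sample standard deviation / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate results (ug/kg) for a sample spiked at 20 ug/kg.
spiked_level = 20.0
replicates = [18.0, 19.0, 20.0, 21.0, 22.0]

recoveries = [recovery_percent(m, spiked_level) for m in replicates]
mean_recovery = mean(recoveries)
precision = rsd_percent(replicates)

# Illustrative acceptance window reusing the ranges quoted in the abstract.
acceptable = 70.0 <= mean_recovery <= 110.0 and precision < 15.0
print(f"mean recovery = {mean_recovery:.1f}%, RSD = {precision:.1f}%, ok = {acceptable}")
```

    In practice the same two quantities would be computed per analyte, per matrix and per spiking level.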

  20. Development and optimization of a solid-phase microextraction gas chromatography-tandem mass spectrometry methodology to analyse ultraviolet filters in beach sand.

    PubMed

    Vila, Marlene; Llompart, Maria; Garcia-Jares, Carmen; Homem, Vera; Dagnac, Thierry

    2018-06-06

    A methodology based on solid-phase microextraction (SPME) followed by gas chromatography-tandem mass spectrometry (GC-MS/MS) has been developed for the simultaneous analysis of eleven multiclass ultraviolet (UV) filters in beach sand. To the best of our knowledge, this is the first time that this extraction technique has been applied to the analysis of UV filters in sand samples or in other kinds of environmental solid samples. The main extraction parameters, such as the fibre coating, the amount of sample, the addition of salt, the volume of water added to the sand, and the temperature, were optimized. An experimental design approach was implemented in order to find the most favourable conditions. The final conditions consisted of adding 1 mL of water to 1 g of sample followed by headspace SPME for 20 min at 100 °C, using PDMS/DVB as fibre coating. The SPME-GC-MS/MS method was validated in terms of linearity, accuracy, limits of detection and quantification, and precision. Recovery studies were also performed at three concentration levels in real Atlantic and Mediterranean sand samples. The recoveries were generally above 85% and relative standard deviations below 11%. The limits of detection were at the pg g⁻¹ level. The validated methodology was successfully applied to the analysis of real sand samples collected from Atlantic Ocean beaches on the Northwest coast of Spain and Portugal, Canary Islands (Spain), and from Mediterranean Sea beaches on Mallorca Island (Spain). The most frequently found UV filters were ethylhexyl salicylate (EHS), homosalate (HMS), 4-methylbenzylidene camphor (4MBC), 2-ethylhexyl methoxycinnamate (2EHMC) and octocrylene (OCR), with concentrations up to 670 ng g⁻¹. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions were discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measure for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.
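
    As a rough illustration of the relationship between simulated error and the Chernoff bound mentioned above, the sketch below draws samples from two univariate normal classes, classifies them with the Bayes rule, and compares the observed error rate with the Bhattacharyya bound (the Chernoff bound evaluated at s = 1/2). The class parameters, sample size and function names are assumptions made for this example; the report itself treats multivariate patterns and may use different procedures.

```python
import math
import random


def bhattacharyya_bound(mu1, sigma1, mu2, sigma2, p1=0.5):
    """Upper bound on the Bayes error for two 1-D normal classes."""
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    # Bhattacharyya distance between N(mu1, v1) and N(mu2, v2).
    dist = (0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * sigma1 * sigma2)))
    return math.sqrt(p1 * (1.0 - p1)) * math.exp(-dist)


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def simulated_error(mu1, sigma1, mu2, sigma2, p1=0.5, n=20000, seed=0):
    """Monte Carlo estimate of the Bayes-rule error rate."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        from_class1 = rng.random() < p1
        x = rng.gauss(mu1, sigma1) if from_class1 else rng.gauss(mu2, sigma2)
        # Assign the class with the larger prior-weighted likelihood.
        score1 = math.log(p1) + log_normal_pdf(x, mu1, sigma1)
        score2 = math.log(1.0 - p1) + log_normal_pdf(x, mu2, sigma2)
        predicted_class1 = score1 >= score2
        if predicted_class1 != from_class1:
            errors += 1
    return errors / n


# Hypothetical classes: N(0, 1) and N(2, 1) with equal priors.
bound = bhattacharyya_bound(0.0, 1.0, 2.0, 1.0)
error = simulated_error(0.0, 1.0, 2.0, 1.0)
print(f"simulated error = {error:.3f}, Bhattacharyya bound = {bound:.3f}")
```

    The bound is loose for well-separated classes, which is why simulation is useful for estimating the actual error.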

  2. Determination of Minimum Training Sample Size for Microarray-Based Cancer Outcome Prediction–An Empirical Assessment

    PubMed Central

    Cheng, Ningtao; Wu, Leihong; Cheng, Yiyu

    2013-01-01

    The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results relies heavily on the accuracy of statistical parameters involved in classifiers. It cannot be reliably estimated with only a small number of training samples. Therefore, it is of vital importance to determine the minimum number of training samples and to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively based on 3 large-scale cancer microarray datasets provided by the second phase of MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for minimum training sample size determination. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into clinical routine applications, the SSNR-based protocol would provide great convenience in microarray-based cancer outcome prediction in improving classifier reliability. PMID:23861920

  3. A novel modular ANN architecture for efficient monitoring of gases/odours in real-time

    NASA Astrophysics Data System (ADS)

    Mishra, A.; Rajput, N. S.

    2018-04-01

    Data pre-processing is widely used to enhance the classification of gases. However, it suppresses the concentration variances of different gas samples. The classical solution of using a single artificial neural network (ANN) architecture is also inefficient and degrades quantification. In this paper, a novel modular ANN design is proposed to provide an efficient and scalable solution for real-time gas monitoring. Two separate ANN blocks, a classifier block and a quantifier block, are used. The classifier ANN consists of two stages. In the first stage, the Net 1-NDSRT is trained to transform raw sensor responses into corresponding virtual multi-sensor responses using the normalized difference sensor response transformation (NDSRT). These responses are fed to the second stage (i.e., the Net 2-classifier), which is trained to assign gas samples to their respective classes. Further, the quantifier block has parallel ANN modules, multiplexed to quantify each gas. Therefore, the classifier ANN decides the class and the quantifier ANN decides the exact quantity of the gas/odor present in the respective sample of that class.

  4. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    NASA Astrophysics Data System (ADS)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the number of training samples required to train the classifier of an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies than two state-of-the-art model transfer algorithms. When training data were insufficient, the overall classification accuracy of the incoming image improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with the accuracy obtained without assistance from previous images. These results demonstrate that leveraging a priori information from previous images can assist the classification of later images in multitemporal image classification.

  5. Molecular differential diagnosis of follicular thyroid carcinoma and adenoma based on gene expression profiling by using formalin-fixed paraffin-embedded tissues

    PubMed Central

    2013-01-01

    Background Differential diagnosis between malignant follicular thyroid cancer (FTC) and benign follicular thyroid adenoma (FTA) is a great challenge for even an experienced pathologist and requires special effort. Molecular markers may potentially support a differential diagnosis between FTC and FTA in postoperative specimens. The purpose of this study was to derive molecular support for differential post-operative diagnosis, in the form of a simple multigene mRNA-based classifier that would differentiate between FTC and FTA tissue samples. Methods A molecular classifier was created based on a combined analysis of two microarray datasets (using 66 thyroid samples). The performance of the classifier was assessed using an independent dataset comprising 71 formalin-fixed paraffin-embedded (FFPE) samples (31 FTC and 40 FTA), which were analysed by quantitative real-time PCR (qPCR). In addition, three other microarray datasets (62 samples) were used to confirm the utility of the classifier. Results Five of 8 genes selected from training datasets (ELMO1, EMCN, ITIH5, KCNAB1, SLCO2A1) were amplified by qPCR in FFPE material from an independent sample set. Three other genes did not amplify in FFPE material, probably due to low abundance. All 5 analysed genes were downregulated in FTC compared to FTA. The sensitivity and specificity of the 5-gene classifier tested on the FFPE dataset were 71% and 72%, respectively. Conclusions The proposed approach could support histopathological examination: the 5-gene classifier may aid in molecular discrimination between FTC and FTA in FFPE material. PMID:24099521

  6. Lidar-based multinomial classification algorithms for tropical forest degradation status: Implications for biomass estimation

    NASA Astrophysics Data System (ADS)

    Duffy, P.; Keller, M.; Longo, M.; Morton, D. C.; dos-Santos, M. N.; Pinagé, E. R.

    2017-12-01

    There is an urgent need to quantify the effects of land use and land cover change on carbon stocks in tropical forests to support REDD+ policies and improve characterization of global carbon budgets. This need is underscored by the fact that the variability in forest biomass estimates from global forest carbon maps is artificially low relative to estimates generated from forest inventory and high-resolution airborne lidar data. Both deforestation and degradation processes (e.g. logging, fire, and fragmentation) affect carbon fluxes at varying spatial and temporal scales. While the spatial extent and impact of deforestation has been relatively well characterized, the quantification of degradation processes is still poorly constrained. In the Brazilian Amazon, the largest source of uncertainty in CO2 emissions estimates is data on changes in tropical forest carbon stocks through time, followed closely by incomplete information on the carbon losses from forest degradation. In this work, we present a method for classifying the degradation status of tropical forests using higher order moments (skewness and kurtosis) of lidar return distributions aggregated at grids with resolution ranging from 50 m to 250 m. Across multiple spatial resolutions, we quantify the strength of the functional relationship between the lidar returns and the classification based on historical time series of Landsat imagery. Our results show that the higher order moments of the lidar return distributions provide sufficient information to build multinomial models that accurately classify the landscape into intact, logged, and burned forests. Model fit improved with coarser spatial resolution with Kappa statistics of 0.70 at 50 m, and 0.77 at 250 m. In addition, multi-class AUC was estimated as 0.87 at 50 m, and 0.95 at 250 m. This classification provides important information regarding the applicability of the use of lidar data for regional monitoring of recent logging, as well as the trajectory of the carbon budget. Differentiating between the biomass changes associated with deforestation and degradation processes is critical for accurate accounting of disturbance impacts on carbon cycling within the Brazilian Amazon and global tropical forests.
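
    The Kappa statistic reported above measures agreement beyond chance between predicted and reference classes. A minimal sketch of Cohen's kappa computed from a confusion matrix follows; the 3x3 counts for the intact, logged and burned classes are hypothetical and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for three classes: intact, logged, burned.
confusion = [
    [40, 5, 5],
    [6, 30, 4],
    [4, 6, 20],
]
kappa = cohen_kappa(confusion)
print(f"kappa = {kappa:.3f}")
```

    A kappa of 1 indicates perfect agreement, while 0 indicates agreement no better than chance.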

  7. EEG complexity as a biomarker for autism spectrum disorder risk

    PubMed Central

    2011-01-01

    Background Complex neurodevelopmental disorders may be characterized by subtle brain function signatures early in life before behavioral symptoms are apparent. Such endophenotypes may be measurable biomarkers for later cognitive impairments. The nonlinear complexity of electroencephalography (EEG) signals is believed to contain information about the architecture of the neural networks in the brain on many scales. Early detection of abnormalities in EEG signals may be an early biomarker for developmental cognitive disorders. The goal of this paper is to demonstrate that the modified multiscale entropy (mMSE) computed on the basis of resting state EEG data can be used as a biomarker of normal brain development and distinguish typically developing children from a group of infants at high risk for autism spectrum disorder (ASD), defined on the basis of an older sibling with ASD. Methods Using mMSE as a feature vector, a multiclass support vector machine algorithm was used to classify typically developing and high-risk groups. Classification was computed separately within each age group from 6 to 24 months. Results Multiscale entropy appears to go through a different developmental trajectory in infants at high risk for autism (HRA) than it does in typically developing controls. Differences appear to be greatest at ages 9 to 12 months. Using several machine learning algorithms with mMSE as a feature vector, infants were classified with over 80% accuracy into control and HRA groups at age 9 months. Classification accuracy for boys was close to 100% at age 9 months and remained high (70% to 90%) at ages 12 and 18 months. For girls, classification accuracy was highest at age 6 months, but declined thereafter. Conclusions This proof-of-principle study suggests that mMSE computed from resting state EEG signals may be a useful biomarker for early detection of risk for ASD and abnormalities in cognitive development in infants. To our knowledge, this is the first demonstration of an information theoretic analysis of EEG data for biomarkers in infants at risk for a complex neurodevelopmental disorder. PMID:21342500
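
    For readers unfamiliar with multiscale entropy, the sketch below shows the standard construction: the signal is coarse-grained by averaging non-overlapping windows, and sample entropy is computed at each scale. This is not the modified variant (mMSE) used in the study, whose details the abstract does not give; the parameter choices (m = 2, tolerance of 0.2 standard deviations) and the test signals are assumptions made for illustration.

```python
import math
import random
from statistics import pstdev


def coarse_grain(signal, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n_windows = len(signal) // scale
    return [sum(signal[i * scale:(i + 1) * scale]) / scale for i in range(n_windows)]


def _count_matches(signal, length, n_templates, r):
    """Count template pairs (i < j) whose Chebyshev distance is at most r."""
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if all(abs(signal[i + k] - signal[j + k]) <= r for k in range(length)):
                count += 1
    return count


def sample_entropy(signal, m=2, r=None):
    """Sample entropy: -ln(A / B) for template lengths m + 1 (A) and m (B)."""
    if r is None:
        r = 0.2 * pstdev(signal)
    n_templates = len(signal) - m
    b = _count_matches(signal, m, n_templates, r)
    a = _count_matches(signal, m + 1, n_templates, r)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def multiscale_entropy(signal, max_scale=5, m=2):
    """Sample entropy of the coarse-grained signal at scales 1..max_scale."""
    r = 0.2 * pstdev(signal)  # tolerance fixed from the original signal
    return [sample_entropy(coarse_grain(signal, s), m, r) for s in range(1, max_scale + 1)]


# Hypothetical signals: a perfectly regular alternation and white noise.
rng = random.Random(0)
regular = [0.0, 1.0] * 50
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(sample_entropy(regular), sample_entropy(noise))
print(multiscale_entropy(noise, max_scale=3))
```

    The resulting vector of entropies across scales is the kind of feature vector that could then be passed to a classifier.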

  8. Cultivating engineering innovation ability based on optoelectronic experimental platform

    NASA Astrophysics Data System (ADS)

    Li, Dangjuan; Wu, Shenjiang

    2017-08-01

    As the supporting experimental platform of the Xi'an Technological University education reform experimental class, the "optical technological innovation experimental platform" integrates the design and comprehensive experiments of the optical multi-class courses. Drawing on the past two years of teaching experience, the platform's pilot projects were improved. By adopting an open teaching model, the platform has helped cultivate the engineering innovation spirit and scientific thinking of the students.

  9. Mapping online transportation service quality and multiclass classification problem solving priorities

    NASA Astrophysics Data System (ADS)

    Alamsyah, Andry; Rachmadiansyah, Imam

    2018-03-01

    Online transportation services are known for their accessibility, transparency, and tariff affordability. These points give online transportation advantages over existing conventional transportation services. Online transportation is an example of a disruptive technology that changes the relationship between customers and companies. In Indonesia, competition among online transportation providers is high, so companies must maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to customer opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online transportation providers in Indonesia: Gojek and Grab. Since many customers actively compliment and complain about the companies' service level on Twitter, we collected 61,721 tweets in Bahasa during one month of observation. We apply Naive Bayes and Support Vector Machine methods to see which model performs best for our data. The results reveal that Gojek has better service quality, with 19.76% positive and 80.23% negative sentiments, than Grab, with 9.2% positive and 90.8% negative. Gojek's highest problem-solving priority concerns application problems, while Grab's concerns unusable promos. The overall results show that the general problems in both cases are related to the accessibility dimension, which indicates a lack of capability to provide good digital access to end users.

  10. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    PubMed

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.

  11. A blood-based proteomic classifier for the molecular characterization of pulmonary nodules.

    PubMed

    Li, Xiao-jun; Hayward, Clive; Fong, Pui-Yee; Dominguez, Michel; Hunsucker, Stephen W; Lee, Lik Wee; McLean, Matthew; Law, Scott; Butler, Heather; Schirm, Michael; Gingras, Olivier; Lamontagne, Julie; Allard, Rene; Chelsky, Daniel; Price, Nathan D; Lam, Stephen; Massion, Pierre P; Pass, Harvey; Rom, William N; Vachani, Anil; Fang, Kenneth C; Hood, Leroy; Kearney, Paul

    2013-10-16

    Each year, millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. Because most of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, we identified 371 protein candidates and developed a multiple reaction monitoring (MRM) assay for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and stage IA lung cancer matched for nodule size, age, gender, and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a negative predictive value (NPV) of 90%. Validation performance on samples from a nondiscovery clinical site showed an NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, and FOS) that are associated with lung cancer, lung inflammation, and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history, and age, which are risk factors used for clinical management of pulmonary nodules. Thus, this molecular test provides a potential complementary tool to help physicians in lung cancer diagnosis.

  12. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non-metastasis samples were screened under the threshold of P<0.05. Based on the protein-protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non-metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin-dependent kinase 2 (CDK2), myelocytomatosis proto-oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non-ATPase 2 and telomeric repeat binding factor 2. The cyclin-dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non-metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non-metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
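
    The performance measures listed above (correct rate, sensitivity, specificity, positive and negative predictive value) all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic. The class sizes (35 metastatic, 143 non-metastatic) come from the abstract, but the split into correct and incorrect predictions is invented for illustration, chosen only so that the overall accuracy equals the reported 94.38%; the abstract does not report these counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive and negative counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical split: 35 metastatic (30 detected), 143 non-metastatic (138 correct).
metrics = binary_metrics(tp=30, fp=5, tn=138, fn=5)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

    Because the two classes are unbalanced, accuracy alone can look high even when sensitivity is modest, which is why the individual measures are usually reported together.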

  13. Stackable differential mobility analyzer for aerosol measurement

    DOEpatents

    Cheng, Meng-Dawn [Oak Ridge, TN]; Chen, Da-Ren [Creve Coeur, MO]

    2007-05-08

    A multi-stage differential mobility analyzer (MDMA) for aerosol measurements includes a first electrode or grid including at least one inlet or injection slit for receiving an aerosol including charged particles for analysis. A second electrode or grid is spaced apart from the first electrode. The second electrode has at least one sampling outlet disposed at a plurality of different distances along its length. A volume between the first and the second electrode or grid between the inlet or injection slit and a distal one of the plurality of sampling outlets forms a classifying region, the first and second electrodes for charging to suitable potentials to create an electric field within the classifying region. At least one inlet or injection slit in the second electrode receives a sheath gas flow into an upstream end of the classifying region, wherein each sampling outlet functions as an independent DMA stage and classifies different size ranges of charged particles based on electric mobility simultaneously.

  14. Pesticide extraction from table grapes and plums using ionic liquid based dispersive liquid-liquid microextraction.

    PubMed

    Ravelo-Pérez, Lidia M; Hernández-Borges, Javier; Herrera-Herrera, Antonio V; Rodríguez-Delgado, Miguel Angel

    2009-12-01

    Room temperature ionic liquids (RTILs) have been used as extraction solvents in dispersive liquid-liquid microextraction (DLLME) for the determination of eight multi-class pesticides (i.e. thiophanate-methyl, carbofuran, carbaryl, tebuconazole, iprodione, oxyfluorfen, hexythiazox, and fenazaquin) in table grapes and plums. The developed method involves the combination of DLLME and high-performance liquid chromatography with diode array detection. Samples were first homogenized and extracted with acetonitrile. After evaporation and reconstitution of the extract in water containing sodium chloride, a quick DLLME procedure that used the ionic liquid 1-hexyl-3-methylimidazolium hexafluorophosphate ([C(6)MIM][PF(6)]) and methanol was developed. The RTIL dissolved in a very small volume of acetonitrile was directly injected into the chromatographic system. The comparison between the calibration curves obtained from standards and from spiked sample extracts (matrix-matched calibration) showed the existence of a strong matrix effect for most of the analyzed pesticides. A recovery study was also carried out with five consecutive extractions of the two types of fruits spiked at three concentration levels. Mean recovery values were in the range of 72-100% for table grapes and 66-105% for plum samples (except for thiophanate-methyl and carbofuran, which were 64-75% and 58-66%, respectively). Limits of detection (LODs) were in the range 0.651-5.44 μg/kg for table grapes and 0.902-6.33 μg/kg for plums, representing LODs below the maximum residue limits (MRLs) established by the European Union in these fruits. The potential of the method was demonstrated by analyzing 12 commercial fruit samples (six of each type).

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moradi, Hamid; Murugkar, Sangeeta; Ahmad, Abrar

    Purpose: To improve classification by reducing batch effect in samples from the ovarian carcinoma cell lines A2780s (parental wild type) and A2780cp (cisplatin cross-radio-resistant), before, right after, and 24 hours after irradiation to 10 Gy. Methods: Spectra were acquired with a home-built confocal Raman microscope in 3 distinct runs of six samples: unirradiated s&cp (control pair), then 0h and 24h after irradiation. The Raman spectra were noise reduced, then background subtracted with the SMIRF algorithm. Approximately 35 cell spectra were collected from each sample in 1024 channels from 700 cm⁻¹ to 1618 cm⁻¹. The spectra were analyzed by regularized multiclass LDA. For feature reduction, the spectra were grouped into 3 overlapping group pairs: s-cp, 0Gy–10Gy0h and 0Gy–10Gy24h. The three features, the three differences of the mean spectra, were mapped to the analysis sub-space by the inverse regularized covariance matrix. The batch effect noticeably confounded the dose and time effect. Results: To remove the batch effect, the 2+2=4D subspace extended by the covariance matrix of the means of the 0Gy control groups was subtracted from the spectra of each sample. When the analysis was repeated on the spectra with the control group variability removed, the batch effect was dramatically reduced in the dose and time directions, enabling sharp linear discrimination. The cell type classification also improved. Conclusions: We identified an efficient batch effect removal technique crucial to the applicability of Raman microscopy to radiosensitivity studies both on cell cultures and potential clinical diagnostic applications.

  16. Nonlinearity-aware based dimensionality reduction and over-sampling for AD/MCI classification from MRI measures.

    PubMed

    Cao, Peng; Liu, Xiaoli; Yang, Jinzhu; Zhao, Dazhe; Huang, Min; Zhang, Jian; Zaiane, Osmar

    2017-12-01

    Alzheimer's disease (AD) has been not only a substantial financial burden to the health care system but also an emotional burden to patients and their families. Making an accurate diagnosis of AD based on brain magnetic resonance imaging (MRI) at the earliest stages is becoming more and more critical. However, high dimensionality and imbalanced data are two major challenges in the study of computer aided AD diagnosis. The greatest limitation of existing dimensionality reduction and over-sampling methods is that they assume a linear relationship between the MRI features (predictor) and the disease status (response). To better capture the complicated but more flexible relationship, we propose multi-kernel based dimensionality reduction and over-sampling approaches. We combined Marginal Fisher Analysis with ℓ2,1-norm based multi-kernel learning (MKMFA) to achieve the sparsity of region-of-interest (ROI), which leads to simultaneously selecting a subset of the relevant brain regions and learning a dimensionality transformation. Meanwhile, a multi-kernel over-sampling (MKOS) was developed to generate synthetic instances in the optimal kernel space induced by MKMFA, so as to compensate for the class imbalanced distribution. We comprehensively evaluate the proposed models for the diagnostic classification (binary class and multi-class classification) including all subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The experimental results not only demonstrate that the proposed method has superior performance over multiple comparable methods, but also identify relevant imaging biomarkers that are consistent with prior medical knowledge. Copyright © 2017 Elsevier Ltd. All rights reserved.
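
    The ℓ2,1-norm mentioned above promotes row-wise sparsity, which is what allows whole features (here, brain regions) to be dropped together. A minimal sketch of the norm follows, assuming the common convention that it is the sum of the Euclidean norms of the rows of a weight matrix; the matrix values and the helper names are hypothetical and are not taken from the paper.

```python
import math


def row_norm(row):
    """Euclidean (l2) norm of one row."""
    return math.sqrt(sum(v * v for v in row))


def l21_norm(matrix):
    """l2,1 norm: sum over rows of each row's Euclidean norm."""
    return sum(row_norm(row) for row in matrix)


def active_rows(matrix, tol=1e-12):
    """Indices of rows that are not (numerically) all zero, i.e. selected features."""
    return [i for i, row in enumerate(matrix) if row_norm(row) > tol]


# Hypothetical weight matrix: one row per feature, one column per output dimension.
weights = [
    [3.0, 4.0],
    [0.0, 0.0],   # a feature removed entirely by the row-sparse penalty
    [5.0, 12.0],
]
print(l21_norm(weights), active_rows(weights))
```

    Penalizing this quantity shrinks entire rows toward zero, so the surviving rows indicate which features the model retains.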

  17. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    NASA Astrophysics Data System (ADS)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time-consuming and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can provide a useful tool for water quality monitoring in resource-limited settings.
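
    The sensitivity, specificity, and accuracy figures mentioned in the abstract above follow from a binary confusion matrix. A minimal sketch of those standard definitions is given below; the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of true cysts that were detected
    specificity = tn / (tn + fp)   # fraction of non-cyst objects correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for one classifier on a labelled test set.
sens, spec, acc = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

    Comparing several classifier types, as the abstract describes, amounts to computing these three numbers for each model on the same held-out data.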

  18. Multiresidue analysis of multiclass pesticides and polyaromatic hydrocarbons in fatty fish by gas chromatography tandem mass spectrometry and evaluation of matrix effect.

    PubMed

    Chatterjee, Niladri S; Utture, Sagar; Banerjee, Kaushik; Ahammed Shabeer, T P; Kamble, Narayan; Mathew, Suseela; Ashok Kumar, K

    2016-04-01

    This paper reports a selective and sensitive method for multiresidue determination of 119 chemical residues including pesticides and polyaromatic hydrocarbons (PAH) in high fatty fish matrix. The novel sample preparation method involved extraction of the target analytes from homogenized fish meat (5 g) in acetonitrile (15 mL, 1% acetic acid) after three-phase partitioning with hexane (2 mL) and the remaining aqueous layer. An aliquot (1.5 mL) of the acetonitrile layer was aspirated and subjected to two-stage dispersive solid phase extraction (dSPE) cleanup and the residues were finally estimated by gas chromatography mass spectrometry with selected reaction monitoring (GC-MS/MS). The co-eluted matrix components were identified on the basis of their accurate mass by GC with quadrupole time of flight MS. Addition of hexane during extraction and optimized dSPE cleanup significantly minimized the matrix effects. Recoveries at 10, 25 and 50 μg/kg were within 60-120% with associated precision, RSD<11%. Copyright © 2015 Elsevier Ltd. All rights reserved.
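
    The recovery and precision figures quoted in the abstract above (recoveries within 60-120%, RSD<11%) are conventionally computed from replicate measurements of spiked samples. A minimal sketch follows, assuming the usual definitions (recovery as measured over spiked concentration, RSD as sample standard deviation over mean); the replicate values are hypothetical, not data from the paper.

```python
from statistics import mean, stdev

def recovery_percent(measured, spiked):
    """Recovery as a percentage of the spiked concentration."""
    return 100.0 * measured / spiked

def rsd_percent(values):
    """Relative standard deviation (sample standard deviation / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

spiked = 10.0                               # spiking level in μg/kg
replicates = [9.0, 9.5, 10.0, 10.5, 11.0]   # hypothetical measured values, μg/kg
recoveries = [recovery_percent(m, spiked) for m in replicates]
mean_recovery = mean(recoveries)
rsd = rsd_percent(replicates)
print(f"mean recovery = {mean_recovery:.1f}%, RSD = {rsd:.1f}%")
```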

  19. Ultra performance liquid chromatography atmospheric pressure photoionization high resolution mass spectrometric method for determination of multiclass pesticide residues in grape and mango juices.

    PubMed

    Deme, Pragney; Upadhyayula, Vijayasarathi V R

    2015-04-15

    A novel analytical method was developed for determination of organochlorine, synthetic pyrethroid, organophosphate and carbamate pesticide residues in fruit juices using ultra performance liquid chromatography-atmospheric pressure photoionization-high resolution mass spectrometry (UPLC-APPI-HRMS). The analytes were extracted from fruit juices by dispersive solid-phase extraction using multi-walled carbon nanotubes (MWCNTs). The analysis was carried out in full scan mode using dual ionization mode of APPI in the mass range of 100-650 units. The limit of detection and limit of quantification values for the pesticides were in the range of 0.025-0.15 ng mL⁻¹ and 0.1-0.5 ng mL⁻¹, respectively. The matrix effect of the method was found to be low and extraction recoveries were in the range of 60-110%. Some of the real fruit juice samples showed the presence of some pesticides in the range of 6.5-24.8 ng L⁻¹. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. High-throughput screening for multi-class veterinary drug residues in animal muscle using liquid chromatography/tandem mass spectrometry with on-line solid-phase extraction.

    PubMed

    Tang, Hubert Po-On; Ho, Clare; Lai, Shirley Sau-Ling

    2006-01-01

    A rapid qualitative method using on-line column-switching liquid chromatography/tandem mass spectrometry (LC/MS/MS) was developed and validated for screening 13 target veterinary drugs: four macrolides - erythromycin A, josamycin (leucomycin A3), kitasamycin (leucomycin A5), and tylosin A; six (fluoro)quinolones - ciprofloxacin, danofloxacin, enrofloxacin, flumequine, oxolinic acid, and sarafloxacin; and lincomycin, virginiamycin M1, and trimethoprim in different animal muscles. Clindamycin, norfloxacin, nalidixic acid, oleandomycin, ormetoprim, and roxithromycin were used as the internal standards. After simple deproteination and analyte extraction of muscle samples using acetonitrile, the supernatant was subjected to on-line cleanup and direct analysis by LC/MS/MS. On-line cleanup with an extraction cartridge packed with hydrophilic-hydrophobic polymer sorbent followed by fast LC using a short C18 column resulted in a total analysis cycle of 6 min for 19 drugs. This screening method considerably reduced the time and the cost for the quantitative and confirmatory analyses. The application of a control point approach was also introduced and explained. Copyright (c) 2006 John Wiley & Sons, Ltd.
