Sample records for classifying samples

  1. Centre-based restricted nearest feature plane with angle classifier for face recognition

    NASA Astrophysics Data System (ADS)

    Tang, Linlin; Lu, Huifen; Zhao, Liang; Li, Zuohua

    2017-10-01

    An improved classifier based on the nearest feature plane (NFP), called the centre-based restricted nearest feature plane with angle (RNFPA) classifier, is proposed here for face recognition problems. The well-known NFP uses the geometrical information of samples to increase the effective number of training samples, but this increases computational complexity and introduces an inaccuracy problem caused by the extended feature plane. To solve these problems, RNFPA exploits a centre-based feature plane and uses an angle threshold to restrict the extended feature space. By choosing an appropriate angle threshold, RNFPA improves performance while reducing computational complexity. Experiments on the AT&T, AR and FERET face databases are used to evaluate the proposed classifier. Compared with the original NFP classifier, the nearest feature line (NFL) classifier, the nearest neighbour (NN) classifier and other improved NFP classifiers, the proposed classifier achieves competitive performance.
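
    The geometric core of NFP-style classifiers, the distance from a query to the plane spanned by three training samples, plus an angle restriction in the spirit of RNFPA, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names, the centre-based cone test, and the 30° default are my own choices.

```python
import numpy as np

def plane_distance(q, x1, x2, x3):
    """Distance from query q to the feature plane through x1, x2, x3."""
    U = np.stack([x2 - x1, x3 - x1], axis=1)          # plane basis (d x 2)
    coef, *_ = np.linalg.lstsq(U, q - x1, rcond=None)  # least-squares projection
    proj = x1 + U @ coef                               # foot of the perpendicular
    return np.linalg.norm(q - proj), proj

def restricted_plane_distance(q, x1, x2, x3, centre, max_angle_deg=30.0):
    """RNFPA-style rule (sketch): accept the plane distance only if the
    projection stays within an angular cone around the class centre."""
    dist, proj = plane_distance(q, x1, x2, x3)
    a, b = proj - centre, q - centre
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    angle = np.degrees(np.arccos(np.clip(a @ b / denom, -1.0, 1.0))) if denom > 0 else 0.0
    return dist if angle <= max_angle_deg else np.inf
```

    A query lying in the plane of three same-class samples gets distance 0, while the angle test rejects matches whose projection extrapolates far from the class centre.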

  2. Determination of Minimum Training Sample Size for Microarray-Based Cancer Outcome Prediction–An Empirical Assessment

    PubMed Central

    Cheng, Ningtao; Wu, Leihong; Cheng, Yiyu

    2013-01-01

    The promise of microarray technology in providing prediction classifiers for cancer outcome estimation has been confirmed by a number of demonstrable successes. However, the reliability of prediction results depends heavily on the accuracy of the statistical parameters in the classifiers, and these cannot be reliably estimated from only a small number of training samples. It is therefore of vital importance to determine the minimum number of training samples needed to ensure the clinical value of microarrays in cancer outcome prediction. We evaluated the impact of training sample size on model performance extensively, based on 3 large-scale cancer microarray datasets provided by the second phase of the MicroArray Quality Control project (MAQC-II). An SSNR-based (scale of signal-to-noise ratio) protocol was proposed in this study for determining the minimum training sample size. External validation results based on another 3 cancer datasets confirmed that the SSNR-based approach could not only determine the minimum number of training samples efficiently, but also provide a valuable strategy for estimating the underlying performance of classifiers in advance. Once translated into routine clinical applications, the SSNR-based protocol would greatly facilitate microarray-based cancer outcome prediction by improving classifier reliability. PMID:23861920

  3. A two-dimensional matrix image based feature extraction method for classification of sEMG: A comparative analysis based on SVM, KNN and RBF-NN.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan; Qiu, Ming; Zeng, Ming; Luo, Weizhen

    2017-01-01

    The computer mouse is an important human-computer interaction device, but patients with physical finger disabilities are unable to operate it. Surface EMG (sEMG), which can be monitored by electrodes on the skin surface, is a reflection of neuromuscular activity; by classifying sEMG we can control auxiliary limb equipment and thereby help physically disabled patients operate a mouse. The aim was to develop a new method to extract sEMG generated by finger motion and apply novel features to classify it. A window-based data acquisition method was presented to extract signal samples from sEMG electrodes. Afterwards, a two-dimensional matrix image based feature extraction method, which differs from classical methods based on the time or frequency domain, was employed to transform signal samples into feature maps used for classification. In the experiments, sEMG data samples produced by the index and middle fingers at the click of a mouse button were acquired separately. Then, characteristics of the samples were analyzed to generate a feature map for each sample. Finally, machine learning classification algorithms (SVM, KNN, RBF-NN) were employed to classify these feature maps on a GPU. The study demonstrated that all classifiers can identify and classify sEMG samples effectively; in particular, the accuracy of the SVM classifier reached 100%. The signal separation method is a convenient, efficient and quick way to extract the sEMG samples produced by fingers. In addition, unlike classical methods, the new method extracts features by appropriately enlarging the energy of the sample signals. The classical machine learning classifiers all performed well using these features.
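
    The window-based acquisition step, turning a 1-D sEMG stream into a 2-D matrix "image", can be sketched roughly as below; the window length and step size are illustrative guesses, not the paper's parameters.

```python
import numpy as np

def signal_to_feature_map(signal, win_len=64, step=16):
    """Stack overlapping windows of a 1-D signal into a 2-D matrix 'image'.
    Each row is one window; win_len and step are illustrative choices."""
    n_windows = 1 + (len(signal) - win_len) // step
    rows = [signal[i * step : i * step + win_len] for i in range(n_windows)]
    return np.stack(rows)            # shape: (n_windows, win_len)
```

    The resulting matrix can then be fed to an image-style feature extractor or flattened for a conventional classifier.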

  4. Transferring genomics to the clinic: distinguishing Burkitt and diffuse large B cell lymphomas.

    PubMed

    Sha, Chulin; Barrans, Sharon; Care, Matthew A; Cunningham, David; Tooze, Reuben M; Jack, Andrew; Westhead, David R

    2015-01-01

    Classifiers based on molecular criteria such as gene expression signatures have been developed to distinguish Burkitt lymphoma and diffuse large B cell lymphoma, and they help to resolve the intermediate cases where traditional diagnosis is difficult. Transfer of these research classifiers into a clinical setting is challenging because there are competing classifiers in the literature, based on different methodologies and gene sets, with no clear best choice; classifiers based on one expression measurement platform may not transfer effectively to another; and classifiers developed using fresh-frozen samples may not work effectively with the commonly used and more convenient formalin-fixed paraffin-embedded samples used in routine diagnosis. Here we thoroughly compared two published high-profile classifiers developed on data from different Affymetrix array platforms and fresh-frozen tissue, examining their transferability and concordance. Based on this analysis, a new Burkitt and diffuse large B cell lymphoma classifier (BDC) was developed and applied to Illumina DASL data from our own paraffin-embedded samples, allowing comparison with the diagnosis made in a central haematopathology laboratory and evaluation of clinical relevance. We show that both previous classifiers can be recapitulated using much smaller gene sets than originally employed, and that the classification result depends closely on the Burkitt lymphoma criteria applied in the training set. The BDC classification on our data exhibits high agreement (~95 %) with the original diagnosis. A simple outcome comparison in the patients presenting intermediate features on conventional criteria suggests that cases classified as Burkitt lymphoma by BDC respond worse to standard diffuse large B cell lymphoma treatment than those classified as diffuse large B cell lymphoma.
In this study, we comprehensively investigate two previous Burkitt lymphoma molecular classifiers, and implement a new gene expression classifier, BDC, that works effectively on paraffin-embedded samples and provides useful information for treatment decisions. The classifier is available as a free software package under the GNU public licence within the R statistical software environment through the link http://www.bioinformatics.leeds.ac.uk/labpages/softwares/ or on github https://github.com/Sharlene/BDC.

  5. Soy sauce classification by geographic region and fermentation based on artificial neural network and genetic algorithm.

    PubMed

    Xu, Libin; Li, Yang; Xu, Ning; Hu, Yong; Wang, Chao; He, Jianjun; Cao, Yueze; Chen, Shigui; Li, Dongsheng

    2014-12-24

    This work demonstrated the possibility of using artificial neural networks to classify soy sauce from China. The aroma profiles of different soy sauce samples were differentiated using headspace solid-phase microextraction. The soy sauce samples were analyzed by gas chromatography-mass spectrometry, and 22 and 15 volatile aroma compounds were selected for sensitivity analysis to classify the samples by fermentation and geographic region, respectively. The selected samples could be classified by fermentation and geographic region with a prediction success rate of 100%. Furans and phenols were the variables with the greatest contribution to classifying soy sauce samples by fermentation and geographic region, respectively.

  6. How large a training set is needed to develop a classifier for microarray data?

    PubMed

    Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

    2008-01-01

    A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
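
    A crude simulation of the underlying question, how accuracy grows with training-set size for a given standardized fold change and number of genes, might look like this. It is a toy sketch, not the authors' model-based method: the nearest-centroid rule, the effect-size layout, and all constants are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_accuracy(n_per_class, n_genes=50, fold_change=1.5, n_test=400, trials=20):
    """Mean test accuracy of a nearest-centroid classifier trained on
    n_per_class samples per class, with one class shifted so the total
    standardized effect size equals fold_change (illustrative setup)."""
    shift = np.full(n_genes, fold_change / np.sqrt(n_genes))
    accs = []
    for _ in range(trials):
        x0 = rng.normal(0, 1, (n_per_class, n_genes))
        x1 = rng.normal(0, 1, (n_per_class, n_genes)) + shift
        c0, c1 = x0.mean(0), x1.mean(0)
        t0 = rng.normal(0, 1, (n_test // 2, n_genes))
        t1 = rng.normal(0, 1, (n_test // 2, n_genes)) + shift
        test = np.vstack([t0, t1])
        labels = np.r_[np.zeros(n_test // 2), np.ones(n_test // 2)]
        pred = (np.linalg.norm(test - c1, axis=1) < np.linalg.norm(test - c0, axis=1)).astype(float)
        accs.append((pred == labels).mean())
    return float(np.mean(accs))
```

    Scanning `n_per_class` upward and stopping when the curve plateaus gives a rough, simulation-based analogue of a sample-size calculation.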

  7. Consensus Classification Using Non-Optimized Classifiers.

    PubMed

    Brownfield, Brett; Lemos, Tony; Kalivas, John H

    2018-04-03

    Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available, some complex, such as neural networks, and others simple, such as k-nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample is then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method; that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments in three ways. The second data set is composed of textile samples from three classes based on Raman spectra; it is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured on 13 unique chemical and physical variables. In all cases, fusion of non-optimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
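
    A generic version of the fusion step, combining several simple, untuned classifiers by majority vote, can be sketched as follows; this is not the Article's non-optimized consensus procedure itself, just the standard fusion-by-vote idea it builds on, with classifier choices of my own (integer labels 0..k-1 assumed).

```python
import numpy as np

def nearest_mean(Xtr, ytr, Xte):
    """Nearest class-centroid rule."""
    centroids = np.stack([Xtr[ytr == c].mean(0) for c in np.unique(ytr)])
    return np.argmin(((Xte[:, None, :] - centroids) ** 2).sum(-1), axis=1)

def one_nn(Xtr, ytr, Xte):
    """1-nearest-neighbour rule."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[np.argmin(d, axis=1)]

def class_distance_sum(Xtr, ytr, Xte):
    """Assign to the class with the smallest mean squared distance."""
    classes = np.unique(ytr)
    sums = np.stack([((Xte[:, None, :] - Xtr[ytr == c]) ** 2).sum(-1).sum(-1) / (ytr == c).sum()
                     for c in classes], axis=1)
    return classes[np.argmin(sums, axis=1)]

def majority_vote(preds):
    """Fuse the hard decisions of several classifiers by majority vote."""
    preds = np.stack(preds)                     # (n_classifiers, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```

    None of the component rules has a tuning parameter, which is the spirit of fusing non-optimized classifiers.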

  8. Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers

    PubMed Central

    2018-01-01

    Hyperspectral image classification with a limited number of training samples and without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data, as opposed to the case where the training samples were selected randomly. Our experiments demonstrate the effectiveness of the proposed method, which compares favorably with state-of-the-art methods. PMID:29304512
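
    The fuzziness measure used for sample selection can be sketched as below, assuming per-class membership values in (0, 1); the formula is the standard fuzziness of a membership vector, and the selection rule (take the k fuzziest samples) is a simplification of the distance-plus-fuzziness criterion described above.

```python
import numpy as np

def fuzziness(memberships):
    """Fuzziness of per-class membership values mu in (0, 1):
    E = -(1/C) * sum_c [mu_c*log(mu_c) + (1-mu_c)*log(1-mu_c)]."""
    mu = np.clip(memberships, 1e-12, 1 - 1e-12)
    return -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu)).mean(axis=-1)

def select_training_candidates(memberships, k):
    """Pick the k samples whose membership vectors are fuzziest,
    i.e. those lying closest to the class boundary."""
    return np.argsort(-fuzziness(memberships))[:k]
```

    Samples with near-uniform memberships score highest and are queried first, which is the active-learning intuition.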

  9. An Exemplar-Based Multi-View Domain Generalization Framework for Visual Recognition.

    PubMed

    Niu, Li; Li, Wen; Xu, Dong; Cai, Jianfei

    2018-02-01

    In this paper, we propose a new exemplar-based multi-view domain generalization (EMVDG) framework for visual recognition, which learns robust classifiers that generalize well to an arbitrary target domain from training samples with multiple types of features (i.e., multi-view features). In this framework, we aim to address two issues simultaneously. First, the distribution of training samples (i.e., the source domain) is often considerably different from that of testing samples (i.e., the target domain), so the performance of classifiers learnt on the source domain may drop significantly on the target domain; moreover, the testing data are often unseen during the training procedure. Second, when the training data are associated with multi-view features, recognition performance can be further improved by exploiting the relation among multiple types of features. To address the first issue, considering that fusing multiple SVM classifiers has been shown to enhance domain generalization ability, we build our EMVDG framework upon exemplar SVMs (ESVMs), in which a set of ESVM classifiers is learnt, each trained on one positive training sample and all the negative training samples. When the source domain contains multiple latent domains, the learnt ESVM classifiers are expected to group into multiple clusters. To address the second issue, we propose two approaches under the EMVDG framework based on the consensus principle and the complementary principle, respectively. Specifically, we propose an EMVDG_CO method that adds a co-regularizer to enforce consistency of the cluster structures of ESVM classifiers on different views, based on the consensus principle. Inspired by multiple kernel learning, we also propose an EMVDG_MK method that fuses the ESVM classifiers from different views based on the complementary principle.
In addition, we further extend our EMVDG framework to an exemplar-based multi-view domain adaptation (EMVDA) framework for the case where unlabeled target domain data are available during training. The effectiveness of the EMVDG and EMVDA frameworks for visual recognition is clearly demonstrated by comprehensive experiments on three benchmark data sets.
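
    The exemplar idea, one scorer per positive training sample, each contrasted against all the negatives, can be caricatured without an actual SVM as follows; a real ESVM would learn a max-margin weight vector per exemplar, whereas this sketch substitutes a simple similarity contrast of my own devising.

```python
import numpy as np

def exemplar_scores(positives, negatives, query):
    """One score per exemplar: similarity of the query to that single
    positive minus its best similarity to any negative (a crude stand-in
    for an exemplar-SVM decision value)."""
    def sim(a, b):
        return -np.linalg.norm(a - b)       # negative distance as similarity
    neg_best = max(sim(query, n) for n in negatives)
    return np.array([sim(query, p) - neg_best for p in positives])

def exemplar_predict(positives, negatives, query):
    """Declare the query positive if any exemplar scorer fires."""
    return bool((exemplar_scores(positives, negatives, query) > 0).any())
```

    Taking the maximum over exemplar scores mirrors how an ESVM ensemble lets the best-matching exemplar dominate the decision.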

  10. Effect of separate sampling on classification accuracy.

    PubMed

    Shahrokh Esfahani, Mohammad; Dougherty, Edward R

    2014-01-15

    Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions on the effect that a sampling ratio different from the population class ratio has on the expected classifier error. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the code for the synthetic and real data examples is written in MATLAB, including a function called mmratio whose output is an approximation of the minimax sampling ratio of a given dataset. All the code is available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.

  11. Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

    PubMed

    Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

    2014-12-08

    Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification is an essential and challenging task in multi-camera networks; it aims to determine whether a given individual has already appeared over the camera network. Individual recognition often relies on faces and requires a large number of samples during the training phase, which is difficult to fulfill due to the limitations of the camera hardware and unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem, arising from the small number of training samples compared to the high dimensionality of the sample space. To overcome this problem, interest in combining multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two questions open: (1) how to define diverse base classifiers from the small data; (2) how to avoid the diversity/accuracy dilemma that occurs during ensemble construction. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples from a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for the ensemble. Extensive experimental results on four benchmarks demonstrate the higher ability of our system to cope with the SSS problem compared to the state of the art.
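
    The paper's tailored 0-1 knapsack is not specified here, but the standard 0-1 knapsack dynamic program it adapts, selecting base classifiers that maximize total value (say, accuracy) under a cost budget (say, a redundancy penalty), looks like this; integer costs are assumed.

```python
def knapsack_select(values, costs, budget):
    """0-1 knapsack via dynamic programming: choose items (base classifiers)
    maximizing total value subject to a total integer cost budget.
    Returns (best total value, sorted indices of the chosen items)."""
    n = len(values)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                 # skip item i-1
            if costs[i - 1] <= b:                       # or take it, if it fits
                cand = best[i - 1][b - costs[i - 1]] + values[i - 1]
                if cand > best[i][b]:
                    best[i][b] = cand
    chosen, b = [], budget                              # backtrack the choices
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)
```

    With accuracies as values and pairwise-redundancy scores folded into the costs, the selected subset trades diversity against accuracy under one budget.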

  13. Classification and identification of molecules through factor analysis method based on terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Huang, Jianglou; Liu, Jinsong; Wang, Kejia; Yang, Zhengang; Liu, Xiaming

    2018-06-01

    By means of a factor analysis approach, a molecule classification method is built based on the measured terahertz absorption spectra of the molecules. A data matrix is obtained by sampling the absorption spectra at different frequency points. The data matrix is then decomposed into the product of two matrices: a weight matrix and a characteristic matrix. By applying K-means clustering to the weight matrix, the molecules can be classified. A group of samples (spirobenzopyran, indole, styrene derivatives and inorganic salts) was prepared and measured with a terahertz time-domain spectrometer. The samples were classified with 75% accuracy relative to classification based directly on their molecular formulas.
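
    One concrete way to realize the decomposition-then-cluster pipeline, a truncated SVD playing the role of the factor analysis, followed by K-means on the weight matrix, is sketched below; the paper's exact factorization may differ, and the deterministic K-means initialization is a simplification of mine.

```python
import numpy as np

def factorize_spectra(A, n_factors=2):
    """Decompose a (samples x frequencies) absorption matrix into
    weights @ characteristics using a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    weights = U[:, :n_factors] * s[:n_factors]      # (samples x factors)
    characteristics = Vt[:n_factors]                # (factors x frequencies)
    return weights, characteristics

def kmeans(X, k, iters=50):
    """Plain K-means with a simple deterministic init (fine for a sketch)."""
    centres = X[:k].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.stack([X[labels == j].mean(0) if (labels == j).any()
                            else centres[j] for j in range(k)])
    return labels
```

    Clustering in the low-dimensional weight space rather than on raw spectra is what makes the grouping robust to shared spectral structure.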

  14. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    NASA Astrophysics Data System (ADS)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images open up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the number of training samples required to train the classifier for an incoming image. For each incoming image, a rough classifier is first predicted based on the temporal trend of a set of previous classifiers, and the predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies than two state-of-the-art model transfer algorithms; when training data were insufficient, the overall classification accuracy of the incoming image improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with that obtained without assistance from previous images. These results demonstrate that leveraging a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
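
    The two-stage idea, extrapolate a classifier from the temporal trend of previous classifiers, then fine-tune it on a few current samples, can be sketched for a linear classifier as follows; the linear-trend prediction and perceptron-style fine-tuning are stand-ins of my own for the paper's actual models.

```python
import numpy as np

def predict_classifier(w_prev2, w_prev1):
    """Extrapolate the next linear classifier from the trend of the two
    previous ones (a simple linear trend over time)."""
    return 2.0 * w_prev1 - w_prev2

def fine_tune(w, X, y, lr=0.1, epochs=20):
    """Perceptron-style fine-tuning of the predicted classifier on the few
    labelled samples from the incoming image (y in {-1, +1}; X may carry a
    bias column)."""
    w = w.copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:          # misclassified -> nudge weights
                w += lr * yi * xi
    return w
```

    The prediction supplies a good starting point, so only a handful of current samples are needed to correct it.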

  15. Molecular differential diagnosis of follicular thyroid carcinoma and adenoma based on gene expression profiling by using formalin-fixed paraffin-embedded tissues

    PubMed Central

    2013-01-01

    Background Differential diagnosis between malignant follicular thyroid carcinoma (FTC) and benign follicular thyroid adenoma (FTA) is a great challenge even for an experienced pathologist and requires special effort. Molecular markers may potentially support a differential diagnosis between FTC and FTA in postoperative specimens. The purpose of this study was to derive molecular support for differential post-operative diagnosis, in the form of a simple multigene mRNA-based classifier that would differentiate between FTC and FTA tissue samples. Methods A molecular classifier was created based on a combined analysis of two microarray datasets (66 thyroid samples). The performance of the classifier was assessed using an independent dataset comprising 71 formalin-fixed paraffin-embedded (FFPE) samples (31 FTC and 40 FTA), which were analysed by quantitative real-time PCR (qPCR). In addition, three other microarray datasets (62 samples) were used to confirm the utility of the classifier. Results Five of 8 genes selected from the training datasets (ELMO1, EMCN, ITIH5, KCNAB1, SLCO2A1) were amplified by qPCR in FFPE material from the independent sample set; the other three genes did not amplify in FFPE material, probably due to low abundance. All 5 analysed genes were downregulated in FTC compared to FTA. The sensitivity and specificity of the 5-gene classifier on the FFPE dataset were 71% and 72%, respectively. Conclusions The proposed approach could support histopathological examination: the 5-gene classifier may aid molecular discrimination between FTC and FTA in FFPE material. PMID:24099521

  16. [MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].

    PubMed

    Chen, Zhiru; Hong, Wenxue

    2016-02-01

    Considering the low prediction accuracy on positive samples and the poor overall classification caused by the unbalanced sample data of MicroRNA (miRNA) targets, we propose a support vector machine (SVM)-integration of under-sampling and weight (SVM-IUSM) algorithm, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming to reduce the unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive sample weight adjustment, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved by combining multiple weak classifiers through a voting mechanism. Experiments revealed that SVM-IUSM, compared with other algorithms on unbalanced datasets, could not only improve the accuracy on positive targets and the overall classification, but also enhance the generalization ability of the miRNA target classifier.
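
    The clustering-based under-sampling step can be sketched as follows: cluster the majority class, then keep the samples nearest each cluster centre so the reduced set still spans the majority distribution. The K-means details and per-cluster quotas here are my own illustrative choices, not the paper's.

```python
import numpy as np

def cluster_undersample(X_major, n_keep, n_clusters=5, seed=0):
    """Reduce the majority class to n_keep samples by clustering it and
    keeping the samples nearest each cluster centre, drawn roughly evenly
    from the clusters."""
    rng = np.random.default_rng(seed)
    centres = X_major[rng.choice(len(X_major), n_clusters, replace=False)]
    for _ in range(25):                         # plain k-means
        labels = np.argmin(((X_major[:, None] - centres) ** 2).sum(-1), axis=1)
        centres = np.stack([X_major[labels == j].mean(0) if (labels == j).any()
                            else centres[j] for j in range(n_clusters)])
    keep, per_cluster = [], int(np.ceil(n_keep / n_clusters))
    for j in range(n_clusters):
        idx = np.where(labels == j)[0]
        d = ((X_major[idx] - centres[j]) ** 2).sum(-1)
        keep.extend(idx[np.argsort(d)][:per_cluster])
    return np.array(sorted(keep))[:n_keep]
```

    The reduced majority set can then be paired with the full minority set inside each boosting round.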

  17. Comparison of Hybrid Classifiers for Crop Classification Using Normalized Difference Vegetation Index Time Series: A Case Study for Major Crops in North Xinjiang, China

    PubMed Central

    Hao, Pengyu; Wang, Li; Niu, Zheng

    2015-01-01

    A range of single classifiers have been proposed to classify crop types using time-series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multiple single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series was tested with different training sample sizes at both pixel and object levels, with two representative counties in north Xinjiang selected as the study area. The single classifiers employed were Random Forest (RF), Support Vector Machine (SVM), and See5 (C5.0). The results indicated that classification performance improved substantially with the number of training samples (mean overall accuracy increased by 5%~10%, and the standard deviation of overall accuracy fell by around 1%), and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers, with higher mean overall accuracy (by 1%~2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performance. Additionally, although object-based classification did not improve accuracy, it produced greater visual appeal, especially in study areas with a heterogeneous cropping pattern. PMID:26360597
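
    The two fusion strategies compared above can be sketched side by side: M-voting takes a majority over hard decisions, while P-fusion, here realized as a simple product rule over class posteriors (one plausible reading of probabilistic fusion, not necessarily the authors' exact rule), lets a confident classifier outweigh several lukewarm ones.

```python
import numpy as np

def m_voting(prob_list):
    """Majority vote over the hard decisions of several classifiers.
    Each element of prob_list is an (n_samples x n_classes) posterior array."""
    votes = np.stack([p.argmax(axis=1) for p in prob_list])
    return np.array([np.bincount(col, minlength=prob_list[0].shape[1]).argmax()
                     for col in votes.T])

def p_fusion(prob_list):
    """Probabilistic fusion: multiply per-class posteriors across
    classifiers (product rule) and take the argmax."""
    prod = np.ones_like(prob_list[0])
    for p in prob_list:
        prod *= p
    return prod.argmax(axis=1)
```

    The test below shows the two rules disagreeing on the same inputs, which is exactly the situation the comparison in the abstract probes.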

  18. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles alongside labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled, and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836 for the naïve Bayes and SVM classifiers, respectively. We referenced a database of disease outbreak reports to confirm that the evaluation data set produced by the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and articles in non-English languages. PMID:21134784

  19. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods used to classify membrane protein types are, given the large number of uncharacterized protein sequences in databases, very time consuming, expensive and susceptible to errors; hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced and large datasets are often handled well by decision tree classifiers. Since the datasets used are imbalanced, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods, such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the decision tree classifiers, Random forest performs best, achieving 96.35% accuracy in less time. Another finding is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, while the other classifiers, such as DT, Adaboost, Rotation forest and Random forest, are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers.

  20. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

    PubMed Central

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples, which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions. PMID:26089862

  1. Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

    PubMed

    Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

    2015-01-01

    Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating parameters of a probabilistic model, whilst SVM is an optimization based nonparametric method in this context. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitate soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized in terms of Gaussian/non-Gaussian distributed and balanced/unbalanced samples, which are then further used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported to indicate how the combined classifier may work under various conditions.
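
    The "probabilistic output for SVM" that the paper targets can be approximated in scikit-learn via Platt scaling (`probability=True`), which fits a sigmoid over the SVM decision values. This is a generic stand-in, not the paper's MLC-based combination:

```python
# SVM with Platt-scaled probabilistic output on the breast cancer data
# (one of the four dataset groups named in the abstract).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# probability=True enables Platt scaling: a sigmoid fit over SVM margins
clf = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
clf.fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)   # soft (probabilistic) output per class
acc = clf.score(X_te, y_te)
```

The soft posteriors in `proba` are what enable the "soft decision making" the abstract refers to, e.g. thresholding at a cost-sensitive operating point instead of 0.5.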

  2. Target discrimination method for SAR images based on semisupervised co-training

    NASA Astrophysics Data System (ADS)

    Wang, Yan; Du, Lan; Dai, Hui

    2018-01-01

    Synthetic aperture radar (SAR) target discrimination is usually performed in a supervised manner. However, supervised methods for SAR target discrimination may need lots of labeled training samples, whose acquisition is costly, time consuming, and sometimes impossible. This paper proposes an SAR target discrimination method based on semisupervised co-training, which utilizes a limited number of labeled samples and an abundant number of unlabeled samples. First, Lincoln features, widely used in SAR target discrimination, are extracted from the training samples and partitioned into two sets according to their physical meanings. Second, two support vector machine classifiers are iteratively co-trained with the extracted two feature sets based on the co-training algorithm. Finally, the trained classifiers are exploited to classify the test data. The experimental results on real SAR image data not only validate the effectiveness of the proposed method compared with the traditional supervised methods, but also demonstrate the superiority of co-training over self-training, which only uses one feature set.
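
    A compact co-training sketch under assumed toy data: synthetic features split into two views with SVC base learners, standing in for the paper's two Lincoln feature sets.

```python
# Co-training: two classifiers, each on its own feature view, iteratively
# pseudo-label confident samples for each other's training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           random_state=0)
view1, view2 = X[:, :10], X[:, 10:]       # two feature sets ("views")

labeled = np.arange(20)                    # limited labeled pool
unlabeled = np.arange(20, 300)             # abundant unlabeled pool
test = np.arange(300, 400)

L1, y1 = list(labeled), list(y[labeled])   # training set for classifier 1
L2, y2 = list(labeled), list(y[labeled])   # training set for classifier 2

for _ in range(5):                         # a few co-training rounds
    c1 = SVC(probability=True, random_state=0).fit(view1[L1], y1)
    c2 = SVC(probability=True, random_state=0).fit(view2[L2], y2)
    if len(unlabeled) == 0:
        break
    # each classifier pseudo-labels its most confident unlabeled samples
    # and hands them to the *other* classifier's training set
    conf1 = c1.predict_proba(view1[unlabeled]).max(axis=1)
    conf2 = c2.predict_proba(view2[unlabeled]).max(axis=1)
    pick1 = unlabeled[np.argsort(-conf1)[:10]]
    pick2 = unlabeled[np.argsort(-conf2)[:10]]
    L2 += list(pick1); y2 += list(c1.predict(view1[pick1]))
    L1 += list(pick2); y1 += list(c2.predict(view2[pick2]))
    unlabeled = np.setdiff1d(unlabeled, np.concatenate([pick1, pick2]))

# average the two views' probabilities to classify the test data
proba = (c1.predict_proba(view1[test]) + c2.predict_proba(view2[test])) / 2
acc = (proba.argmax(axis=1) == y[test]).mean()
```

The key difference from self-training, which the abstract highlights, is that confident labels cross between the two views rather than reinforcing a single classifier.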

  3. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    NASA Astrophysics Data System (ADS)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on gonads extracted from Mackerel females and males (fresh and defrosted) to obtain differences between the sexes. Several linear and non-linear classifiers such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA) can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve classifiers based on a single dissimilarity.
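
    One hedged way to realize such an ensemble with off-the-shelf tools is majority voting over k-NN base classifiers, each built on a different dissimilarity measure. The iris data and the three generic metrics below are illustrative stand-ins, not the paper's colour measurements or its complementary dissimilarities.

```python
# Majority vote over k-NN classifiers, one per dissimilarity measure.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier([
    ("euclidean", KNeighborsClassifier(metric="euclidean")),
    ("manhattan", KNeighborsClassifier(metric="manhattan")),
    ("chebyshev", KNeighborsClassifier(metric="chebyshev")),
])
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
```

Because each metric induces a different neighbourhood structure, the base classifiers misclassify different patterns, which is exactly the diversity the abstract exploits.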

  4. Recognition Using Hybrid Classifiers.

    PubMed

    Osadchy, Margarita; Keren, Daniel; Raviv, Dolev

    2016-04-01

    A canonical problem in computer vision is category recognition (e.g., find all instances of human faces, cars etc., in an image). Typically, the input for training a binary classifier is a relatively small sample of positive examples, and a huge sample of negative examples, which can be very diverse, consisting of images from a large number of categories. The difficulty of the problem sharply increases with the dimension and size of the negative example set. We propose to alleviate this problem by applying a "hybrid" classifier, which replaces the negative samples by a prior, and then finds a hyperplane which separates the positive samples from this prior. The method is extended to kernel space and to an ensemble-based approach. The resulting binary classifiers achieve an identical or better classification rate than SVM, while requiring far smaller memory and lower computational complexity to train and apply.

  5. Generative Models for Similarity-based Classification

    DTIC Science & Technology

    2007-01-01

    NC), local nearest centroid (local NC), k-nearest neighbors (kNN), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM-KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM-KNN is a hybrid local and global classifier developed

  6. Comparison of disease prevalence in two populations in the presence of misclassification.

    PubMed

    Tang, Man-Lai; Qiu, Shi-Fang; Poon, Wai-Yin

    2012-11-01

    Comparing disease prevalence in two groups is an important topic in medical research, and prevalence rates are obtained by classifying subjects according to whether they have the disease. Either high-cost infallible gold-standard classifiers or low-cost fallible classifiers can be used to classify subjects. However, statistical analysis that is based on data sets with misclassifications leads to biased results. As a compromise between the two classification approaches, partially validated sets are often used in which all individuals are classified by fallible classifiers, and some of the individuals are validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulas for disease prevalence studies based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show that (i) the Score test produces close-to-nominal levels and is preferred in practice; and (ii) the sample size formula based on the Score test is also fairly accurate in terms of the empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
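
    For fully (not partially) validated counts, the score test for H0: p1 = p2 reduces to the familiar pooled two-proportion z-test. A self-contained sketch with hypothetical counts, without the paper's partial-validation machinery:

```python
# Pooled (score) z-test for the difference of two disease prevalence rates.
import math

def score_test_two_proportions(x1, n1, x2, n2):
    """Score z-test for H0: p1 == p2, given x diseased out of n subjects."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # prevalence under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# hypothetical counts: 40/100 diseased in group 1 vs 25/100 in group 2
z, p = score_test_two_proportions(40, 100, 25, 100)
```

With these counts the statistic is about z = 2.26, so the two prevalence rates differ at the 5% level; the paper's contribution is extending such tests to partially validated series.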

  7. Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

    PubMed Central

    Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

    2015-01-01

    Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
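
    The idea of estimating how performance grows with training-set size can be sketched with scikit-learn's `learning_curve` utility. This is a generic stand-in for the paper's repeated random sampling with cross-validation, on the digits data rather than the histopathology or spectroscopy sets:

```python
# Cross-validated scores at increasing training-set sizes; the resulting
# curve is what one would extrapolate to predict large-dataset performance.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
sizes, train_scores, test_scores = learning_curve(
    KNeighborsClassifier(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, shuffle=True, random_state=0,
)
mean_test = test_scores.mean(axis=1)   # error rate at size i is 1 - mean_test[i]
```

Fitting a power-law decay to `1 - mean_test` versus `sizes` is one common way to extrapolate the error rate to dataset sizes beyond those available.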

  8. Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.

    PubMed

    Ozçift, Akin

    2011-05-01

    Supervised classification algorithms are commonly used in the designing of computer-aided diagnosis systems. In this study, we present a resampling strategy based Random Forests (RF) ensemble classifier to improve diagnosis of cardiac arrhythmia. Random forests is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. In this way, an RF ensemble classifier performs better than a single tree from a classification performance point of view. In general, multiclass datasets having unbalanced distribution of sample sizes are difficult to analyze in terms of class discrimination. Cardiac arrhythmia is such a dataset: it has multiple classes with small sample sizes, and it is therefore suitable for testing our resampling-based training strategy. The dataset contains 452 samples in fourteen types of arrhythmias and eleven of these classes have sample sizes less than 15. Our diagnosis strategy consists of two parts: (i) a correlation based feature selection algorithm is used to select relevant features from the cardiac arrhythmia dataset. (ii) The RF machine learning algorithm is used to evaluate the performance of selected features with and without simple random sampling to evaluate the efficiency of the proposed training strategy. The resultant accuracy of the classifier is found to be 90.0% and this is a quite high diagnosis performance for cardiac arrhythmia. Furthermore, three case studies, i.e., thyroid, cardiotocography and audiology, are used to benchmark the effectiveness of the proposed method. The results of experiments demonstrated the efficiency of the random sampling strategy in training the RF ensemble classification algorithm. Copyright © 2011 Elsevier Ltd. All rights reserved.
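
    A minimal sketch of resampling-based training: simple random oversampling of the minority class before fitting a random forest. Synthetic two-class data stands in for the fourteen-class arrhythmia set, and the resampling scheme here is a generic choice, not necessarily the paper's exact protocol:

```python
# Balance the training classes by random oversampling, then fit RF.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, weights=[0.92, 0.08],
                           random_state=0)

rng = np.random.RandomState(0)
minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]
# draw minority samples with replacement until classes are balanced
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
idx = np.concatenate([majority, minority, extra])

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[idx], y[idx])
balanced = np.bincount(y[idx])   # class counts after resampling
```

Oversampling before training prevents the forest's votes from being dominated by the majority class, which is the failure mode on small-sample arrhythmia classes.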

  9. A blood-based proteomic classifier for the molecular characterization of pulmonary nodules.

    PubMed

    Li, Xiao-jun; Hayward, Clive; Fong, Pui-Yee; Dominguez, Michel; Hunsucker, Stephen W; Lee, Lik Wee; McLean, Matthew; Law, Scott; Butler, Heather; Schirm, Michael; Gingras, Olivier; Lamontagne, Julie; Allard, Rene; Chelsky, Daniel; Price, Nathan D; Lam, Stephen; Massion, Pierre P; Pass, Harvey; Rom, William N; Vachani, Anil; Fang, Kenneth C; Hood, Leroy; Kearney, Paul

    2013-10-16

    Each year, millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. Because most of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, we identified 371 protein candidates and developed a multiple reaction monitoring (MRM) assay for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and stage IA lung cancer matched for nodule size, age, gender, and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a negative predictive value (NPV) of 90%. Validation performance on samples from a nondiscovery clinical site showed an NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, and FOS) that are associated with lung cancer, lung inflammation, and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history, and age, which are risk factors used for clinical management of pulmonary nodules. Thus, this molecular test provides a potential complementary tool to help physicians in lung cancer diagnosis.

  10. A fuzzy classifier system for process control

    NASA Technical Reports Server (NTRS)

    Karr, C. L.; Phillips, J. C.

    1994-01-01

    A fuzzy classifier system that discovers rules for controlling a mathematical model of a pH titration system was developed by researchers at the U.S. Bureau of Mines (USBM). Fuzzy classifier systems successfully combine the strengths of learning classifier systems and fuzzy logic controllers. Learning classifier systems resemble familiar production rule-based systems, but they represent their IF-THEN rules by strings of characters rather than in the traditional linguistic terms. Fuzzy logic is a tool that allows for the incorporation of abstract concepts into rule based-systems, thereby allowing the rules to resemble the familiar 'rules-of-thumb' commonly used by humans when solving difficult process control and reasoning problems. Like learning classifier systems, fuzzy classifier systems employ a genetic algorithm to explore and sample new rules for manipulating the problem environment. Like fuzzy logic controllers, fuzzy classifier systems encapsulate knowledge in the form of production rules. The results presented in this paper demonstrate the ability of fuzzy classifier systems to generate a fuzzy logic-based process control system.

  11. Liquid-Based Medium Used to Prepare Cytological Breast Nipple Fluid Improves the Quality of Cellular Samples Automatic Collection

    PubMed Central

    Zonta, Marco Antonio; Velame, Fernanda; Gema, Samara; Filassi, Jose Roberto; Longatto-Filho, Adhemar

    2014-01-01

    Background Breast cancer is the second cause of death in women worldwide. The spontaneous breast nipple discharge may contain cells that can be analyzed for malignancy. Halo® Mamo Cyto Test (HMCT) was recently developed as an automated system indicated to aspirate cells from the breast ducts. The objective of this study was to standardize the methodology of sampling and sample preparation of nipple discharge obtained by the automated method Halo breast test and perform cytological evaluation in samples preserved in liquid medium (SurePath™). Methods We analyzed 564 nipple fluid samples, from women between 20 and 85 years old, without history of breast disease and neoplasia, no pregnancy, and without gynecologic medical history, collected by HMCT method and preserved in two different vials with solutions for transport. Results From 306 nipple fluid samples from method 1, 199 (65%) were classified as unsatisfactory (class 0), 104 (34%) samples were classified as benign findings (class II), and three (1%) were classified as undetermined to neoplastic cells (class III). From 258 samples analyzed in method 2, 127 (49%) were classified as class 0, 124 (48%) were classified as class II, and seven (2%) were classified as class III. Conclusion Our study suggests an improvement in the quality and quantity of cellular samples when the association of the two methodologies is performed, Halo breast test and the method in liquid medium. PMID:29147397

  12. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    NASA Astrophysics Data System (ADS)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.
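
    The bagged-trees model the authors selected can be sketched as follows; the digits data is an arbitrary stand-in for the cyst image features, and the hyperparameters are illustrative:

```python
# Bagged decision trees: bootstrap-resampled trees whose votes are averaged.
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                        n_estimators=50, random_state=0)
bag.fit(X_tr, y_tr)
acc = bag.score(X_te, y_te)
```

In the paper's workflow the analogous model, trained on >30,000 labeled cysts, runs on images from the phone-based fluorescence microscope to count cysts automatically.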

  13. Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

    PubMed Central

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-01-01

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514

  14. Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.

    PubMed

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-11-14

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.
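
    A stacking sketch in the spirit of the paper: heterogeneous base classifiers whose outputs a meta-learner combines. An HMM base learner has no drop-in scikit-learn analogue, so an SVM and a k-NN classifier stand in here, on synthetic data rather than milling-force features:

```python
# Stacking: a meta-learner (logistic regression) trained on the
# cross-validated outputs of heterogeneous base classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The meta-learner plays the role the abstract describes: it models the relationship between base-classifier outputs and the target states, rather than simply taking a majority vote.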

  15. Computer-aided diagnosis of early knee osteoarthritis based on MRI T2 mapping.

    PubMed

    Wu, Yixiao; Yang, Ran; Jia, Sen; Li, Zhanjun; Zhou, Zhiyang; Lou, Ting

    2014-01-01

    This work was aimed at studying the method of computer-aided diagnosis of early knee OA (OA: osteoarthritis). Based on the technique of MRI (MRI: Magnetic Resonance Imaging) T2 Mapping, through computer image processing, feature extraction, calculation and analysis via constructing a classifier, an effective computer-aided diagnosis method for knee OA was created to assist doctors in their accurate, timely and convenient detection of potential risk of OA. In order to evaluate this method, a total of 1380 data from the MRI images of 46 samples of knee joints were collected. These data were then modeled through linear regression on an offline general platform by the use of the ImageJ software, and a map of the physical parameter T2 was reconstructed. After the image processing, the T2 values of ten regions in the WORMS (WORMS: Whole-organ Magnetic Resonance Imaging Score) areas of the articular cartilage were extracted to be used as the eigenvalues in data mining. Then, an RBF (RBF: Radial Basis Function) network classifier was built to classify and identify the collected data. The classifier exhibited a final identification accuracy of 75%, indicating a good result for assisting diagnosis. Since the knee OA classifier constituted by a weights-directly-determined RBF neural network didn't require any iteration, our results demonstrated that the optimal weights, appropriate center and variance could be yielded through simple procedures. Furthermore, the accuracy for both the training samples and the testing samples from the normal group could reach 100%. Finally, the classifier was superior both in time efficiency and classification performance to the frequently used classifiers based on iterative learning. Thus it was suitable to be used as an aid to computer-aided diagnosis of early knee OA.
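
    The "weights-directly-determined" RBF network can be illustrated in a few lines: with fixed centres and widths, the output weights follow from a single least-squares solve, with no iterative training. The toy 1-D regression target below is an assumption for illustration, not the T2-map features:

```python
# RBF network whose output weights come from one linear least-squares
# solve instead of iterative training.
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])                         # toy target function

centers = np.linspace(-3, 3, 15).reshape(-1, 1)   # fixed centres
sigma = 0.7                                        # fixed width

def design(X):
    """Gaussian RBF activations of every sample at every centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = design(X)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # one-shot weight solve
rmse = np.sqrt(((design(X) @ w - y) ** 2).mean())
```

Because the hidden layer is fixed, fitting reduces to linear regression on the RBF activations, which is why the abstract reports better time efficiency than iteratively trained classifiers.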

  16. Cancer classification through filtering progressive transductive support vector machine based on gene expression data

    NASA Astrophysics Data System (ADS)

    Lu, Xinguo; Chen, Dan

    2017-08-01

    Traditional supervised classifiers work only with labeled data and neglect the large amount of data that lacks sufficient follow-up information. Consequently, the small sample size limits the design of an appropriate classifier. In this paper, a transductive learning method is addressed that combines a filtering strategy within the transductive framework with a progressive labeling strategy. The progressive labeling strategy does not need to consider the distribution of labeled samples in order to evaluate the distribution of unlabeled samples, and can effectively solve the problem of evaluating the proportion of positive and negative samples in the working set. Our experimental results demonstrate that the proposed technique has great potential in cancer prediction based on gene expression.
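
    Progressive labeling of unlabeled samples is close in spirit to self-training; scikit-learn's `SelfTrainingClassifier` gives a minimal stand-in (not the paper's filtering transductive SVM), with synthetic data in place of gene expression profiles:

```python
# Self-training: a supervised base learner progressively labels the
# unlabeled portion (marked -1) of the training data.
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1          # only 50 samples carry labels; -1 = unlabeled

model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X, y_partial)
acc = (model.predict(X) == y).mean()
```

The abstract's point is that its progressive strategy avoids assuming the labeled class proportions match the unlabeled ones, a known weakness of plain self-training.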

  17. An ensemble predictive modeling framework for breast cancer classification.

    PubMed

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2017-12-01

    Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse, high-dimensional projection of the samples with dimensionality comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Discriminant Analysis of Defective and Non-Defective Field Pea (Pisum sativum L.) into Broad Market Grades Based on Digital Image Features.

    PubMed

    McDonald, Linda S; Panozzo, Joseph F; Salisbury, Phillip A; Ford, Rebecca

    2016-01-01

    Field peas (Pisum sativum L.) are generally traded based on seed appearance, which subjectively defines broad market-grades. In this study, we developed an objective Linear Discriminant Analysis (LDA) model to classify market grades of field peas based on seed colour, shape and size traits extracted from digital images. Seeds were imaged in a high-throughput system consisting of a camera and laser positioned over a conveyor belt. Six colour intensity digital images were captured (under 405, 470, 530, 590, 660 and 850nm light) for each seed, and surface height was measured at each pixel by laser. Colour, shape and size traits were compiled across all seed in each sample to determine the median trait values. Defective and non-defective seed samples were used to calibrate and validate the model. Colour components were sufficient to correctly classify all non-defective seed samples into correct market grades. Defective samples required a combination of colour, shape and size traits to achieve 87% and 77% accuracy in market grade classification of calibration and validation sample-sets respectively. Following these results, we used the same colour, shape and size traits to develop an LDA model which correctly classified over 97% of all validation samples as defective or non-defective.

  19. Discriminant Analysis of Defective and Non-Defective Field Pea (Pisum sativum L.) into Broad Market Grades Based on Digital Image Features

    PubMed Central

    McDonald, Linda S.; Panozzo, Joseph F.; Salisbury, Phillip A.; Ford, Rebecca

    2016-01-01

    Field peas (Pisum sativum L.) are generally traded based on seed appearance, which subjectively defines broad market-grades. In this study, we developed an objective Linear Discriminant Analysis (LDA) model to classify market grades of field peas based on seed colour, shape and size traits extracted from digital images. Seeds were imaged in a high-throughput system consisting of a camera and laser positioned over a conveyor belt. Six colour intensity digital images were captured (under 405, 470, 530, 590, 660 and 850nm light) for each seed, and surface height was measured at each pixel by laser. Colour, shape and size traits were compiled across all seed in each sample to determine the median trait values. Defective and non-defective seed samples were used to calibrate and validate the model. Colour components were sufficient to correctly classify all non-defective seed samples into correct market grades. Defective samples required a combination of colour, shape and size traits to achieve 87% and 77% accuracy in market grade classification of calibration and validation sample-sets respectively. Following these results, we used the same colour, shape and size traits to develop an LDA model which correctly classified over 97% of all validation samples as defective or non-defective. PMID:27176469
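
    An LDA sketch on hypothetical colour/shape/size trait medians (the numbers below are invented for illustration, not the published seed measurements):

```python
# Linear Discriminant Analysis separating two market grades from
# three per-sample median traits (colour, shape, size).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
grade_a = rng.normal([0.8, 1.0, 6.0], 0.1, size=(40, 3))
grade_b = rng.normal([0.5, 1.2, 5.0], 0.1, size=(40, 3))
X = np.vstack([grade_a, grade_b])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
acc = lda.score(X, y)
```

As in the study, the same trait vector can feed two LDA models: one assigning a market grade, and one flagging a sample as defective or non-defective.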

  20. Feature selection and classification of multiparametric medical images using bagging and SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.

  1. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    PubMed

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases. PMID:26764911
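
    The majority-voting combiner at the core of GA-EoC can be illustrated with a toy pool of base classifiers. Here the genetic search is replaced by a brute-force scan over one- and three-member ensembles, and the "classifiers" are fixed prediction vectors; all names and values are invented.

```python
from itertools import combinations
from collections import Counter

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
# Toy pool: each entry is one base classifier's predictions on 8 samples
pool = {
    "clfA": [0, 1, 1, 0, 1, 1, 1, 1],
    "clfB": [0, 1, 0, 0, 1, 0, 0, 1],
    "clfC": [1, 1, 1, 0, 0, 0, 1, 1],
    "clfD": [0, 0, 1, 1, 1, 0, 1, 0],
}

def majority_vote(preds_per_clf):
    # Each column is one sample; take the most common label per sample.
    return [Counter(col).most_common(1)[0][0] for col in zip(*preds_per_clf)]

def accuracy(pred):
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)

# Stand-in for the GA: scan all odd-sized combinations and keep the best
best = max(
    (combo for r in (1, 3) for combo in combinations(pool, r)),
    key=lambda combo: accuracy(majority_vote([pool[c] for c in combo])),
)
best_acc = accuracy(majority_vote([pool[c] for c in best]))
print(best, best_acc)
```

    In this toy pool the best three-member ensemble corrects errors that every individual classifier makes, which is exactly the effect the GA search is hunting for in a much larger pool.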

  3. Urine cell-based DNA methylation classifier for monitoring bladder cancer.

    PubMed

    van der Heijden, Antoine G; Mengual, Lourdes; Ingelmo-Torres, Mercedes; Lozano, Juan J; van Rijt-van de Westerlo, Cindy C M; Baixauli, Montserrat; Geavlete, Bogdan; Moldoveanud, Cristian; Ene, Cosmin; Dinney, Colin P; Czerniak, Bogdan; Schalken, Jack A; Kiemeney, Lambertus A L M; Ribal, Maria J; Witjes, J Alfred; Alcaraz, Antonio

    2018-01-01

    Current standard methods used to detect and monitor bladder cancer (BC) are invasive or have low sensitivity. This study aimed to develop a urine methylation biomarker classifier for BC monitoring and to validate this classifier in patients in follow-up for bladder cancer (PFBC). Voided urine samples (N = 725) from BC patients, controls, and PFBC were prospectively collected in four centers. Finally, 626 urine samples were available for analysis. DNA was extracted from the urinary cells and bisulfite modified, and methylation status was analyzed using pyrosequencing. Cytology was available from a subset of patients (N = 399). In the discovery phase, seven genes selected from the literature (CDH13, CFTR, NID2, SALL3, TMEFF2, TWIST1, and VIM2) were studied in 111 BC and 57 control samples. This training set was used to develop a gene classifier by logistic regression, which was then validated in 458 PFBC samples (173 with recurrence). A three-gene methylation classifier containing CFTR, SALL3, and TWIST1 was developed in the training set (AUC 0.874). The classifier achieved an AUC of 0.741 in the validation series. Cytology results were available for 308 samples from the validation set. Cytology achieved an AUC of 0.696, whereas the classifier in this subset of patients reached an AUC of 0.768. Combining the methylation classifier with cytology results achieved an AUC of 0.86 in the validation set, with a sensitivity of 96%, a specificity of 40%, and positive and negative predictive values of 56% and 92%, respectively. The combination of the three-gene methylation classifier and cytology results has high sensitivity and high negative predictive value in a real clinical scenario (PFBC). The proposed classifier is a useful test for predicting BC recurrence and for decreasing the number of cystoscopies in the follow-up of BC patients. If cystoscopy were performed only in patients with a positive combined classifier result, 36% of all cystoscopies could be avoided.

  4. Local classifier weighting by quadratic programming.

    PubMed

    Cevikalp, Hakan; Polikar, Robi

    2008-10-01

    It has been widely accepted that classification accuracy can be improved by combining the outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures, many of which are heuristic in nature, has been developed for this goal. In this brief, we describe a dynamic approach to combining classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate the local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function; the weights ensure that the locally most accurate classifiers are weighted more heavily when labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
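
    A simplified sketch of the idea: weight each constituent classifier by its accuracy on the query's k nearest training neighbours. The paper obtains the weights by solving a convex quadratic program; here the local accuracies are simply normalised, which preserves the intuition but not the optimisation. The data and the two toy classifiers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Two toy constituent classifiers: one matches the true boundary, one does not.
def clf_good(X):
    return (X[:, 0] + X[:, 1] > 0).astype(int)

def clf_bad(X):
    return (X[:, 0] - X[:, 1] > 0).astype(int)

def predict(x, k=7):
    # Estimate each classifier's accuracy on the query's k nearest neighbours
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    weights, votes = [], []
    for clf in (clf_good, clf_bad):
        weights.append(np.mean(clf(X_train[nn]) == y_train[nn]))
        votes.append(clf(x[None, :])[0])
    weights = np.array(weights) / (np.sum(weights) + 1e-12)
    # Weighted vote for class 1
    score1 = sum(w for w, v in zip(weights, votes) if v == 1)
    return int(score1 > 0.5)

print(predict(np.array([1.0, 1.0])))
```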

  5. Multi-view L2-SVM and its multi-view core vector machine.

    PubMed

    Huang, Chengquan; Chung, Fu-lai; Wang, Shitong

    2016-03-01

    In this paper, a novel L2-SVM based classifier, Multi-view L2-SVM, is proposed to address multi-view classification tasks. The proposed Multi-view L2-SVM classifier does not have any bias in its objective function and hence has flexibility like ν-SVC, in the sense that the number of yielded support vectors can be controlled by a pre-specified parameter. The proposed Multi-view L2-SVM classifier can make full use of the coherence and the difference of different views by imposing consensus among multiple views to improve the overall classification performance. Besides, based on the generalized core vector machine (GCVM), the proposed Multi-view L2-SVM classifier is extended into its GCVM version, MvCVM, which can realize fast training on large scale multi-view datasets, with asymptotic time complexity linear in the sample size and space complexity independent of the sample size. Our experimental results demonstrated the effectiveness of the proposed Multi-view L2-SVM classifier for small scale multi-view datasets and the proposed MvCVM classifier for large scale multi-view datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Frog sound identification using extended k-nearest neighbor classifier

    NASA Astrophysics Data System (ADS)

    Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

    2017-09-01

    Frog sound identification based on vocalization is important for biological research and environmental monitoring. As a result, different types of feature extraction and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with an Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest-neighbor and mutual-sharing-of-neighborhood concepts, with the aim of improving classification performance. It makes a prediction based on which training samples are the nearest neighbors of the testing sample and which consider the testing sample as their own nearest neighbor. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifiers, k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN) and Mutual k-Nearest Neighbor (MKNN), on the recorded sounds of 15 frog species obtained in Malaysian forests. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results show that the EKNN classifier exhibits the best accuracy compared to the competing classifiers, KNN, FKNN, KGNN and MKNN, for all cases.
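
    The extended-neighbourhood idea can be sketched as follows: classify a query from the union of (a) its k nearest training points and (b) training points that would count the query among their own k nearest neighbours. This is an illustrative reading of the EKNN concept on toy Gaussian data, not the authors' implementation or their frog-call features.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
# Two well-separated 4-dimensional clusters stand in for acoustic features
X = np.vstack([rng.normal(-2, 1, (30, 4)), rng.normal(2, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)

def eknn_predict(q, k=5):
    d_q = np.linalg.norm(X - q, axis=1)
    forward = set(np.argsort(d_q)[:k])           # the query's own neighbours
    backward = set()
    for i in range(len(X)):
        d_i = np.linalg.norm(X - X[i], axis=1)   # i's distances to training set
        d_i[i] = np.inf
        kth = np.sort(d_i)[k - 1]
        if d_q[i] <= kth:                        # query would be in i's k-NN
            backward.add(i)
    votes = [y[j] for j in forward | backward]   # extended neighbourhood vote
    return Counter(votes).most_common(1)[0][0]

print(eknn_predict(np.array([1.5, 2.0, 2.5, 1.8])))
```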

  7. A qualitative signature for early diagnosis of hepatocellular carcinoma based on relative expression orderings.

    PubMed

    Ao, Lu; Zhang, Zimei; Guan, Qingzhou; Guo, Yating; Guo, You; Zhang, Jiahui; Lv, Xingwei; Huang, Haiyan; Zhang, Huarong; Wang, Xianlong; Guo, Zheng

    2018-04-23

    Currently, using biopsy specimens to confirm suspicious liver lesions of early hepatocellular carcinoma is not entirely reliable because of insufficient sampling amount and inaccurate sampling location. It is necessary to develop a signature to aid early hepatocellular carcinoma diagnosis using biopsy specimens even when the sampling location is inaccurate. Based on the within-sample relative expression orderings of gene pairs, we identified a simple qualitative signature to distinguish both hepatocellular carcinoma and adjacent non-tumour tissues from cirrhosis tissues of non-hepatocellular carcinoma patients. A signature consisting of 19 gene pairs was identified in the training data sets and validated in 2 large collections of samples from biopsy and surgical resection specimens. For biopsy specimens, 95.7% of 141 hepatocellular carcinoma tissues and all (100%) of 108 cirrhosis tissues of non-hepatocellular carcinoma patients were correctly classified. Especially, all (100%) of 60 hepatocellular carcinoma adjacent normal tissues and 77.5% of 80 hepatocellular carcinoma adjacent cirrhosis tissues were classified as hepatocellular carcinoma. For surgical resection specimens, 99.7% of 733 hepatocellular carcinoma specimens were correctly classified as hepatocellular carcinoma, while 96.1% of 254 hepatocellular carcinoma adjacent cirrhosis tissues and 95.9% of 538 hepatocellular carcinoma adjacent normal tissues were classified as hepatocellular carcinoma. In contrast, 17.0% of 47 cirrhosis samples from non-hepatocellular carcinoma patients waiting for liver transplantation were classified as hepatocellular carcinoma, indicating that some patients with long-lasting cirrhosis could have already gained hepatocellular carcinoma characteristics. The signature can distinguish both hepatocellular carcinoma tissues and tumour-adjacent tissues from cirrhosis tissues of non-hepatocellular carcinoma patients even using inaccurately sampled biopsy specimens, which can aid early diagnosis of hepatocellular carcinoma. © 2018 The Authors. Liver International Published by John Wiley & Sons Ltd.
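
    The within-sample relative expression ordering rule lends itself to a very small sketch: each signature gene pair (a, b) votes for hepatocellular carcinoma if gene a is expressed above gene b in the same sample, and the majority of pair votes decides. Because only within-sample ranks are used, the rule is insensitive to normalization. The three pairs and expression values below are invented; the published signature uses 19 pairs.

```python
# Hypothetical signature gene pairs (the real signature has 19 pairs)
pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]

def classify(expr):
    # Count pairs whose within-sample ordering matches the HCC pattern
    votes = sum(1 for a, b in pairs if expr[a] > expr[b])
    return "HCC" if votes > len(pairs) / 2 else "cirrhosis"

# Invented expression profile for a single sample
sample = {"G1": 9.1, "G2": 4.2, "G3": 7.7, "G4": 8.0, "G5": 6.3, "G6": 2.9}
print(classify(sample))   # 2 of the 3 pairs vote HCC
```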

  8. Classification of biosensor time series using dynamic time warping: applications in screening cancer cells with characteristic biomarkers.

    PubMed

    Rai, Shesh N; Trainor, Patrick J; Khosravi, Farhad; Kloecker, Goetz; Panchapakesan, Balaji

    2016-01-01

    The development of biosensors that produce time series data will facilitate improvements in biomedical diagnostics and in personalized medicine. The time series produced by these devices often contains characteristic features arising from biochemical interactions between the sample and the sensor. To use such characteristic features for determining sample class, similarity-based classifiers can be utilized. However, the construction of such classifiers is complicated by the variability in the time domains of such series that renders the traditional distance metrics such as Euclidean distance ineffective in distinguishing between biological variance and time domain variance. The dynamic time warping (DTW) algorithm is a sequence alignment algorithm that can be used to align two or more series to facilitate quantifying similarity. In this article, we evaluated the performance of DTW distance-based similarity classifiers for classifying time series that mimics electrical signals produced by nanotube biosensors. Simulation studies demonstrated the positive performance of such classifiers in discriminating between time series containing characteristic features that are obscured by noise in the intensity and time domains. We then applied a DTW distance-based k-nearest neighbors classifier to distinguish the presence/absence of mesenchymal biomarker in cancer cells in buffy coats in a blinded test. Using a train-test approach, we find that the classifier had high sensitivity (90.9%) and specificity (81.8%) in differentiating between EpCAM-positive MCF7 cells spiked in buffy coats and those in plain buffy coats.
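
    A minimal DTW distance plus a 1-nearest-neighbour classifier in the spirit of the article. The toy signals (a Gaussian bump occurring early vs late in the series) are illustrative only; real biosensor series would be longer and noisier.

```python
import numpy as np

def dtw_distance(a, b):
    # Classic O(n*m) dynamic-programming DTW with absolute-difference cost
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn1(query, train_series, train_labels):
    # 1-NN under DTW distance
    dists = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]

t = np.arange(50)

def bump(center):
    return np.exp(-0.5 * ((t - center) / 3.0) ** 2)

train = [bump(10), bump(12), bump(35), bump(38)]
labels = ["early", "early", "late", "late"]
print(knn1(bump(14), train, labels))   # query: a time-shifted 'early' bump
```

    The warping absorbs modest time shifts, so the shifted bump still matches its own class, which is exactly where Euclidean distance struggles.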

  9. Stackable differential mobility analyzer for aerosol measurement

    DOEpatents

    Cheng, Meng-Dawn [Oak Ridge, TN; Chen, Da-Ren [Creve Coeur, MO

    2007-05-08

    A multi-stage differential mobility analyzer (MDMA) for aerosol measurements includes a first electrode or grid including at least one inlet or injection slit for receiving an aerosol including charged particles for analysis. A second electrode or grid is spaced apart from the first electrode. The second electrode has at least one sampling outlet disposed at a plurality of different distances along its length. A volume between the first and the second electrode or grid between the inlet or injection slit and a distal one of the plurality of sampling outlets forms a classifying region, the first and second electrodes for charging to suitable potentials to create an electric field within the classifying region. At least one inlet or injection slit in the second electrode receives a sheath gas flow into an upstream end of the classifying region, wherein each sampling outlet functions as an independent DMA stage and classifies different size ranges of charged particles based on electric mobility simultaneously.

  10. AVNM: A Voting based Novel Mathematical Rule for Image Classification.

    PubMed

    Vidyarthi, Ankit; Mittal, Namita

    2016-12-01

    In machine learning, system accuracy depends upon the classification result, and classification accuracy plays an imperative role in various domains. A non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness, the main problem associated with the KNN classifier is the selection of the number of nearest neighbors, i.e. "k", used for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm that gives perfect accuracy in terms of a low misclassification error rate. Motivated by this problem, a new sample-space-reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature, like KNN. AVNM uses a weighted voting mechanism with sample space reduction to learn and examine the predicted class label for an unidentified sample. AVNM is free from any initial selection of predefined variables and neighbor selection as found in the KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments were made on 10 standard datasets taken from the UCI database and one manually created dataset. The experimental results show that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimental results based on the confusion-matrix accuracy parameter prove a higher accuracy value with the AVNM rule. The proposed AVNM rule is based on a sample space reduction mechanism for identification of an optimal number of nearest neighbors. AVNM results in better classification accuracy and a lower error rate compared with the state-of-the-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbors and improves the classification rate for the UCI datasets and the manually created dataset. 
    Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  11. Age determination of bottled Chinese rice wine by VIS-NIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Yu, Haiyan; Lin, Tao; Ying, Yibin; Pan, Xingxiang

    2006-10-01

    The feasibility of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining the wine age (1, 2, 3, 4, and 5 years) of Chinese rice wine was investigated. Samples of Chinese rice wine were analyzed in 600 mL square brown glass bottles with a side length of approximately 64 mm at room temperature. VIS-NIR spectra of 100 bottled Chinese rice wine samples were collected in transmission mode in the wavelength range of 350-1200 nm by a fiber spectrometer system. Discriminant models were developed based on discriminant analysis (DA) together with raw, first and second derivative spectra. Alcoholic degree, total acid, and °Brix were determined to validate the NIR results. The calibration result for raw spectra was better than that for first and second derivative spectra. The percentage of samples correctly classified for raw spectra was 98%. The 1-, 2-, and 3-year-old sample groups were all correctly classified, and for the 4- and 5-year-old sample groups the percentage of samples correctly classified was 92.9%. In validation analysis, the percentage of samples correctly classified was 100%. The results demonstrated that VIS-NIR spectroscopic technique could be used as a non-invasive, rapid and reliable method for predicting the wine age of bottled Chinese rice wine.

  12. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    PubMed Central

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
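
    The inverse-probability resampling idea can be sketched as follows: draw a bootstrap sample in which each record's selection probability is proportional to its population share divided by its sample share, so the resample approximates the population class balance. The prevalence figure is invented, and this omits the stochastic and parametric refinements the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stratified case-control sample: 100 cases + 100 controls (50/50),
# but the assumed population prevalence is only 5%.
y = np.array([1] * 100 + [0] * 100)
pop_prev = 0.05

# Inverse-probability weight per record: population share / sample share
w = np.where(y == 1, pop_prev / 0.5, (1 - pop_prev) / 0.5)
p = w / w.sum()

# Weighted bootstrap: the resample mimics the population class balance
idx = rng.choice(len(y), size=5000, replace=True, p=p)
resampled_prev = y[idx].mean()
print(f"case fraction after resampling: {resampled_prev:.3f}")
```

    A classifier such as a random forest can then be trained on the resampled indices (features resampled along with labels) without its predicted probabilities being distorted by the artificial enrichment.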

  13. A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data.

    PubMed

    Pandey, Gaurav; Pandey, Om P; Rogers, Angela J; Ahsen, Mehmet E; Hoffman, Gabriel E; Raby, Benjamin A; Weiss, Scott T; Schadt, Eric E; Bunyavanich, Supinda

    2018-06-11

    Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. A total of 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes, interpreted via an L2-regularized logistic regression classification model. This classifier performed with strong predictive value and sensitivity across eight test sets, including (1) a test set of independent asthmatic and control subjects profiled by RNA sequencing (positive and negative predictive values of 1.00 and 0.96, respectively; AUC of 0.994), (2) two independent case-control cohorts of asthma profiled by microarray, and (3) five cohorts with other respiratory conditions (allergic rhinitis, upper respiratory infection, cystic fibrosis, smoking), where the classifier had a low to zero misclassification rate. Following validation in large, prospective cohorts, this classifier could be developed into a nasal biomarker of asthma.
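
    A minimal L2-regularised logistic regression fitted by gradient descent, the model class used for the 90-gene classifier. The simulated 200 x 90 expression matrix and all hyperparameters are illustrative, not the study's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 90                      # 200 subjects, 90 gene features
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

w = np.zeros(d)
lam, lr = 1e-2, 0.1                 # L2 penalty strength and learning rate
for _ in range(500):
    z = np.clip(X @ w, -30, 30)     # clip logits to avoid overflow in exp
    p = 1.0 / (1.0 + np.exp(-z))
    grad = X.T @ (p - y) / n + lam * w   # logistic-loss gradient + L2 term
    w -= lr * grad

pred = (1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30))) > 0.5)
acc = np.mean(pred == y)
print(f"training accuracy: {acc:.2f}")
```

    The L2 term shrinks the coefficient vector, which is what makes a 90-feature model usable on cohorts of this size.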

  14. Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image

    PubMed Central

    Zhong, Xiaomei; Li, Jianping; Dou, Huacheng; Deng, Shijun; Wang, Guofei; Jiang, Yu; Wang, Yongjie; Zhou, Zebing; Wang, Li; Yan, Fei

    2013-01-01

    Currently, remote sensing technologies are widely employed in dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM) based on ETM+ remote sensing imagery. The algorithm is applied to extract various types of land cover of the city of Da’an in northern China. Two multi-category strategies, namely “one-against-one” and “one-against-rest”, for this algorithm are described in detail and then compared. A fuzzy membership function is presented to reduce the effects of noises or outliers on the data samples. The approaches to feature extraction and feature selection, and several key parameter settings, are also given. Numerous experiments were carried out to evaluate its performance, including various accuracies (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared to three other classifiers, the maximum likelihood classifier (MLC), back-propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples and features on the four classifiers were also evaluated in these experiments. PMID:23936016

  15. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

    Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. 
Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
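
    The Bayes rule at the core of DMBC can be sketched directly: score a taxon-count vector under a Dirichlet-multinomial likelihood per class, add a log-prior reflecting disease prevalence, and pick the larger posterior. The alpha vectors would normally be fitted to training microbiome data by maximum likelihood; here they and the prevalence are invented.

```python
from math import lgamma, log

def dm_log_likelihood(counts, alpha):
    # log P(counts | alpha) under a Dirichlet-multinomial, dropping the
    # multinomial coefficient (identical across classes, so it cancels)
    n, a0 = sum(counts), sum(alpha)
    ll = lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        ll += lgamma(x + a) - lgamma(a)
    return ll

# Invented per-class Dirichlet parameters over three taxa
alpha_healthy = [10.0, 5.0, 1.0]
alpha_disease = [1.0, 5.0, 10.0]
log_prior = {"healthy": log(0.9), "disease": log(0.1)}  # assumed prevalence

def classify(counts):
    score_h = dm_log_likelihood(counts, alpha_healthy) + log_prior["healthy"]
    score_d = dm_log_likelihood(counts, alpha_disease) + log_prior["disease"]
    return "healthy" if score_h >= score_d else "disease"

print(classify([50, 30, 5]))
print(classify([2, 30, 80]))
```

    With strongly informative counts, the likelihood term dominates the prior; with ambiguous counts, the prevalence prior tips the decision, which is the advantage the abstract attributes to Bayes classifiers.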

  16. Blood Based Biomarkers of Early Onset Breast Cancer

    DTIC Science & Technology

    2016-12-01

    discretizes the data, and also using logistic elastic net – a form of linear regression - we were unable to build a classifier that could accurately...classifier for differentiating cases from controls off discretized data. The first pass analysis demonstrated a 35 gene signature that differentiated...to the discretized data for mRNA gene signature, the samples used to “train” were also included in the final samples used to “test” the algorithm

  17. Multicentre prospective validation of a urinary peptidome-based classifier for the diagnosis of type 2 diabetic nephropathy

    PubMed Central

    Siwy, Justyna; Schanstra, Joost P.; Argiles, Angel; Bakker, Stephan J.L.; Beige, Joachim; Boucek, Petr; Brand, Korbinian; Delles, Christian; Duranton, Flore; Fernandez-Fernandez, Beatriz; Jankowski, Marie-Luise; Al Khatib, Mohammad; Kunt, Thomas; Lajer, Maria; Lichtinghagen, Ralf; Lindhardt, Morten; Maahs, David M; Mischak, Harald; Mullen, William; Navis, Gerjan; Noutsou, Marina; Ortiz, Alberto; Persson, Frederik; Petrie, John R.; Roob, Johannes M.; Rossing, Peter; Ruggenenti, Piero; Rychlik, Ivan; Serra, Andreas L.; Snell-Bergeon, Janet; Spasovski, Goce; Stojceva-Taneva, Olivera; Trillini, Matias; von der Leyen, Heiko; Winklhofer-Roob, Brigitte M.; Zürbig, Petra; Jankowski, Joachim

    2014-01-01

    Background Diabetic nephropathy (DN) is one of the major late complications of diabetes. Treatment aimed at slowing down the progression of DN is available but methods for early and definitive detection of DN progression are currently lacking. The ‘Proteomic prediction and Renin angiotensin aldosterone system Inhibition prevention Of early diabetic nephRopathy In TYpe 2 diabetic patients with normoalbuminuria trial’ (PRIORITY) aims to evaluate the early detection of DN in patients with type 2 diabetes (T2D) using a urinary proteome-based classifier (CKD273). Methods In this ancillary study of the recently initiated PRIORITY trial we aimed to validate for the first time the CKD273 classifier in a multicentre (9 different institutions providing samples from 165 T2D patients) prospective setting. In addition we also investigated the influence of sample containers, age and gender on the CKD273 classifier. Results We observed a high consistency of the CKD273 classification scores across the different centres with areas under the curves ranging from 0.95 to 1.00. The classifier was independent of age (range tested 16–89 years) and gender. Furthermore, the use of different urine storage containers did not affect the classification scores. Analysis of the distribution of the individual peptides of the classifier over the nine different centres showed that fragments of blood-derived and extracellular matrix proteins were the most consistently found. Conclusion We provide for the first time validation of this urinary proteome-based classifier in a multicentre prospective setting and show the suitability of the CKD273 classifier to be used in the PRIORITY trial. PMID:24589724

  18. A bench-top hyperspectral imaging system to classify beef from Nellore cattle based on tenderness

    NASA Astrophysics Data System (ADS)

    Nubiato, Keni Eduardo Zanoni; Mazon, Madeline Rezende; Antonelo, Daniel Silva; Calkins, Chris R.; Naganathan, Govindarajan Konda; Subbiah, Jeyamkondan; da Luz e Silva, Saulo

    2018-03-01

    The aim of this study was to evaluate the accuracy of classification of Nellore beef aged for 0, 7, 14, or 21 days and classification based on tenderness and aging period using a bench-top hyperspectral imaging system. A hyperspectral imaging system (λ = 928-2524 nm) was used to collect hyperspectral images of the Longissimus thoracis et lumborum (aging n = 376 and tenderness n = 345) of Nellore cattle. The image processing steps included selection of a region of interest, extraction of spectra, and identification and evaluation of selected wavelengths for classification. Six linear discriminant models were developed to classify samples based on tenderness and aging period. The model using the first derivative of partial absorbance spectra was able to classify steaks based on tenderness with an overall accuracy of 89.8%. The model using the first derivative of full absorbance spectra was able to classify steaks based on aging period with an overall accuracy of 84.8%. The results demonstrate that the hyperspectral imaging system may be a viable technology for classifying beef based on tenderness and aging period.

  19. Accurate determination of imaging modality using an ensemble of text- and image-based classifiers.

    PubMed

    Kahn, Charles E; Kalpathy-Cramer, Jayashree; Lam, Cesar A; Eldredge, Christina E

    2012-02-01

    Imaging modality can aid retrieval of medical images for clinical practice, research, and education. We evaluated whether an ensemble classifier could outperform its constituent individual classifiers in determining the modality of figures from radiology journals. Seventeen automated classifiers analyzed 77,495 images from two radiology journals. Each classifier assigned one of eight imaging modalities (computed tomography, graphic, magnetic resonance imaging, nuclear medicine, positron emission tomography, photograph, ultrasound, or radiograph) to each image based on visual and/or textual information. Three physicians determined the modality of 5,000 randomly selected images as a reference standard. A "Simple Vote" ensemble classifier assigned each image to the modality that received the greatest number of individual classifiers' votes. A "Weighted Vote" classifier weighted each individual classifier's vote based on performance over a training set. For each image, this classifier's output was the imaging modality that received the greatest weighted vote score. We measured precision, recall, and F score (the harmonic mean of precision and recall) for each classifier. Individual classifiers' F scores ranged from 0.184 to 0.892. The simple vote and weighted vote classifiers correctly assigned 4,565 images (F score, 0.913; 95% confidence interval, 0.905-0.921) and 4,672 images (F score, 0.934; 95% confidence interval, 0.927-0.941), respectively. The weighted vote classifier performed significantly better than all individual classifiers. An ensemble classifier correctly determined the imaging modality of 93% of figures in our sample. The imaging modality of figures published in radiology journals can be determined with high accuracy, which will improve systems for image retrieval.
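
    The "Weighted Vote" combiner can be sketched in a few lines: scale each constituent classifier's modality vote by a weight derived from its performance on a training set (plain training accuracy here, rather than the paper's exact weighting), and return the modality with the highest total. Classifier names, weights and votes are invented.

```python
from collections import defaultdict

# Stand-in weights: each classifier's measured performance on a training set
weights = {"text_clf": 0.60, "visual_clf": 0.85, "hybrid_clf": 0.75}
# Each classifier's modality vote for one figure
votes = {"text_clf": "radiograph", "visual_clf": "CT", "hybrid_clf": "CT"}

scores = defaultdict(float)
for name, modality in votes.items():
    scores[modality] += weights[name]   # weighted vote accumulation

winner = max(scores, key=scores.get)
print(winner)   # CT outweighs radiograph (0.85 + 0.75 vs 0.60)
```

    A "Simple Vote" version is the same loop with every weight set to 1.0.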

  20. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    NASA Astrophysics Data System (ADS)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2017-04-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Feature selection is the process of selecting an optimal subset of features from the large number available in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves classifier performance and decreases the computational burden that many features impose on the classifier. Selecting an optimal subset of features is a difficult search problem: for n features, the total number of possible feature subsets is 2^n, so the problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples were selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier; a support vector machine is used as the classifier. The experimental results show that the PSO-based and BBO-based algorithms select an optimal feature subset for classifying MCCs as benign or malignant better than the GA-based algorithm.
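As a concrete illustration of why the 2^n subset space forces heuristic search, the sketch below scores random feature masks against a purely hypothetical fitness function standing in for the paper's SVM classification rate; GA, PSO and BBO explore the same bit-mask space, just more cleverly than random search:

```python
# Random-search stand-in (NOT the paper's GA/PSO/BBO) over feature masks.
# The fitness function is a made-up toy that rewards masks selecting
# features 0, 3 and 7 and penalizes extra features.
import random

N_FEATURES = 10
print(2 ** N_FEATURES)  # 1024 candidate subsets even for just 10 features

def fitness(mask):
    target = {0, 3, 7}
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & target) - 0.1 * len(chosen - target)

random.seed(42)
best = max(
    (tuple(random.randint(0, 1) for _ in range(N_FEATURES)) for _ in range(500)),
    key=fitness,
)
print(fitness(best))
```

With 50 features, as in the study, the subset space has 2^50 ≈ 10^15 members, which is why population-based metaheuristics are used instead of enumeration.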

  1. Robust online tracking via adaptive samples selection with saliency detection

    NASA Astrophysics Data System (ADS)

    Yan, Jia; Chen, Xi; Zhu, QiuPing

    2013-12-01

    Online tracking has been shown to be successful in tracking previously unknown objects. However, two important factors lead to the drift problem in online tracking: how to select correctly labeled samples even when the target locations are inaccurate, and how to handle confusors that have features similar to the target's. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To avoid degrading the classifiers with mis-aligned samples, we introduce a saliency detection method into the tracking problem: saliency maps and the strong classifiers are combined to extract the most reliable positive samples. Our approach employs a simple saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as negative samples, we propose a selection criterion in which both saliency confidence and similarity are considered, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before drift occurs. The tracking task is formulated as binary classification in an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.

  2. Hybrid Radar Emitter Recognition Based on Rough k-Means Classifier and Relevance Vector Machine

    PubMed Central

    Yang, Zhutian; Wu, Zhilu; Yin, Zhendong; Quan, Taifan; Sun, Hongjian

    2013-01-01

    Due to the increasing complexity of electromagnetic signals, there exists a significant challenge for recognizing radar emitter signals. In this paper, a hybrid recognition approach is presented that classifies radar emitter signals by exploiting the different separability of samples. The proposed approach comprises two steps, namely the primary signal recognition and the advanced signal recognition. In the former step, a novel rough k-means classifier, which comprises three regions, i.e., certain area, rough area and uncertain area, is proposed to cluster the samples of radar emitter signals. In the latter step, the samples within the rough boundary are used to train the relevance vector machine (RVM). Then RVM is used to recognize the samples in the uncertain area; therefore, the classification accuracy is improved. Simulation results show that, for recognizing radar emitter signals, the proposed hybrid recognition approach is more accurate, and presents lower computational complexity than traditional approaches. PMID:23344380
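A minimal sketch of the three-region assignment used in the primary (rough k-means) step, assuming 1-D samples, illustrative centres, and made-up ratio/distance thresholds rather than the paper's actual parameters:

```python
# Three-region rough k-means idea: a sample close to one centre only is
# "certain", a sample nearly equidistant to two centres falls in the
# "rough" area (and would go on to train/test the RVM), and a sample far
# from every centre is "uncertain". Thresholds here are illustrative.
def region(x, centres, ratio=1.3, far=5.0):
    d = sorted(abs(x - c) for c in centres)
    if d[0] > far:
        return "uncertain"      # far from all centres
    if d[1] <= ratio * d[0]:
        return "rough"          # two centres compete for the sample
    return "certain"

centres = [0.0, 10.0]
print(region(0.5, centres))     # near centre 0 only -> certain
print(region(5.1, centres))     # nearly equidistant -> rough
print(region(40.0, centres))    # far from everything -> uncertain
```

Only the rough and uncertain samples need the more expensive RVM, which is where the computational saving over a single monolithic classifier comes from.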

  3. ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY

    PubMed Central

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping

    2013-01-01

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869
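The balancing idea can be sketched as follows; plain random undersampling stands in here for the K-Medoids-based selection the study found best, and the 2:1 MCI/AD ratio mirrors the imbalance described in the abstract:

```python
# Minimal sketch of undersampling to balance a training set. Random
# selection is a simpler stand-in for K-Medoids-based selection of
# representative majority-class samples.
import random

def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until classes are balanced."""
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for y, group in sorted(by_class.items()):
        for s in rng.sample(group, n_min):
            balanced.append((s, y))
    return balanced

data = list(range(12))
labels = ["MCI"] * 8 + ["AD"] * 4   # imbalanced, as in ADNI
out = undersample(data, labels)
print(len(out))  # 4 per class -> 8 samples total
```

The ensemble model in the paper goes one step further: it draws several such undersampled sets and aggregates classifiers trained on each, so that discarded majority-class information is recovered across the ensemble.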

  4. A false sense of security? Can tiered approach be trusted to accurately classify immunogenicity samples?

    PubMed

    Jaki, Thomas; Allacher, Peter; Horling, Frank

    2016-09-05

    Detecting and characterizing anti-drug antibodies (ADA) against a protein therapeutic is crucially important for monitoring the unwanted immune response. Patient samples are usually tested for ADA activity with a multi-tiered approach that initially screens rapidly for positive samples, which are subsequently confirmed in a separate assay. In this manuscript we evaluate the ability of different methods to classify subjects with screening and competition-based confirmatory assays. We find that, for the overall performance of the multi-stage process, the method used for confirmation matters most; a t-test is best when differences are moderate to large. Moreover, when the differences between positive and negative samples are not sufficiently large, a competition-based confirmation step yields poor classification of positive samples. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
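A hedged sketch of the two-tier flow discussed above: a screening cut point derived from negative-control samples (the mean + 1.645 SD convention, targeting a 5% screening false-positive rate), followed by a confirmation step comparing replicate signals with and without competing drug via a Welch t statistic. All signal values and thresholds are illustrative, not the paper's data:

```python
# Two-tier ADA testing sketch: screening tier, then t-test confirmation.
from statistics import mean, stdev

def screen_cut_point(negatives):
    # Common convention: mean + 1.645*SD of drug-naive negatives.
    return mean(negatives) + 1.645 * stdev(negatives)

def t_statistic(a, b):
    """Welch two-sample t statistic."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

def classify(sample_signal, unspiked, spiked, cut, t_crit=2.0):
    if sample_signal <= cut:
        return "negative"            # fails the screening tier
    # Confirmation tier: drug spiking should suppress a true ADA signal.
    return "positive" if t_statistic(unspiked, spiked) > t_crit else "negative"

negatives = [0.9, 1.0, 1.1, 1.0, 0.95]
cut = screen_cut_point(negatives)
print(classify(2.0, [2.0, 2.1, 1.9], [1.0, 1.1, 0.9], cut))  # positive
print(classify(0.8, [0.8, 0.9], [0.8, 0.9], cut))            # negative
```

The paper's point is visible in this structure: if the spiked and unspiked means are close (small signal differences), the confirmation t statistic rarely exceeds the critical value and true positives are lost.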

  5. Stackable differential mobility analyzer for aerosol measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cheng, Meng-Dawn; Chen, Da-Ren

    2007-05-08

    A multi-stage differential mobility analyzer (MDMA) for aerosol measurements includes a first electrode or grid including at least one inlet or injection slit for receiving an aerosol including charged particles for analysis. A second electrode or grid is spaced apart from the first electrode. The second electrode has at least one sampling outlet disposed at a plurality of different distances along its length. A volume between the first and the second electrode or grid, between the inlet or injection slit and a distal one of the plurality of sampling outlets, forms a classifying region, the first and second electrodes for charging to suitable potentials to create an electric field within the classifying region. At least one inlet or injection slit in the second electrode receives a sheath gas flow into an upstream end of the classifying region, wherein each sampling outlet functions as an independent DMA stage and classifies different size ranges of charged particles based on electric mobility simultaneously.
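The size classification described above rests on the electrical-mobility relation Z = n·e·Cc / (3·π·μ·d): smaller particles are more mobile and so migrate across the sheath flow sooner, reaching an earlier outlet. The sketch below assumes singly charged particles in air and a slip correction of Cc = 1 for simplicity:

```python
# Electrical mobility of a charged aerosol particle (drag-limited sphere).
from math import pi

E = 1.602e-19      # elementary charge, C
MU_AIR = 1.81e-5   # dynamic viscosity of air, Pa*s (room temperature)

def electrical_mobility(d_m, n_charges=1, slip=1.0):
    """Electrical mobility in m^2/(V*s) of a particle of diameter d_m."""
    return n_charges * E * slip / (3 * pi * MU_AIR * d_m)

z_100nm = electrical_mobility(100e-9)
z_500nm = electrical_mobility(500e-9)
print(z_100nm > z_500nm)  # True: 100 nm particles are the more mobile
```

In a real DMA the slip correction is significant below a few hundred nanometres, so Cc = 1 understates the mobility of the smallest particles; the ordering shown is unaffected.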

  6. An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

    USGS Publications Warehouse

    Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.

    2017-01-01

    Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. Spatial autocorrelation, the number of desired classes, and the form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between the improved speed of the sampling approaches and the loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
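The sampling strategy evaluated in the paper can be sketched as: derive class breaks from a small sample of the attribute values, then classify the full dataset with those breaks, and compare against breaks computed from the full data. Quantile breaks stand in here for the Fisher-Jenks optimizer purely for brevity; the paper evaluates the true optimal classifier:

```python
# Sampling-based map classification sketch (quantile breaks as a
# stand-in for Fisher-Jenks; data are synthetic).
import random

def quantile_breaks(values, k):
    s = sorted(values)
    return [s[int(len(s) * i / k) - 1] for i in range(1, k)]

def classify(value, breaks):
    for cls, b in enumerate(breaks):
        if value <= b:
            return cls
    return len(breaks)

random.seed(1)
full = [random.gauss(50, 10) for _ in range(10_000)]
sample = random.sample(full, 200)            # classify from a 2% sample
sample_breaks = quantile_breaks(sample, 5)
full_breaks = quantile_breaks(full, 5)
agreement = sum(
    classify(v, sample_breaks) == classify(v, full_breaks) for v in full
) / len(full)
print(agreement)  # typically close to 1.0 for well-behaved data
```

The agreement rate is one simple instance of the accuracy-versus-speed tradeoff the paper quantifies: the sample is 50x cheaper to classify over, at the cost of some misassigned observations near the class breaks.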

  7. 3D Semantic Labeling of ALS Data Based on Domain Adaption by Transferring and Fusing Random Forest Models

    NASA Astrophysics Data System (ADS)

    Wu, J.; Yao, W.; Zhang, J.; Li, Y.

    2018-04-01

    Labeling 3D point cloud data with traditional supervised learning methods requires considerable labelled samples, which are costly and time-consuming to collect. This work adopts the domain adaption concept to transfer existing trained random forest classifiers (based on a source domain) to new data scenes (target domain), aiming to reduce the dependence of accurate 3D semantic labeling of point clouds on training samples from the new data scene. Firstly, two random forest classifiers were trained with existing samples previously collected for other data; they differ in their decision tree construction algorithms: C4.5 with information gain ratio and CART with Gini index. Secondly, four random forest classifiers adapted to the target domain are derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain are labelled by fusing the four newly derived random forest classifiers using a weights-of-evidence based fusion model. To validate our method, experimental analysis was conducted using 3 datasets: one used as the source domain data (Vaihingen data for 3D Semantic Labelling), and another two used as the target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5 % and 83.3 % for 3D labelling were achieved for the Jinmen city and Dunhuang city data respectively, using only 1/3 of the newly labelled samples that would be needed without domain adaption.

  8. A cDNA microarray gene expression data classifier for clinical diagnostics based on graph theory.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco

    2011-01-01

    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In that case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory that is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.

  9. The Preservation of Two Infant Temperaments into Adolescence

    ERIC Educational Resources Information Center

    Kagan, Jerome; Snidman, Nancy; Kahn, Vali; Towsley, Sara

    2007-01-01

    This "Monograph" reports theoretically relevant behavioral, biological, and self-report assessments of a sample of 14-17-year-olds who had been classified into one of four temperamental groups at 4 months of age. The infant temperamental categories were based on observed behavior to a battery of unfamiliar stimuli. The infants classified as high…

  10. Molecular Characterization of Hypoderma SPP. in Domestic Ruminants from Turkey and Pakistan.

    PubMed

    Ahmed, Haroon; Simsek, Sami; Saki, Cem Ecmel; Kesik, Harun Kaya; Kilinc, Seyma Gunyakti

    2017-08-01

    The aim of this study was to determine the morphological and molecular characterization of Hypoderma spp. in cattle and yak from provinces in Turkey and Pakistan. In total, 78 Hypoderma larvae were collected from slaughtered animals in Turkey and Pakistan from October 2015 to January 2016. Thirty-eight of these 78 Hypoderma larvae were morphologically classified as third instar larvae (L3s) of Hypoderma bovis, 37 were classified as Hypoderma lineatum, and 3 were classified as suspected or unidentified. The restriction enzyme TaqI was used to differentiate the Hypoderma spp. by polymerase chain reaction (PCR)-restriction fragment length polymorphism (RFLP). According to the sequences and the PCR-RFLP results, all larval samples from cattle from Turkey were classified as H. bovis, except for 1 sample classified as H. lineatum. All Hypoderma larvae from Pakistan were classified as H. lineatum from cattle and as Hypoderma sinense from yak. This study provides the first molecular characterization of H. lineatum (cattle) and H. sinense (yak) in Pakistan based on PCR-RFLP and sequencing results.

  11. Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers.

    PubMed

    Shu, Ting; Zhang, Bob; Tang, Yuan Yan

    2017-01-01

    At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is commonly detected using blood tests, electrocardiogram, cardiac computerized tomography scan, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is founded in Traditional Chinese Medicine. A new dataset consisting of 581 heart disease and 581 healthy samples was used to evaluate the proposed method, and an analysis of the Probabilistic Collaborative Representation Based Classifier's parameters was performed to optimize it. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is proven to be effective at heart disease detection.

  12. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class classification SVM (Support Vector Machine), based on a binary tree, is presented to address SVM's low learning efficiency when processing large-scale multi-class samples. This paper adopts a bottom-up method to build the binary-tree hierarchy; according to the achieved hierarchy, each sub-classifier learns from the corresponding samples of its node. During learning, several class clusters are generated after the first clustering of the training samples. First, central points are extracted from those clusters that contain only one type of sample. For clusters containing two types of samples, the cluster numbers of their positive and negative samples are set according to their degree of mixture, and a secondary clustering is undertaken, after which central points are extracted from the resulting sub-class clusters. Sub-classifiers are then obtained by learning from the reduced sample set formed by integrating the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, guarantees high classification accuracy while greatly reducing the number of samples and effectively improving learning efficiency.

  13. METHODS TO CLASSIFY ENVIRONMENTAL SAMPLES BASED ON MOLD ANALYSES BY QPCR

    EPA Science Inventory

    Quantitative PCR (QPCR) analysis of molds in indoor environmental samples produces highly accurate speciation and enumeration data. In a number of studies, eighty of the most common or potentially problematic indoor molds were identified and quantified in dust samples from homes...

  14. Active learning based segmentation of Crohns disease from abdominal MRI.

    PubMed

    Mahapatra, Dwarikanath; Vos, Franciscus M; Buhmann, Joachim M

    2016-05-01

    This paper proposes a novel active learning (AL) framework, and combines it with semi-supervised learning (SSL), for segmenting Crohn's disease (CD) tissues from abdominal magnetic resonance (MR) images. Robust fully supervised learning (FSL) based classifiers require large amounts of labeled data covering different disease severities; obtaining such data is time consuming and requires considerable expertise. SSL methods use a few labeled samples and leverage the information from many unlabeled samples to train an accurate classifier. AL queries the labels of the most informative samples and maximizes the gain from the labeling effort. Our primary contribution is in designing a query strategy that combines novel context information with classification uncertainty and feature similarity. Combining SSL and AL gives a robust segmentation method that: (1) optimally uses few labeled samples and many unlabeled samples; and (2) requires lower training time. Experimental results show our method achieves higher segmentation accuracy than FSL methods with fewer samples and reduced training effort. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
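A sketch of an AL query score in the spirit of the abstract: combine classification uncertainty with dissimilarity to already-labelled data, so the query prefers samples the model is unsure about and that are unlike anything it has seen. The 1-D "features", the toy logistic model and the weighting are all assumptions, not the paper's strategy:

```python
# Active-learning query scoring sketch: uncertainty + novelty.
def query_scores(unlabeled, labeled, prob, w_unc=0.7, w_div=0.3):
    scores = []
    for x in unlabeled:
        uncertainty = 1.0 - abs(prob(x) - 0.5) * 2   # 1 at p=0.5, 0 at p=0 or 1
        novelty = min(abs(x - l) for l in labeled)   # distance to labelled set
        scores.append(w_unc * uncertainty + w_div * novelty)
    return scores

prob = lambda x: 1 / (1 + 2.718281828 ** (-(x - 5)))  # toy logistic model
labeled = [0.0, 10.0]
unlabeled = [1.0, 5.0, 9.0]
scores = query_scores(unlabeled, labeled, prob)
best = unlabeled[scores.index(max(scores))]
print(best)  # 5.0: maximal uncertainty and far from labelled samples
```

Querying the top-scoring sample, labelling it, and retraining closes the AL loop; the paper adds context information from the segmentation task to this generic uncertainty/similarity mix.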

  15. [Studies on the brand traceability of milk powder based on NIR spectroscopy technology].

    PubMed

    Guan, Xiao; Gu, Fang-Qing; Liu, Jing; Yang, Yong-Jian

    2013-10-01

    Brand traceability of several different kinds of milk powder was studied by combining near infrared spectroscopy in diffuse reflectance mode with soft independent modeling of class analogy (SIMCA). Near infrared spectra of 138 samples, including 54 Guangming, 43 Netherlands, 33 Nestle and 8 Yili milk powder samples, were collected. After pretreatment of the full-spectrum data variables in the training set, principal component analysis was performed; the cumulative variance contribution rate of the first three principal components was about 99.07%. A SIMCA-based principal component model of the milk powders was established and used to classify the milk powder samples in the prediction sets. The results showed that the recognition rates for Guangming, Netherlands and Nestle milk powder were 78%, 75% and 100%, and the rejection rates were 100%, 87%, and 88%, respectively. Therefore, near infrared spectroscopy combined with a SIMCA model can classify milk powder with high accuracy, and is a promising method for identifying milk powder varieties.

  16. Estimation of the diagnostic threshold accounting for decision costs and sampling uncertainty.

    PubMed

    Skaltsa, Konstantina; Jover, Lluís; Carrasco, Josep Lluís

    2010-10-01

    Medical diagnostic tests are used to classify subjects as non-diseased or diseased. The classification rule usually consists of classifying subjects using the values of a continuous marker that is dichotomised by means of a threshold. Here, the optimum threshold estimate is found by minimising a cost function that accounts for both decision costs and sampling uncertainty. The cost function is optimised either analytically in a normal distribution setting or empirically in a free-distribution setting when the underlying probability distributions of diseased and non-diseased subjects are unknown. Inference of the threshold estimates is based on approximate analytically standard errors and bootstrap-based approaches. The performance of the proposed methodology is assessed by means of a simulation study, and the sample size required for a given confidence interval precision and sample size ratio is also calculated. Finally, a case example based on previously published data concerning the diagnosis of Alzheimer's patients is provided in order to illustrate the procedure.
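The free-distribution (empirical) variant described above can be sketched directly: evaluate a decision-cost function at candidate cut points taken from the observed marker values and keep the minimizer. The marker values and the false-positive/false-negative costs below are illustrative assumptions:

```python
# Empirical threshold search minimizing a decision-cost function.
def optimal_threshold(non_diseased, diseased, cost_fp=1.0, cost_fn=2.0):
    candidates = sorted(set(non_diseased) | set(diseased))
    def cost(t):
        fp = sum(x > t for x in non_diseased)   # healthy classified diseased
        fn = sum(x <= t for x in diseased)      # diseased classified healthy
        return cost_fp * fp + cost_fn * fn
    return min(candidates, key=cost)

healthy = [1.0, 1.5, 2.0, 2.5, 3.0]
sick = [2.8, 3.5, 4.0, 4.5, 5.0]
t = optimal_threshold(healthy, sick)
print(t)  # 2.5: one false positive, no false negatives
```

Bootstrapping this procedure (resampling both groups and recomputing `t`) yields the sampling-uncertainty estimates the abstract refers to.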

  17. In vivo classification of human skin burns using machine learning and quantitative features captured by optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Singla, Neeru; Srivastava, Vishal; Singh Mehta, Dalip

    2018-02-01

    We report the first fully automated detection of human skin burn injuries in vivo, with the goal of automatic surgical margin assessment based on optical coherence tomography (OCT) images. Our proposed automated procedure entails building a machine-learning-based classifier by extracting quantitative features from normal and burn tissue images recorded by OCT. In this study, 56 samples (28 normal, 28 burned) were imaged by OCT and eight features were extracted. A linear model classifier was trained using 34 samples and 22 samples were used to test the model. Sensitivity of 91.6% and specificity of 90% were obtained. Our results demonstrate the capability of a computer-aided technique for accurately and automatically identifying burn tissue resection margins during surgical treatment.

  18. Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.

    PubMed

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping

    2014-02-15

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.

  19. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging and powerful approach for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of disease. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g. microarray data) or require a normal distribution assumption; hence, they cannot be directly applied to RNA-Seq data, which violates both the data-structure and distributional assumptions. However, these algorithms can be applied to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and transformation method, on model performance. A comprehensive simulation study was conducted and the results were compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, classification of RNA-Seq data requires careful attention to data overdispersion. We conclude that, as a count-based classifier, power-transformed PLDA and, as microarray-based classifiers, vst- or rlog-transformed RF and SVM may be good choices for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.

  20. Please Don't Move-Evaluating Motion Artifact From Peripheral Quantitative Computed Tomography Scans Using Textural Features.

    PubMed

    Rantalainen, Timo; Chivers, Paola; Beck, Belinda R; Robertson, Sam; Hart, Nicolas H; Nimphius, Sophia; Weeks, Benjamin K; McIntyre, Fleur; Hands, Beth; Siafarikas, Aris

    Most imaging methods, including peripheral quantitative computed tomography (pQCT), are susceptible to motion artifacts particularly in fidgety pediatric populations. Methods currently used to address motion artifact include manual screening (visual inspection) and objective assessments of the scans. However, previously reported objective methods either cannot be applied on the reconstructed image or have not been tested for distal bone sites. Therefore, the purpose of the present study was to develop and validate motion artifact classifiers to quantify motion artifact in pQCT scans. Whether textural features could provide adequate motion artifact classification performance in 2 adolescent datasets with pQCT scans from tibial and radial diaphyses and epiphyses was tested. The first dataset was split into training (66% of sample) and validation (33% of sample) datasets. Visual classification was used as the ground truth. Moderate to substantial classification performance (J48 classifier, kappa coefficients from 0.57 to 0.80) was observed in the validation dataset with the novel texture-based classifier. In applying the same classifier to the second cross-sectional dataset, a slight-to-fair (κ = 0.01-0.39) classification performance was observed. Overall, this novel textural analysis-based classifier provided a moderate-to-substantial classification of motion artifact when the classifier was specifically trained for the measurement device and population. Classification based on textural features may be used to prescreen obviously acceptable and unacceptable scans, with a subsequent human-operated visual classification of any remaining scans. Copyright © 2017 The International Society for Clinical Densitometry. Published by Elsevier Inc. All rights reserved.
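The motion-artifact classifiers above are compared to visual inspection via kappa coefficients; a minimal Cohen's kappa implementation, applied to hypothetical rater data, makes the metric concrete:

```python
# Cohen's kappa: agreement between two raters corrected for chance.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()
    ) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical visual vs texture-based artifact ratings for six scans.
visual = ["ok", "ok", "bad", "ok", "bad", "ok"]
texture = ["ok", "ok", "bad", "bad", "bad", "ok"]
print(round(cohens_kappa(visual, texture), 3))  # 0.667: "substantial"
```

On this toy data the raters agree on 5 of 6 scans but chance agreement is 0.5, so kappa lands at 2/3, in the "substantial" band of the conventional interpretation scale used in the abstract.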

  1. Multicategory nets of single-layer perceptrons: complexity and sample-size issues.

    PubMed

    Raudys, Sarunas; Kybartas, Rimantas; Zavadskas, Edmundas Kazimieras

    2010-05-01

    The standard cost function of multicategory single-layer perceptrons (SLPs) does not minimize the classification error rate. In order to reduce classification error, it is necessary to: 1) abandon the traditional cost function, 2) obtain near-optimal pairwise linear classifiers by specially organized SLP training and optimal stopping, and 3) fuse their decisions properly. To obtain better classification in unbalanced training set situations, we introduce an unbalance-correcting term. It was found that fusion based on the Kullback-Leibler (K-L) distance and the Wu-Lin-Weng (WLW) method results in approximately the same performance in situations where sample sizes are relatively small. This observation is explained by the theoretically known fact that excessive minimization of inexact criteria can become harmful. Comprehensive comparative investigations of six real-world pattern recognition (PR) problems demonstrated that SLP-based pairwise classifiers are comparable to, and as often as not outperform, linear support vector (SV) classifiers in moderate-dimensional situations. The colored noise injection used to design pseudovalidation sets proves to be a powerful tool for mitigating finite-sample problems in moderate-dimensional PR tasks.

  2. Optimal number of features as a function of sample size for various classification rules.

    PubMed

    Hua, Jianping; Xiong, Zixiang; Lowey, James; Suh, Edward; Dougherty, Edward R

    2005-04-15

    Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification.
Companion website: http://public.tgen.org/tamu/ofs/. Contact: e-dougherty@ee.tamu.edu.
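The peaking behavior described above is easy to reproduce in miniature. The sketch below (an illustration under an assumed Gaussian model, not the authors' massively parallel study) trains a pooled-covariance linear discriminant on a fixed, small sample while padding a handful of informative features with pure noise features:

```python
import numpy as np

rng = np.random.default_rng(0)

def ldc_error(n_train, n_features, n_informative=5, n_test=2000, delta=0.6):
    """Estimate the test error of a pooled-covariance linear discriminant
    trained on n_train samples per class. Only the first n_informative
    features separate the classes; the rest are pure noise."""
    mu = np.zeros(n_features)
    mu[:n_informative] = delta

    def draw(n, sign):
        return rng.normal(sign * mu / 2, 1.0, size=(n, n_features))

    X0, X1 = draw(n_train, -1), draw(n_train, +1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled covariance with a small ridge so it stays invertible when
    # the number of features exceeds the number of samples.
    S = np.cov(np.vstack([X0 - m0, X1 - m1]).T) + 0.1 * np.eye(n_features)
    w = np.linalg.solve(S, m1 - m0)
    b = -w @ (m0 + m1) / 2
    T0, T1 = draw(n_test, -1), draw(n_test, +1)
    return (np.mean(T0 @ w + b > 0) + np.mean(T1 @ w + b <= 0)) / 2

# Fixed sample size (15 per class), growing feature count:
errors = {d: ldc_error(15, d) for d in (5, 10, 20, 40, 80)}
```

With such a small training set, the estimated error typically decreases and then rises again as noise features are added, which is the peaking phenomenon the abstract describes.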

  3. OpenCL based machine learning labeling of biomedical datasets

    NASA Astrophysics Data System (ADS)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or tag the different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to let a non-expert user easily analyze biomedical datasets. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, which allow parallel programming paradigms to be applied on conventional personal computers. The Adaboost classifier is one of the most widely applied methods for labeling in the machine learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied to the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL.
We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and labeling speeds.
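The testing stage described above is naturally data-parallel. The NumPy sketch below (with hypothetical stump parameters standing in for a trained model; the paper's OpenCL kernel would evaluate the same expression with one work-item per sample) applies every weak classifier to every sample at once and takes the weighted vote:

```python
import numpy as np

# Hypothetical trained-model arrays: each decision stump reads one feature,
# compares it against a threshold with a polarity, and casts a +/-1 vote
# weighted by alpha.
feature_idx = np.array([0, 2, 1, 0])           # feature each stump reads
threshold   = np.array([0.5, -1.0, 0.2, 2.0])  # stump thresholds
polarity    = np.array([+1, -1, +1, +1])       # inequality direction
alpha       = np.array([0.9, 0.7, 0.4, 0.2])   # stump weights

def adaboost_predict(X):
    """Apply every weak classifier to every sample at once, then take the
    sign of the alpha-weighted vote."""
    # h has shape (n_samples, n_stumps) with entries in {-1, +1}
    h = polarity * np.where(X[:, feature_idx] > threshold, 1, -1)
    return np.sign(h @ alpha)

X = np.array([[3.0, 1.0, 0.0],
              [-3.0, 0.0, -2.0]])
labels = adaboost_predict(X)   # one strong-classifier label per sample
```

The whole ensemble evaluation reduces to one gather, one comparison, and one matrix-vector product, which is why it maps so well onto a GPU.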

  4. Multiclass classification of microarray data samples with a reduced number of genes

    PubMed Central

    2011-01-01

    Background Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained. Results A novel bound on the maximum number of genes that can be handled by binary classifiers in binary mediated multiclass classification algorithms of microarray data samples is presented. The bound suggests that high-dimensional binary output domains might favor the existence of accurate and sparse binary mediated multiclass classifiers for microarray data samples. Conclusions A comprehensive experimental work shows that the bound is indeed useful to induce accurate and sparse multiclass classifiers for microarray data samples. PMID:21342522

  5. The evaluation of alternate methodologies for land cover classification in an urbanizing area

    NASA Technical Reports Server (NTRS)

    Smekofski, R. M.

    1981-01-01

The usefulness of LANDSAT in classifying land cover and in identifying and classifying land use change was investigated using an urbanizing area as the study area. The question of which technique is best for classification was the primary focus of the study. The many computer-assisted techniques available to analyze LANDSAT data were evaluated. Techniques of statistical training (polygons from CRT, unsupervised clustering, polygons from digitizer and binary masks) were tested with minimum distance to the mean, maximum likelihood and canonical analysis with minimum distance to the mean classifiers. The twelve output images were compared to photointerpreted samples, ground verified samples and a current land use data base. Results indicate that for a reconnaissance inventory, unsupervised training with the canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification are available, polygon-from-digitizer training with the canonical analysis-minimum distance classifier is more accurate.

  6. Active Self-Paced Learning for Cost-Effective and Progressive Face Identification.

    PubMed

    Lin, Liang; Wang, Keze; Meng, Deyu; Zuo, Wangmeng; Zhang, Lei

    2018-01-01

This paper aims to develop a novel cost-effective framework for face identification, which progressively maintains a batch of classifiers with the increasing face images of different individuals. By naturally combining two recently rising techniques, active learning (AL) and self-paced learning (SPL), our framework is capable of automatically annotating new instances and incorporating them into training under weak expert recertification. We first initialize the classifier using a few annotated samples for each individual, and extract image features using convolutional neural nets. Then, a number of candidates are selected from the unannotated samples for classifier updating, in which we apply the current classifiers to rank the samples by prediction confidence. In particular, our approach utilizes the high-confidence and low-confidence samples in the self-paced and the active user-query way, respectively. The neural nets are later fine-tuned based on the updated classifiers. This heuristic implementation is formulated as solving a concise active SPL optimization problem, which also advances the SPL development by supplementing a rational dynamic curriculum constraint. The new model finely accords with the "instructor-student-collaborative" learning mode in human education. The advantages of this proposed framework are twofold: i) the required number of annotated samples is significantly decreased while comparable performance is guaranteed, with a dramatic reduction of user effort over other state-of-the-art active learning techniques; ii) the mixture of SPL and AL effectively improves not only the classifier accuracy compared to existing AL/SPL methods but also the robustness against noisy data. We evaluate our framework on two challenging datasets, which include hundreds of persons under diverse conditions, and demonstrate very promising results. Please find the code of this project at: http://hcp.sysu.edu.cn/projects/aspl/.
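The sample-selection step can be sketched as follows (a minimal illustration assuming a single confidence score per unlabeled sample; the threshold and query budget are hypothetical): high-confidence samples are pseudo-annotated in the self-paced way, while the least confident ones are queried to the user.

```python
import numpy as np

def select_samples(confidences, pace_threshold=0.9, n_queries=2):
    """Split unlabeled samples into a self-paced set (high confidence,
    pseudo-annotated automatically) and an active-query set (lowest
    confidence, sent to the expert for annotation)."""
    confidences = np.asarray(confidences)
    self_paced = np.where(confidences >= pace_threshold)[0]
    uncertain = np.argsort(confidences)[:n_queries]
    return self_paced, uncertain

conf = [0.97, 0.55, 0.92, 0.30, 0.71]   # hypothetical classifier confidences
auto_idx, query_idx = select_samples(conf)
```

In the full framework, both sets feed back into classifier updating and network fine-tuning; only the query set costs expert effort.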

  7. Solubility classification of airborne uranium products collected at the perimeter of the Allied Chemical Plant, Metropolis, Illinois

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalkwarf, D.R.

    1980-05-01

Airborne uranium products were collected at the perimeter of the uranium-conversion plant operated by the Allied Chemical Corporation at Metropolis, Illinois, and the dissolution rates of these products were classified in terms of the ICRP Task Group Lung Model. Assignments were based on measurements of the dissolution half-times exhibited by uranium components of the dust samples as they dissolved in simulated lung fluid at 37°C. Based on three trials, the dissolution behavior of dust with aerodynamic equivalent diameter (AED) less than 5.5 μm and collected at the location nearest the closest residence to the plant was classified 0.40 D, 0.60 Y. Based on two trials, the dissolution behavior of dust with AED greater than 5.5 μm and collected at this location was classified 0.37 D, 0.63 Y. Based on one trial, the dissolution behavior of dust with AED less than 5.5 μm and collected at a location on the opposite side of the plant was classified 0.68 D, 0.32 Y. There was some evidence for adsorption of dissolved uranium onto other dust components during dissolution, and preliminary dissolution trials are recommended for future samples in order to optimize the fluid replacement schedule.

  8. Self-similarity Clustering Event Detection Based on Triggers Guidance

    NASA Astrophysics Data System (ADS)

    Zhang, Xianfei; Li, Bicheng; Tian, Yuxuan

Traditional methods of Event Detection and Characterization (EDC) regard the event detection task as a classification problem: words are used as samples to train a classifier, which leads to an imbalance between the classifier's positive and negative samples. This method also suffers from data sparseness when the corpus is small. This paper does not classify events using words as samples, but instead clusters events when judging event types. It uses self-similarity to converge on the value of K in the K-means algorithm under the guidance of event triggers, and optimizes the clustering algorithm. Then, combining named entities and their relative position information, the new method further determines the precise type of each event. The new method avoids the dependence on event templates found in traditional methods, and its event detection results can be readily used in automatic text summarization, text retrieval, and topic detection and tracking.
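The trigger-guided selection of K is described only loosely in the abstract, so the sketch below substitutes a crude scatter-based stopping criterion (an assumption, not the paper's self-similarity measure) on top of a deterministic K-means:

```python
import numpy as np

def init_centers(X, k):
    """Farthest-point initialization (deterministic)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=30):
    """Plain Lloyd iterations; returns labels and within-cluster scatter."""
    centers = init_centers(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, inertia

def choose_k(X, k_max=6, frac=0.05):
    """Pick the smallest K whose within-cluster scatter drops below a
    fraction of the total scatter -- a stand-in for a self-similarity
    convergence criterion."""
    total = kmeans(X, 1)[1]
    for k in range(2, k_max + 1):
        if kmeans(X, k)[1] < frac * total:
            return k
    return k_max

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(10.0, 0.1, (20, 2))])
k = choose_k(X)
```

In the paper, the analogous role is played by event triggers, which guide how tightly the clusters should converge before K is fixed.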

  9. A new algorithm for reducing the workload of experts in performing systematic reviews.

    PubMed

    Matwin, Stan; Kouznetsov, Alexandre; Inkpen, Diana; Frunza, Oana; O'Blenis, Peter

    2010-01-01

    To determine whether a factorized version of the complement naïve Bayes (FCNB) classifier can reduce the time spent by experts reviewing journal articles for inclusion in systematic reviews of drug class efficacy for disease treatment. The proposed classifier was evaluated on a test collection built from 15 systematic drug class reviews used in previous work. The FCNB classifier was constructed to classify each article as containing high-quality, drug class-specific evidence or not. Weight engineering (WE) techniques were added to reduce underestimation for Medical Subject Headings (MeSH)-based and Publication Type (PubType)-based features. Cross-validation experiments were performed to evaluate the classifier's parameters and performance. Work saved over sampling (WSS) at no less than a 95% recall was used as the main measure of performance. The minimum workload reduction for a systematic review for one topic, achieved with a FCNB/WE classifier, was 8.5%; the maximum was 62.2% and the average over the 15 topics was 33.5%. This is 15.0% higher than the average workload reduction obtained using a voting perceptron-based automated citation classification system. The FCNB/WE classifier is simple, easy to implement, and produces significantly better results in reducing the workload than previously achieved. The results support it being a useful algorithm for machine-learning-based automation of systematic reviews of drug class efficacy for disease treatment.
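The WSS measure used above has a simple closed form, WSS@R = (TN + FN)/N − (1 − R). A quick sketch with hypothetical counts:

```python
def wss_at_recall(tn, fn, n_total, recall=0.95):
    """Work saved over sampling at recall R:
        WSS@R = (TN + FN) / N - (1 - R)
    i.e. the fraction of articles the reviewer may skip, minus what
    random screening at the same recall would already save."""
    return (tn + fn) / n_total - (1 - recall)

# Hypothetical screening outcome for one review topic: 1000 retrieved
# articles, 800 correctly excluded by the classifier, 5 relevant missed.
saving = wss_at_recall(tn=800, fn=5, n_total=1000)
```

A WSS of 0 means the classifier saves no more work than random sampling at the same recall; the 33.5% average reported above means roughly a third of the screening workload is eliminated.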

  10. Label-free capture of breast cancer cells spiked in buffy coats using carbon nanotube antibody micro-arrays

    NASA Astrophysics Data System (ADS)

    Khosravi, Farhad; Trainor, Patrick; Rai, Shesh N.; Kloecker, Goetz; Wickstrom, Eric; Panchapakesan, Balaji

    2016-04-01

We demonstrate the rapid and label-free capture of breast cancer cells spiked in buffy coats using nanotube-antibody micro-arrays. Single wall carbon nanotube arrays were manufactured using photo-lithography, metal deposition, and etching techniques. Anti-epithelial cell adhesion molecule (EpCAM) antibodies were functionalized to the surface of the nanotube devices using the 1-pyrene-butanoic acid succinimidyl ester functionalization method. Following functionalization, plain buffy coats and MCF7 cell-spiked buffy coats were adsorbed onto the nanotube device and electrical signatures were recorded for differences in interaction between samples. A statistical classifier for the 'liquid biopsy' was developed to create a predictive model based on dynamic time warping to classify device electrical signals that corresponded to plain (control) or spiked buffy coats (case). In the training test, the device electrical signals originating from buffy versus spiked buffy samples were classified with ˜100% sensitivity, ˜91% specificity and ˜96% accuracy. In the blinded test, the signals were classified with ˜91% sensitivity, ˜82% specificity and ˜86% accuracy. A heatmap was generated to visually capture the relationship between electrical signatures and the sample condition. Confocal microscopic analysis of devices that were classified as spiked buffy coats based on their electrical signatures confirmed the presence of cancer cells, their attachment to the device and overexpression of EpCAM receptors. The cell numbers were counted to be ˜1-17 cells per 5 μl per device, suggesting single cell sensitivity in spiked buffy coats that is scalable to higher volumes using the micro-arrays.
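Dynamic time warping, the distance underlying the classifier above, can be sketched with the classic dynamic program (the warping constraints and decision rule used for the actual device signals are not specified in the abstract, so the 1-nearest-neighbor rule here is an assumption):

```python
import numpy as np

def dtw_distance(a, b):
    """Unconstrained dynamic-programming DTW between two 1-D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def classify(signal, spiked_refs, plain_refs):
    """Label a signal by its nearest reference pool under DTW."""
    d_case = min(dtw_distance(signal, r) for r in spiked_refs)
    d_ctrl = min(dtw_distance(signal, r) for r in plain_refs)
    return "spiked" if d_case < d_ctrl else "plain"
```

DTW is a natural choice here because the electrical signatures of cell-device interactions are similar in shape but misaligned in time across devices.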

  11. Classification of document page images based on visual similarity of layout structures

    NASA Astrophysics Data System (ADS)

    Shin, Christian K.; Doermann, David S.

    1999-12-01

Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify a document's type in the absence of domain-specific models. A document type or genre can be defined by the user based primarily on layout structure. Our classification approach is based on 'visual similarity' of the layout structure, building a supervised classifier given examples of the class. We use image features such as the percentages of text and non-text (graphics, image, table, and ruling) content regions, column structures, variations in the point size of fonts, the density of content area, and various statistics on features of connected components, which can be derived from class samples without class knowledge. In order to obtain class labels for training samples, we conducted a user relevance test where subjects ranked UW-I document images with respect to the 12 representative images. We implemented our classification scheme using OC1, a decision tree classifier, and report our findings.

  12. [Leptospirosis in animal reproduction: III. Role of the hardjo serovar in bovine leptospirosis in Rio de Janeiro, Brazil].

    PubMed

    Lilenbaum, W; Dos Santos, M R

    1995-01-01

Four hundred and five serum samples were drawn from cows with reproductive problems, not vaccinated against leptospirosis, from 21 dairy farms. Three distinct geographic regions were determined and the farms were also classified by production system, based on technological, zootechnical and sanitary resources. A total of 277 positive reactions were observed, corresponding to 68.39% of the samples. The predominant serovar was hardjo, reactive on 85 samples (20.98%), predominant on nine farms and observed on 17 farms (80.95%). The predominance of hardjo was observed in all studied regions and on properties classified as type "A" (22 samples) and type "B" (49 samples). The role of this serovar in bovine leptospirosis in Brazil compared with other countries is discussed.

  13. Evaluation of a segment-based LANDSAT full-frame approach to crop area estimation

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.

    1981-01-01

As the registration of LANDSAT full frames enters the realm of current technology, sampling methods that utilize data other than the segment data used for LACIE should be examined. The effect of separating the functions of sampling for training and sampling for area estimation was examined. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of the full frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.

  14. Textual and shape-based feature extraction and neuro-fuzzy classifier for nuclear track recognition

    NASA Astrophysics Data System (ADS)

    Khayat, Omid; Afarideh, Hossein

    2013-04-01

Track counting algorithms, as one of the fundamental tools of nuclear science, have received increasing emphasis in recent years. Accurate measurement of nuclear tracks on solid-state nuclear track detectors is the aim of track counting systems. Commonly, track counting systems comprise a hardware system for the task of imaging and software for analysing the track images. In this paper, a track recognition algorithm based on 12 defined textual and shape-based features and a neuro-fuzzy classifier is proposed. Features are defined so as to discern the tracks from the background and small objects. Then, according to the defined features, tracks are detected using a trained neuro-fuzzy system. Features and the classifier are finally validated on 100 alpha track images and 40 training samples. It is shown that the principal textual and shape-based features concomitantly yield a high rate of track detection compared with single-feature based methods.

  15. Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

    PubMed

    Tuo, Youlin; An, Ning; Zhang, Ming

    2018-03-01

    The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
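The measures reported for the classifier (accuracy, sensitivity, specificity, PPV, NPV) all derive from a confusion matrix. A quick sketch with hypothetical counts (the paper's actual confusion matrix is not given in the abstract):

```python
def classifier_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
    }

# Hypothetical counts for a 35-metastatic / 143-non-metastatic split:
m = classifier_metrics(tp=30, tn=138, fp=5, fn=5)
```

Reporting all five measures matters here because the classes are heavily imbalanced: accuracy alone would look good even for a classifier that rarely flags metastatic samples.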

  16. An Improvement To The k-Nearest Neighbor Classifier For ECG Database

    NASA Astrophysics Data System (ADS)

    Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

    2018-03-01

The k-nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN concerns the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of the sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distribution of training samples and its distance to the query point. Consequently, a fuzzy membership function is employed to assign the query point to the class label most frequently represented by the nearest centroid neighbors. Experiments on electrocardiogram (ECG) signals are conducted in this study. The classification performance is evaluated in two experimental steps, i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds that of kNN, kNCN and FkNN, with a best classification rate of 96.5%.
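The Mahalanobis-distance step can be sketched as a nearest-centroid rule under a pooled within-class covariance (a simplification: the centroid-neighbor surrounding rule and the fuzzy membership assignment of MFkNCN are omitted here):

```python
import numpy as np

def mahalanobis_nearest_centroid(X_train, y_train, x):
    """Assign x to the class whose centroid is nearest under the
    Mahalanobis distance with a pooled within-class covariance."""
    classes = np.unique(y_train)
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    # Pooled within-class covariance (small ridge keeps it invertible).
    Xc = np.vstack([X_train[y_train == c] - centroids[c] for c in classes])
    inv = np.linalg.inv(np.cov(Xc.T) + 1e-6 * np.eye(X_train.shape[1]))

    def d2(mu):
        diff = x - mu
        return float(diff @ inv @ diff)

    return min(classes, key=lambda c: d2(centroids[c]))

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
label = mahalanobis_nearest_centroid(X, y, np.array([0.1, 0.0]))
```

Unlike the Euclidean distance, the Mahalanobis distance rescales each direction by the sample scatter, which is how the method compensates for an imbalanced sample distribution.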

  17. Support vector machines-based fault diagnosis for turbo-pump rotor

    NASA Astrophysics Data System (ADS)

    Yuan, Sheng-Fa; Chu, Fu-Lei

    2006-05-01

Most artificial intelligence methods used in fault diagnosis are based on the empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on the structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since the basic SVM is originally designed for two-class classification, while most fault diagnosis problems are multi-class cases, a new multi-class SVM classification algorithm named 'one to others' is presented to solve multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority; it is simple, involves little repeated training, and speeds up both training and recognition. The effectiveness of the method is verified by its application to fault diagnosis for a turbo-pump rotor.
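The 'one to others' tree reduces to a prioritized cascade of binary decisions. A minimal sketch, with toy threshold functions standing in for trained two-class SVM decision functions (the fault classes and priorities are hypothetical):

```python
def one_to_others(x, deciders, labels):
    """Walk the priority-ordered chain of binary classifiers. labels has
    one more entry than deciders: the final class catches every sample the
    binary classifiers rejected."""
    for decide, label in zip(deciders, labels):
        if decide(x):
            return label
    return labels[-1]

# Toy stand-ins for trained two-class SVMs on a single scalar feature,
# ordered by fault priority:
deciders = [lambda x: x > 10,   # most severe fault checked first
            lambda x: x > 5]
labels = ["severe", "moderate", "normal"]
```

Because each node only separates one class from the rest, a sample rarely traverses the whole tree, which is why recognition is fast and retraining is localized.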

  18. Improving EEG-Based Motor Imagery Classification for Real-Time Applications Using the QSA Method.

    PubMed

    Batres-Mendoza, Patricia; Ibarra-Manzano, Mario A; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Montoro-Sanjose, Carlos R; Romero-Troncoso, Rene J; Rostro-Gonzalez, Horacio

    2017-01-01

    We present an improvement to the quaternion-based signal analysis (QSA) technique to extract electroencephalography (EEG) signal features with a view to developing real-time applications, particularly in motor imagery (IM) cognitive processes. The proposed methodology (iQSA, improved QSA) extracts features such as the average, variance, homogeneity, and contrast of EEG signals related to motor imagery in a more efficient manner (i.e., by reducing the number of samples needed to classify the signal and improving the classification percentage) compared to the original QSA technique. Specifically, we can sample the signal in variable time periods (from 0.5 s to 3 s, in half-a-second intervals) to determine the relationship between the number of samples and their effectiveness in classifying signals. In addition, to strengthen the classification process a number of boosting-technique-based decision trees were implemented. The results show an 82.30% accuracy rate for 0.5 s samples and 73.16% for 3 s samples. This is a significant improvement compared to the original QSA technique that offered results from 33.31% to 40.82% without sampling window and from 33.44% to 41.07% with sampling window, respectively. We can thus conclude that iQSA is better suited to develop real-time applications.
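The variable-length sampling described above can be sketched as plain non-overlapping windowing (the sampling rate below is an assumption for illustration; the abstract does not state one):

```python
import numpy as np

def window_signal(signal, fs, window_s):
    """Split a signal into non-overlapping windows of window_s seconds,
    dropping trailing samples that do not fill a full window."""
    n = int(fs * window_s)
    n_windows = len(signal) // n
    return np.asarray(signal)[:n_windows * n].reshape(n_windows, n)

fs = 250                        # assumed EEG sampling rate in Hz
signal = np.arange(fs * 10)     # 10 s of placeholder samples
windows = window_signal(signal, fs, 0.5)   # 0.5 s windows
```

Sweeping `window_s` from 0.5 to 3.0 in half-second steps reproduces the experimental grid the authors use to trade classification latency against accuracy.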

  19. Improving EEG-Based Motor Imagery Classification for Real-Time Applications Using the QSA Method

    PubMed Central

    Batres-Mendoza, Patricia; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Montoro-Sanjose, Carlos R.

    2017-01-01

    We present an improvement to the quaternion-based signal analysis (QSA) technique to extract electroencephalography (EEG) signal features with a view to developing real-time applications, particularly in motor imagery (IM) cognitive processes. The proposed methodology (iQSA, improved QSA) extracts features such as the average, variance, homogeneity, and contrast of EEG signals related to motor imagery in a more efficient manner (i.e., by reducing the number of samples needed to classify the signal and improving the classification percentage) compared to the original QSA technique. Specifically, we can sample the signal in variable time periods (from 0.5 s to 3 s, in half-a-second intervals) to determine the relationship between the number of samples and their effectiveness in classifying signals. In addition, to strengthen the classification process a number of boosting-technique-based decision trees were implemented. The results show an 82.30% accuracy rate for 0.5 s samples and 73.16% for 3 s samples. This is a significant improvement compared to the original QSA technique that offered results from 33.31% to 40.82% without sampling window and from 33.44% to 41.07% with sampling window, respectively. We can thus conclude that iQSA is better suited to develop real-time applications. PMID:29348744

  20. Prognostic significance of microsatellite instability‑associated pathways and genes in gastric cancer.

    PubMed

    Hang, Xiaosheng; Li, Dapeng; Wang, Jianping; Wang, Ge

    2018-07-01

    The aim of the present study was to reveal the potential molecular mechanisms of microsatellite instability (MSI) on the prognosis of gastric cancer (GC). The investigation was performed based on an RNAseq expression profiling dataset downloaded from The Cancer Genome Atlas, including 64 high‑level MSI (MSI‑H) GC samples, 44 low‑level MSI (MSI‑L) GC samples and 187 stable microsatellite (MSI‑S) GC samples. Differentially expressed genes (DEGs) were identified between the MSI‑H, MSI‑L and MSI‑S samples. Pathway enrichment analysis was performed for the identified DEGs and the pathway deviation scores of the significant enrichment pathways were calculated. A Multi‑Layer Perceptron (MLP) classifier, based on the different pathways associated with the MSI statuses was constructed for predicting the outcome of patients with GC, which was validated in another independent dataset. A total of 190 DEGs were selected between the MSI‑H, MSI‑L and MSI‑S samples. The MLP classifier was established based on the deviation scores of 10 significant pathways, among which antigen processing and presentation, and inflammatory bowel disease pathways were significantly enriched with HLA‑DRB5, HLA‑DMA, HLA‑DQA1 and HLA‑DRA; the measles, toxoplasmosis and herpes simplex infection pathways were significantly enriched with Janus kinase 2 (JAK2), caspase‑8 (CASP8) and Fas. The classifier performed well on an independent validation set with 100 GC samples. Taken together, the results indicated that MSI status may affect GC prognosis, partly through the antigen processing and presentation, inflammatory bowel disease, measles, toxoplasmosis and herpes simplex infection pathways. HLA‑DRB5, HLA‑DMA, HLA‑DQA1, HLA‑DRA, JAK2, CASP8 and Fas may be predictive factors for prognosis in GC.

  1. Rock images classification by using deep convolution neural network

    NASA Astrophysics Data System (ADS)

    Cheng, Guojian; Guo, Wenhui

    2017-08-01

Granularity analysis is one of the most essential issues in authentication under the microscope. To improve the efficiency and accuracy of traditional manual work, a convolutional neural network based method is proposed for granularity analysis of thin-section images, which selects and extracts features from image samples while building a classifier to recognize the granularity of input image samples. 4800 samples from the Ordos basin are used for experiments in the HSV, YCbCr and RGB colour spaces, respectively. On the test dataset, the correct rate in the RGB colour space is 98.5%, and it is similarly reliable in the HSV and YCbCr colour spaces. The results show that the convolutional neural network can classify the rock images with high reliability.

  2. Bayes estimation on parameters of the single-class classifier. [for remotely sensed crop data

    NASA Technical Reports Server (NTRS)

    Lin, G. C.; Minter, T. C.

    1976-01-01

    Normal procedures used for designing a Bayes classifier to classify wheat as the major crop of interest require not only training samples of wheat but also those of nonwheat. Therefore, ground truth must be available for the class of interest plus all confusion classes. The single-class Bayes classifier classifies data into the class of interest or the class 'other' but requires training samples only from the class of interest. This paper will present a procedure for Bayes estimation on the mean vector, covariance matrix, and a priori probability of the single-class classifier using labeled samples from the class of interest and unlabeled samples drawn from the mixture density function.

  3. Development and Validation of a Semiquantitative, Multitarget PCR Assay for Diagnosis of Bacterial Vaginosis

    PubMed Central

    Lembke, Bryndon D.; Ramachandran, Kalpana; Body, Barbara A.; Nye, Melinda B.; Rivers, Charles A.; Schwebke, Jane R.

    2012-01-01

    Quantitative PCR assays were developed for 4 organisms reported previously to be useful positive indicators for the diagnosis of bacterial vaginosis (BV)—Atopobium vaginae, Bacterial Vaginosis-Associated Bacterium 2 (BVAB-2), Gardnerella vaginalis, and Megasphaera-1—and a single organism (Lactobacillus crispatus) that has been implicated as a negative indicator for BV. Vaginal samples (n = 169), classified as positive (n = 108) or negative (n = 61) for BV based on a combination of the Nugent Gram stain score and Amsel clinical criteria, were analyzed for the presence and quantity of each of the marker organisms, and the results were used to construct a semiquantitative, multiplex PCR assay for BV based on detection of 3 positive indicator organisms (A. vaginae, BVAB-2, and Megasphaera-1) and classification of samples using a combinatorial scoring system. The prototype BV PCR assay was then used to analyze the 169-member developmental sample set and, in a prospective, blinded manner, an additional 227 BV-classified vaginal samples (110 BV-positive samples and 117 BV-negative samples). The BV PCR assay demonstrated a sensitivity of 96.7% (202/209), a specificity of 92.2% (153/166), a positive predictive value of 94.0%, and a negative predictive value of 95.6%, with 21 samples (5.3%) classified as indeterminate for BV. This assay provides a reproducible and objective means of evaluating critical components of the vaginal microflora in women with signs and symptoms of vaginitis and is comparable in diagnostic accuracy to the conventional gold standard for diagnosis of BV. PMID:22535982

  4. Sleep spindle detection using deep learning: A validation study based on crowdsourcing.

    PubMed

    Dakun Tan; Rui Zhao; Jinbo Sun; Wei Qin

    2015-08-01

    Sleep spindles are significant transient oscillations observed on the electroencephalogram (EEG) in stage 2 of non-rapid eye movement sleep. The deep belief network (DBN), which has achieved great success on images and speech, is still a novel method for building sleep spindle detection systems. In this paper, crowdsourcing was applied in place of a gold standard to generate three differently labeled sample sets, and three classes of datasets were constructed from combinations of these samples. The F1-score measure was used to compare the performance of the DBN against three other classifiers on these samples, with the DBN obtaining a result of 92.78%. A comparison of two feature extraction methods based on power spectral density was then made on the same dataset using the DBN. In addition, the DBN trained on the dataset was applied to detect sleep spindles from raw EEG recordings and showed a capability comparable to expert group consensus.

  5. Determination of the Characteristics and Classification of Near-Infrared Spectra of Patchouli Oil (Pogostemon Cablin Benth.) from Different Origin

    NASA Astrophysics Data System (ADS)

    Diego, M. C. R.; Purwanto, Y. A.; Sutrisno; Budiastra, I. W.

    2018-05-01

    Research related to the non-destructive method of near-infrared (NIR) spectroscopy for aromatic oils is still in development in Indonesia. The objectives of the study were to determine the characteristics of the near-infrared spectra of patchouli oil and to classify it based on its origin. The samples were selected from seven different places in Indonesia (Bogor and Garut from West Java; Aceh and Jambi from Sumatra; and Konawe, Masamba and Kolaka from Sulawesi Island). The spectral data of patchouli oil were obtained by an FT-NIR spectrometer at wavelengths of 1000-2500 nm, after which the samples were subjected to composition analysis using Gas Chromatography-Mass Spectrometry. The transmittance and absorbance spectra were analyzed and then principal component analysis (PCA) was carried out. Discriminant analysis (DA) of the principal components was developed to classify patchouli oil based on its origin. The results show that both spectra (transmittance and absorbance), analyzed by PCA, give similar results for discriminating the seven types of patchouli oil, based on their distribution and behavior. DA on the three principal components of both processed spectra could classify patchouli oil accurately. These results indicate that NIR spectroscopy can be used successfully as an accurate method to classify patchouli oil based on its origin.

  6. Probabilistic classifiers with high-dimensional data

    PubMed Central

    Kim, Kyung In; Simon, Richard

    2011-01-01

    For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n, large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated, or at least not “anticonservative”, using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set. PMID:21087946
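The well-calibratedness criterion described above can be checked empirically by binning predicted probabilities and comparing each bin's mean prediction with the observed event rate (the data behind a reliability diagram). A minimal sketch under that interpretation, not the authors' implementation:

```python
def calibration_bins(probs, labels, n_bins=5):
    """Reliability-diagram data: for each probability bin, return the mean
    predicted probability and the observed fraction of positive labels.
    A well-calibrated classifier yields pairs lying near the diagonal."""
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, t))
    return [(sum(p for p, _ in b) / len(b),   # mean predicted probability
             sum(t for _, t in b) / len(b))   # observed positive rate
            for b in bins if b]
```

For example, a classifier that predicts 0.2 for a group of cases in which 20% turn out positive is calibrated in that bin; an "anticonservative" classifier would show observed rates pulled toward 0.5 relative to its predictions.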

  7. Classification of epileptic EEG signals based on simple random sampling and sequential feature selection.

    PubMed

    Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui

    2016-06-01

    Electroencephalogram (EEG) signals are used broadly in the medical field. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer's disease, and sleep disorders. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, the simple random sampling (SRS) technique is used to extract features from the time domain of the EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least squares support vector machine (LS_SVM) classifier, which classifies the EEG signals from the features extracted and selected by SRS and SFS. The experimental results show that the method achieves 99.90%, 99.80% and 100% for classification accuracy, sensitivity and specificity, respectively.
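The SRS-then-SFS pipeline above can be sketched as follows. To keep the sketch self-contained, a nearest-centroid classifier stands in for the paper's LS_SVM, and all names and data are illustrative, not the authors' code:

```python
import random

def srs_features(signal, k, rng):
    """Simple random sampling (SRS): draw k time-domain samples as features."""
    idx = sorted(rng.sample(range(len(signal)), k))
    return [signal[i] for i in idx]

def centroid_accuracy(X, y, feats):
    """Resubstitution accuracy of a nearest-centroid stand-in classifier
    restricted to the feature subset `feats`."""
    classes = sorted(set(y))
    cent = {c: [sum(x[f] for x, t in zip(X, y) if t == c) /
                sum(1 for t in y if t == c) for f in feats]
            for c in classes}
    correct = 0
    for x, t in zip(X, y):
        pred = min(classes, key=lambda c: sum((x[f] - m) ** 2
                                              for f, m in zip(feats, cent[c])))
        correct += pred == t
    return correct / len(y)

def sfs(X, y, n_select):
    """Sequential feature selection: greedily add the feature that most
    improves the stand-in classifier's accuracy."""
    chosen, remaining = [], list(range(len(X[0])))
    while len(chosen) < n_select:
        best = max(remaining, key=lambda f: centroid_accuracy(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

In the paper's setting the subset returned by `sfs` would be passed to the LS_SVM for final classification; here the stand-in classifier doubles as the selection criterion.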

  8. A Predictive Model for Toxicity Effects Assessment of Biotransformed Hepatic Drugs Using Iterative Sampling Method.

    PubMed

    Tharwat, Alaa; Moemen, Yasmine S; Hassanien, Aboul Ella

    2016-12-09

    Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models to predict the toxicity effects of potential drugs. In this study, we used a dataset which covers four toxicity effects: mutagenic, tumorigenic, irritant and reproductive effects. The proposed model consists of three phases. In the first phase, rough set-based methods are used to select the most discriminative features, reducing the classification time and improving the classification performance. Because of the imbalanced class distribution, in the second phase, different sampling methods such as Random Under-Sampling, Random Over-Sampling and the Synthetic Minority Oversampling Technique are used to address the problem of imbalanced datasets. The ITerative Sampling (ITS) method is proposed to avoid the limitations of those methods. The ITS method has two steps. The first step (the sampling step) iteratively modifies the prior distribution of the minority and majority classes. In the second step, a data cleaning method is used to remove the overlap produced by the first step. In the third phase, a Bagging classifier is used to classify an unknown drug as toxic or non-toxic. The experimental results showed that the proposed model performed well in classifying unknown samples according to all toxic effects in the imbalanced datasets.
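Of the baseline remedies the paper compares against ITS, the simplest, Random Over-Sampling, can be sketched in a few lines (this is the baseline, not the authors' ITS method; names are illustrative):

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Random Over-Sampling: replicate randomly chosen minority-class samples
    until every class reaches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for c, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == c]
        for _ in range(target - n):       # duplicates drawn with replacement
            Xb.append(rng.choice(pool))
            yb.append(c)
    return Xb, yb
```

Random Under-Sampling is the mirror image (discard majority samples), and SMOTE interpolates new minority points instead of duplicating them; ITS, as described above, instead modifies the class priors iteratively and then cleans the resulting overlap.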

  9. Combined data mining/NIR spectroscopy for purity assessment of lime juice

    NASA Astrophysics Data System (ADS)

    Shafiee, Sahameh; Minaei, Saeid

    2018-06-01

    This paper reports a data mining study on the NIR spectra of lime juice samples to determine their purity (natural or synthetic). NIR spectra of 72 pure and synthetic lime juice samples were recorded in reflectance mode. Sample outliers were removed using PCA. Different data mining techniques were employed for feature selection (a Genetic Algorithm (GA)) and classification (including the radial basis function (RBF) network, Support Vector Machine (SVM), and Random Forest (RF) tree). Based on the results, SVM proved to be the most accurate classifier, achieving the highest accuracy (97%) on the raw spectrum information. The classifier accuracy dropped to 93% when the feature vector selected by the GA search method was used as classifier input. It can be concluded that some relevant features which produce good performance with the SVM classifier are removed by feature selection. Reduced spectra using PCA also did not show acceptable performance (total accuracy of 66% with the RBF network), which indicates that dimensionality reduction methods such as PCA do not always lead to more accurate results. These findings demonstrate the potential of combining data mining with near-infrared spectroscopy for monitoring lime juice quality in terms of its natural or synthetic nature.

  10. Vehicle classification in WAMI imagery using deep network

    NASA Astrophysics Data System (ADS)

    Yi, Meng; Yang, Fan; Blasch, Erik; Sheaff, Carolyn; Liu, Kui; Chen, Genshe; Ling, Haibin

    2016-05-01

    Humans have always had a keen interest in understanding activities and the surrounding environment for mobility, communication, and survival. Thanks to recent progress in photography and breakthroughs in aviation, we are now able to capture tens of megapixels of ground imagery, namely Wide Area Motion Imagery (WAMI), at multiple frames per second from unmanned aerial vehicles (UAVs). WAMI serves as a great source for many applications, including security, urban planning and route planning. These applications require fast and accurate image understanding, which is time consuming for humans due to the large data volume and city-scale area coverage. Therefore, automatic processing and understanding of WAMI imagery has been gaining attention in both industry and the research community. This paper focuses on an essential step in WAMI imagery analysis, namely vehicle classification. That is, deciding whether a certain image patch contains a vehicle or not. We collect a set of positive and negative sample image patches for training and testing the detector. Positive samples are 64 × 64 image patches centered on annotated vehicles. We generate two sets of negative images. The first set is generated from positive images with some location shift. The second set of negative patches is generated from randomly sampled patches. We also discard negative patches in which a vehicle happens to lie at the center. Both positive and negative samples are randomly divided into 9000 training images and 3000 testing images. We propose to train a deep convolution network for classifying these patches. The classifier is based on a pre-trained AlexNet Model in the Caffe library, with an adapted loss function for vehicle classification. The performance of our classifier is compared to several traditional image classifier methods using Support Vector Machine (SVM) and Histogram of Oriented Gradient (HOG) features.
While the SVM+HOG method achieves an accuracy of 91.2%, the accuracy of our deep network-based classifier reaches 97.9%.

  11. Classification of Chinese herbs based on the cluster analysis of delayed luminescence.

    PubMed

    Pang, Jingxiang; Yang, Meina; Fu, Jialei; Zhao, Xiaolei; van Wijk, Eduard; Wang, Mei; Liu, Yanli; Zhou, Xiaoyan; Fan, Hua; Han, Jinxiang

    2016-03-01

    Traditional Chinese materia medica is an important component of the Chinese pharmacopeia. According to the traditional Chinese medicinal concept, Chinese herbal medicines are classified into different categories based on their therapeutic effects; however, their bioactive principles cannot be explained by chemical analysis alone. The aim of this study is to classify different Chinese herbs based on their therapeutic effects by using delayed luminescence (DL). The DL of 56 Chinese herbs was measured using an ultra-sensitive luminescence detection system. The different DL parameters were used to classify the Chinese herbs by hierarchical cluster analysis. The samples were divided into two groups based on their DL kinetic parameters. Interestingly, the DL classification results were quite consistent with classification according to the Chinese medicinal concepts of 'cold' and 'heat' properties. In this paper, we show for the first time that by using DL technology, it is possible to classify Chinese herbs according to the Chinese medicinal concept, and it may even be possible to predict their therapeutic properties. Copyright © 2015 John Wiley & Sons, Ltd.

  12. Classifier performance prediction for computer-aided diagnosis using a limited dataset.

    PubMed

    Sahiner, Berkman; Chan, Heang-Ping; Hadjiiski, Lubomir

    2008-04-01

    In a practical classifier design problem, the true population is generally unknown and the available sample is finite-sized. A common approach is to use a resampling technique to estimate the performance of the classifier that will be trained with the available sample. We conducted a Monte Carlo simulation study to compare the ability of the different resampling techniques in training the classifier and predicting its performance under the constraint of a finite-sized sample. The true population for the two classes was assumed to be multivariate normal distributions with known covariance matrices. Finite sets of sample vectors were drawn from the population. The true performance of the classifier is defined as the area under the receiver operating characteristic curve (AUC) when the classifier designed with the specific sample is applied to the true population. We investigated methods based on the Fukunaga-Hayes and the leave-one-out techniques, as well as three different types of bootstrap methods, namely, the ordinary, 0.632, and 0.632+ bootstrap. The Fisher's linear discriminant analysis was used as the classifier. The dimensionality of the feature space was varied from 3 to 15. The sample size n2 from the positive class was varied between 25 and 60, while the number of cases from the negative class was either equal to n2 or 3n2. Each experiment was performed with an independent dataset randomly drawn from the true population. Using a total of 1000 experiments for each simulation condition, we compared the bias, the variance, and the root-mean-squared error (RMSE) of the AUC estimated using the different resampling techniques relative to the true AUC (obtained from training on a finite dataset and testing on the population). 
Our results indicated that, under the study conditions, there can be a large difference in the RMSE obtained using different resampling methods, especially when the feature space dimensionality is relatively large and the sample size is small. Under such conditions, the 0.632 and 0.632+ bootstrap methods have the lowest RMSE, indicating that the difference between the estimated and the true performances obtained using the 0.632 and 0.632+ bootstrap will be statistically smaller than those obtained using the other three resampling methods. Of the three bootstrap methods, the 0.632+ bootstrap provides the lowest bias. Although this investigation was performed under some specific conditions, it reveals important trends for the problem of classifier performance prediction under the constraint of a limited dataset.
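The 0.632 bootstrap compared in this study can be illustrated with a small sketch: it blends the optimistic apparent (resubstitution) performance with the pessimistic out-of-bag performance. To keep the example short, a one-dimensional nearest-mean classifier stands in for Fisher's linear discriminant and accuracy stands in for AUC; all names are illustrative:

```python
import random

def train_centroid(X, y):
    """1-D nearest-mean classifier: store the per-class means."""
    c0 = [x for x, t in zip(X, y) if t == 0]
    c1 = [x for x, t in zip(X, y) if t == 1]
    return (sum(c0) / max(1, len(c0)), sum(c1) / max(1, len(c1)))

def accuracy(model, X, y):
    m0, m1 = model
    preds = [1 if abs(x - m1) < abs(x - m0) else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def bootstrap_632(X, y, train_fn, acc_fn, n_boot=50, seed=0):
    """The 0.632 bootstrap: 0.368 * apparent accuracy
    + 0.632 * mean out-of-bag accuracy over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(X)
    apparent = acc_fn(train_fn(X, y), X, y)
    oob_accs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        oob = [i for i in range(n) if i not in set(idx)]  # left-out cases
        if not oob:
            continue
        model = train_fn([X[i] for i in idx], [y[i] for i in idx])
        oob_accs.append(acc_fn(model, [X[i] for i in oob],
                               [y[i] for i in oob]))
    return 0.368 * apparent + 0.632 * (sum(oob_accs) / len(oob_accs))
```

The 0.632+ variant studied in the paper additionally adjusts the 0.368/0.632 weights using a no-information error rate, which reduces bias when the classifier badly overfits.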

  13. Adaptive classifier for steel strip surface defects

    NASA Astrophysics Data System (ADS)

    Jiang, Mingming; Li, Guangyao; Xie, Li; Xiao, Mang; Yi, Li

    2017-01-01

    Surface defect detection systems have received increased attention for their precision, speed, and low cost. One of the greatest challenges is reacting to accuracy deterioration over time caused by aging equipment and changing processes. These variables make only a tiny change to the real-world model but have a big impact on the classification result. In this paper, we propose a new adaptive classifier with a Bayes kernel (BYEC) which updates the model with small samples, making it adaptive to accuracy deterioration. Firstly, abundant features were introduced to cover as much information about the defects as possible. Secondly, we constructed a series of SVMs on random subspaces of the features. Then, a Bayes classifier was trained as an evolutionary kernel to fuse the results from the base SVMs. Finally, we proposed a method to update the Bayes evolutionary kernel. The proposed algorithm is experimentally compared with different algorithms; the results demonstrate that the proposed method can be updated with small samples and fits the changed model well. Robustness, a low requirement for samples, and adaptivity are demonstrated in the experiments.
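The random-subspace-plus-Bayes-fusion architecture can be sketched as below. Nearest-centroid base learners stand in for the paper's SVMs, and the fusion step is a naive-Bayes model over the base classifiers' votes; all names and data are illustrative assumptions, not the BYEC implementation:

```python
import math
import random

def train_bases(X, y, n_base=5, k=2, seed=0):
    """Random subspaces: each base learner is a nearest-centroid classifier
    (stand-in for the paper's SVMs) on k randomly chosen features."""
    rng = random.Random(seed)
    d, classes = len(X[0]), sorted(set(y))
    bases = []
    for _ in range(n_base):
        feats = rng.sample(range(d), k)
        cents = {}
        for c in classes:
            pts = [x for x, t in zip(X, y) if t == c]
            cents[c] = [sum(p[f] for p in pts) / len(pts) for f in feats]
        bases.append((feats, cents))
    return bases

def base_predict(base, x):
    feats, cents = base
    return min(cents, key=lambda c: sum((x[f] - m) ** 2
                                        for f, m in zip(feats, cents[c])))

def train_bayes_kernel(bases, X, y):
    """Bayes fusion kernel: estimate P(class) and, with Laplace smoothing,
    P(base output | class) from the base classifiers' votes."""
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    cond = []
    for b in bases:
        tab = {c: {o: 1.0 for o in classes} for c in classes}  # Laplace counts
        for x, t in zip(X, y):
            tab[t][base_predict(b, x)] += 1.0
        for c in classes:
            total = sum(tab[c].values())
            for o in classes:
                tab[c][o] /= total
        cond.append(tab)
    return prior, cond

def fused_predict(bases, prior, cond, x):
    """Pick the class maximizing P(class) * prod_b P(vote_b | class)."""
    outs = [base_predict(b, x) for b in bases]
    def score(c):
        return prior[c] * math.prod(tab[c][o] for tab, o in zip(cond, outs))
    return max(prior, key=score)
```

Because the fused decision depends only on the conditional tables, re-estimating those tables from a handful of fresh labeled samples updates the ensemble without retraining the base learners, which is the adaptivity the paper targets.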

  14. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia.

    PubMed

    Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J

    2011-06-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = 0.83, specificity = 0.79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk.

  15. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia

    PubMed Central

    Rodriguez-Diaz, Eladio; Castanon, David A.; Singh, Satish K.; Bigio, Irving J.

    2011-01-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These “rejected” samples can be reexamined by simply repositioning the probe to obtain additional optical readings or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = 0.83, specificity = 0.79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20–33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk. PMID:21721830

  16. Selection-Fusion Approach for Classification of Datasets with Missing Values

    PubMed Central

    Ghannad-Rezaie, Mostafa; Soltanian-Zadeh, Hamid; Ying, Hao; Dong, Ming

    2010-01-01

    This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values, where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade-off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that classification accuracy of the proposed method is superior to those of the widely used multiple imputations method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values. PMID:20212921
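The first step of the approach, finding subsets of samples that share the same availability pattern, can be sketched as follows (here `None` marks a missing value; names are illustrative, not the authors' code):

```python
from collections import defaultdict

def group_by_missing_pattern(X):
    """Group sample indices by their pattern of available features.
    Each subset can then train its own classifier on its complete columns,
    and the per-subset outputs are combined afterwards."""
    groups = defaultdict(list)
    for i, x in enumerate(X):
        pattern = tuple(v is not None for v in x)   # True = feature available
        groups[pattern].append(i)
    return dict(groups)
```

The paper goes further by clustering similar patterns together so the number of subsets (and hence classifiers) stays manageable; this sketch shows only the exact-pattern grouping that clustering relaxes.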

  17. Applying under-sampling techniques and cost-sensitive learning methods on risk assessment of breast cancer.

    PubMed

    Hsu, Jia-Lien; Hung, Ping-Cheng; Lin, Hung-Yen; Hsieh, Chung-Ho

    2015-04-01

    Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening could significantly reduce mortality from breast cancer. However, most screening methods consume large amounts of resources. We propose a computational model, based solely on personal health information, for breast cancer risk assessment. Our model can serve as a pre-screening program in a low-cost setting. In our study, the dataset, consisting of 3976 records, was collected from Taipei City Hospital from 2008/1/1 to 2008/12/31. Based on the dataset, we first apply sampling techniques and a dimension reduction method to preprocess the data. Then, we construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier is able to achieve a recall (or sensitivity) of 100%. At a recall of 100%, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with the random forest classifier were 2.9% and 14.87%, respectively. In our study, we build a breast cancer risk assessment model using data mining techniques. Our model has the potential to serve as an assisting tool in breast cancer screening.
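The cost-sensitive behavior described, trading precision for perfect recall, can be illustrated by the decision rule alone: flag a case as positive whenever the expected cost of a miss exceeds the expected cost of a false alarm. A minimal sketch, where the probability scores and cost values are invented for illustration (in the paper's setting a random forest would supply the probabilities):

```python
def cost_sensitive_flag(p_pos, cost_fn, cost_fp):
    """Flag positive iff expected miss cost >= expected false-alarm cost:
    p_pos * cost_fn >= (1 - p_pos) * cost_fp."""
    return 1 if p_pos * cost_fn >= (1.0 - p_pos) * cost_fp else 0

def recall_precision(scores_pos, scores_neg, cost_fn, cost_fp):
    """Recall and precision of the cost-sensitive rule on scored cases."""
    tp = sum(cost_sensitive_flag(p, cost_fn, cost_fp) for p in scores_pos)
    fp = sum(cost_sensitive_flag(p, cost_fn, cost_fp) for p in scores_neg)
    recall = tp / len(scores_pos)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

Making a miss, say, 20 times as costly as a false alarm drops the decision threshold far below 0.5, so even low-probability cases are flagged; recall climbs to 100% while precision collapses, mirroring the 100% recall at 2.9% precision reported above.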

  18. Design of an audio advertisement dataset

    NASA Astrophysics Data System (ADS)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements crowd into radio broadcasts, it is necessary to establish an audio advertising dataset that can be used to analyze and classify the advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcast time and class. The soundness of the advertisement classification in this dataset is verified by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.

  19. Genetic variation, and biological activity of nucleopolyhedrovirus samples from larvae Heliothis virescens, Helicoverpa zea, and Helicoverpa armigera

    USDA-ARS?s Scientific Manuscript database

    To assess the diversity and relationships of baculoviruses found in insects of the heliothine pest complex, a PCR-based method was used to classify 90 samples of nucleopolyhedrovirus (NPV; Baculoviridae: Alphabaculovirus) obtained worldwide from larvae of Heliothis virescens (Fabricius), Helicoverpa...

  20. Triacylglycerol stereospecific analysis and linear discriminant analysis for milk speciation.

    PubMed

    Blasi, Francesca; Lombardi, Germana; Damiani, Pietro; Simonetti, Maria Stella; Giua, Laura; Cossignani, Lina

    2013-05-01

    Product authenticity is an important topic in the dairy sector. Dairy products sold for public consumption must be accurately labelled in accordance with the milk species they contain. Linear discriminant analysis (LDA), a common chemometric procedure, has been applied to fatty acid percentage composition to classify pure milk samples (cow, ewe, buffalo, donkey, goat). All original grouped cases were correctly classified, while 90% of cross-validated grouped cases were correctly classified. Another objective of this research was the characterisation of cow-ewe milk mixtures in order to reveal a common fraud in the dairy field, that is, the addition of cow milk to ewe milk. Stereospecific analysis of triacylglycerols (TAG), a method based on chemical-enzymatic procedures coupled with chromatographic techniques, was carried out to detect fraudulent milk additions, in particular 1%, 3% and 5% cow milk added to ewe milk. When only TAG composition data were used for the elaboration, 75% of original grouped cases were correctly classified, while all samples were correctly classified when both total and intrapositional TAG data were used. The cross-validation results were also better when the TAG stereospecific analysis data were used as LDA variables; in particular, 100% of cross-validated grouped cases were correctly classified when the 5% cow milk mixtures were considered.
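LDA as used here can be sketched for the two-class case (e.g., cow vs. ewe milk) with two compositional features: project onto w = Sw⁻¹(m₁ − m₀) and threshold at the midpoint of the projected class means. This is a from-scratch Fisher discriminant, not the authors' chemometrics software, and the compositional values in the test are invented for illustration:

```python
def fisher_lda(X0, X1):
    """Two-class Fisher LDA in 2-D: w = Sw^-1 (m1 - m0), with the decision
    threshold at the midpoint of the projected class means."""
    def mean(Xs):
        return [sum(col) / len(Xs) for col in zip(*Xs)]
    m0, m1 = mean(X0), mean(X1)
    S = [[0.0, 0.0], [0.0, 0.0]]            # pooled within-class scatter
    for Xs, m in ((X0, m0), (X1, m1)):
        for x in Xs:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    dm = (m1[0] - m0[0], m1[1] - m0[1])
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    thr = sum(wi * (a + b) / 2.0 for wi, a, b in zip(w, m0, m1))
    return w, thr

def lda_predict(w, thr, x):
    """Class 1 if the projection falls on the class-1 side of the midpoint."""
    return 1 if w[0] * x[0] + w[1] * x[1] > thr else 0
```

The cross-validated figures quoted in the abstract come from refitting this kind of discriminant with each sample held out in turn and classifying the held-out sample.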

  1. Transfer Learning for Class Imbalance Problems with Inadequate Data.

    PubMed

    Al-Stouhi, Samir; Reddy, Chandan K

    2016-07-01

    A fundamental problem in data mining is to effectively build robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for skewed distribution datasets. Existing methods assume an ample supply of training examples as a fundamental prerequisite for constructing an effective classifier. However, when sufficient data is not readily available, the development of a representative classification algorithm becomes even more difficult due to the unequal distribution between classes. We provide a unified framework that will potentially take advantage of auxiliary data using a transfer learning mechanism and simultaneously build a robust classifier to tackle this imbalance issue in the presence of few training samples in a particular target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are not sufficient, and in this paper we develop a method that is optimized to simultaneously augment the training data and induce balance into skewed datasets. We propose a novel boosting-based instance-transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.

  2. A novel approach for small sample size family-based association studies: sequential tests.

    PubMed

    Ilk, Ozlem; Rajabli, Farid; Dungul, Dilay Ciglidag; Ozdag, Hilal; Ilk, Hakki Gokhan

    2011-08-01

    In this paper, we propose a sequential probability ratio test (SPRT) to overcome the problem of limited samples in studies of complex genetic diseases. The results of this novel approach are compared with those obtained from the traditional transmission disequilibrium test (TDT) on simulated data. While TDT classifies single-nucleotide polymorphisms (SNPs) into only two groups (SNPs associated with the disease and the others), SPRT has the flexibility of assigning SNPs to a third group, that is, those for which we do not have enough evidence and should keep sampling. It is shown that SPRT results in smaller rates of false positives and false negatives, as well as better accuracy and sensitivity values for classifying SNPs, when compared with TDT. By using SPRT, data with small sample sizes become usable for an accurate association analysis.
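Wald's SPRT underlying this approach can be sketched for Bernoulli observations; the boundaries follow the standard log-likelihood-ratio approximation, and the three possible outcomes mirror the paper's three SNP groups (accept association, reject, or keep sampling). A minimal sketch, not the authors' genetics-specific formulation:

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for Bernoulli data. H0: success prob p0; H1: success prob p1.
    Returns ('H1' | 'H0' | 'continue', number of samples consumed)."""
    upper = math.log((1 - beta) / alpha)    # cross it -> accept H1
    lower = math.log(beta / (1 - alpha))    # cross it -> accept H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", len(observations)    # evidence still inconclusive
```

The 'continue' outcome is what gives SPRT its third group: an SNP whose log-likelihood ratio never crosses either boundary is neither declared associated nor dismissed, but flagged for further sampling.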

  3. Differentiation of Candida albicans, Candida glabrata, and Candida krusei by FT-IR and chemometrics by CHROMagar™ Candida.

    PubMed

    Wohlmeister, Denise; Vianna, Débora Renz Barreto; Helfer, Virginia Etges; Calil, Luciane Noal; Buffon, Andréia; Fuentefria, Alexandre Meneghello; Corbellini, Valeriano Antonio; Pilger, Diogo André

    2017-10-01

    Pathogenic Candida species are detected in clinical infections. CHROMagar™ is a phenotypic method used to identify Candida species, although it has limitations, which indicates the need for more sensitive and specific techniques. Fourier-transform infrared (FT-IR) spectroscopy is an analytical vibrational technique used to identify patterns in the metabolic fingerprint of biological matrices, particularly whole microbial cell systems such as Candida sp., in association with classification chemometric algorithms. Soft Independent Modeling of Class Analogy (SIMCA) is one such typical algorithm, still little employed in microbiological classification. This study demonstrates the applicability of the FT-IR technique by specular reflectance, associated with SIMCA, to discriminate Candida species isolated from vaginal discharges and grown on CHROMagar™. The differences in the spectra of C. albicans, C. glabrata and C. krusei were suitable for discriminating these species, as observed by PCA. A SIMCA model was then constructed with standard samples of the three species using the spectral region of 1792-1561 cm-1. All samples (n=48) were properly classified based on the chromogenic method using CHROMagar™ Candida. In total, 93.4% (n=45) of the samples were correctly and unambiguously classified (Class I). Two samples of C. albicans were classified correctly, though these could also have been C. glabrata (Class II), and one C. glabrata sample could also have been classified as C. krusei (Class II). Concerning these three samples, one triplicate of each was included in Class II and two in Class I. Therefore, FT-IR associated with SIMCA can be used to identify samples of C. albicans, C. glabrata, and C. krusei grown on CHROMagar™ Candida, aiming to improve the clinical applications of this technique. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Deep convolutional neural network training enrichment using multi-view object-based analysis of Unmanned Aerial systems imagery for wetlands classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Abd-Elrahman, Amr

    2018-05-01

    A deep convolutional neural network (DCNN) requires massive training datasets to trigger its image classification power, while collecting training samples for remote sensing applications is usually an expensive process. When a DCNN is simply implemented with traditional object-based image analysis (OBIA) for classification of Unmanned Aerial Systems (UAS) orthoimages, its power may be undermined if the number of training samples is relatively small. This research aims to develop a novel OBIA classification approach that can take advantage of the DCNN by enriching the training dataset automatically using multi-view data. Specifically, this study introduces a Multi-View Object-based classification using Deep convolutional neural network (MODe) method to process UAS images for land cover classification. MODe conducts the classification on multi-view UAS images instead of directly on the orthoimage, and obtains the final results via a voting procedure. 10-fold cross-validation results show the mean overall classification accuracy increasing substantially, from 65.32% when the DCNN was applied to the orthoimage, to 82.08% when MODe was implemented. This study also compared the performance of the support vector machine (SVM) and random forest (RF) classifiers with the DCNN under the traditional OBIA and the proposed multi-view OBIA frameworks. The results indicate that the advantage of the DCNN over traditional classifiers in terms of accuracy is more obvious when these classifiers were applied within the proposed multi-view OBIA framework than within the traditional OBIA framework.
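The voting procedure that fuses per-view predictions for one image object can be sketched simply; the optional confidence weights are an assumption added for illustration, not necessarily MODe's exact rule:

```python
from collections import defaultdict

def fuse_views(view_labels, view_weights=None):
    """Fuse one object's per-view class predictions by (weighted) majority vote.
    With no weights, this is a plain majority vote across the views."""
    if view_weights is None:
        view_weights = [1.0] * len(view_labels)
    score = defaultdict(float)
    for label, w in zip(view_labels, view_weights):
        score[label] += w
    return max(score, key=score.get)
```

Because each object appears in several overlapping UAS images, the vote effectively multiplies the training and inference evidence per object, which is the mechanism behind the accuracy gain reported above.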

  5. A Comparison of Two Measures of HIV Diversity in Multi-Assay Algorithms for HIV Incidence Estimation

    PubMed Central

    Cousins, Matthew M.; Konikoff, Jacob; Sabin, Devin; Khaki, Leila; Longosz, Andrew F.; Laeyendecker, Oliver; Celum, Connie; Buchbinder, Susan P.; Seage, George R.; Kirk, Gregory D.; Moore, Richard D.; Mehta, Shruti H.; Margolick, Joseph B.; Brown, Joelle; Mayer, Kenneth H.; Kobin, Beryl A.; Wheeler, Darrell; Justman, Jessica E.; Hodder, Sally L.; Quinn, Thomas C.; Brookmeyer, Ron; Eshleman, Susan H.

    2014-01-01

    Background Multi-assay algorithms (MAAs) can be used to estimate HIV incidence in cross-sectional surveys. We compared the performance of two MAAs that use HIV diversity as one of four biomarkers for analysis of HIV incidence. Methods Both MAAs included two serologic assays (LAg-Avidity assay and BioRad-Avidity assay), HIV viral load, and an HIV diversity assay. HIV diversity was quantified using either a high resolution melting (HRM) diversity assay that does not require HIV sequencing (HRM score for a 239 base pair env region) or sequence ambiguity (the percentage of ambiguous bases in a 1,302 base pair pol region). Samples were classified as MAA positive (likely from individuals with recent HIV infection) if they met the criteria for all of the assays in the MAA. The following performance characteristics were assessed: (1) the proportion of samples classified as MAA positive as a function of duration of infection, (2) the mean window period, (3) the shadow (the time period before sample collection that is being assessed by the MAA), and (4) the accuracy of cross-sectional incidence estimates for three cohort studies. Results The proportion of samples classified as MAA positive as a function of duration of infection was nearly identical for the two MAAs. The mean window period was 141 days for the HRM-based MAA and 131 days for the sequence ambiguity-based MAA. The shadows for both MAAs were <1 year. Both MAAs provided cross-sectional HIV incidence estimates that were very similar to longitudinal incidence estimates based on HIV seroconversion. Conclusions MAAs that include the LAg-Avidity assay, the BioRad-Avidity assay, HIV viral load, and HIV diversity can provide accurate HIV incidence estimates. Sequence ambiguity measures obtained using a commercially-available HIV genotyping system can be used as an alternative to HRM scores in MAAs for cross-sectional HIV incidence estimation. PMID:24968135

  6. An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms.

    PubMed

    Chao, Pei-Kuang; Wang, Chun-Li; Chan, Hsiao-Lung

    2012-03-01

    Predicting response after cardiac resynchronization therapy (CRT) has been a challenge for cardiologists. About 30% of patients selected by the standard selection criteria for CRT do not show response after receiving the treatment. This study aimed to build an intelligent classifier to assist in identifying potential CRT responders using speckle-tracking radial strain derived from echocardiograms. The echocardiograms analyzed were acquired before CRT from 26 patients who had received CRT. Sequential forward selection was performed on the parameters obtained by peak-strain timing and phase space reconstruction of speckle-tracking radial strain to find an optimal set of features for creating intelligent classifiers. Support vector machines (SVMs) with linear, quadratic, and polynomial kernels were tested to build classifiers that identify potential responders and non-responders for CRT from the selected features. Based on random sub-sampling validation, the best classification performance was a correct rate of about 95%, with 96-97% sensitivity and 93-94% specificity, achieved by applying an SVM with a quadratic kernel to a set of 3 parameters. The selected 3 parameters contain indexes extracted by both peak-strain timing and phase space reconstruction. An intelligent classifier with an average correct rate, sensitivity and specificity above 90% for assisting in identifying CRT responders was built from speckle-tracking radial strain. The classifier can be applied to provide objective suggestions for patient selection for CRT. Copyright © 2011 Elsevier B.V. All rights reserved.
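
    A quadratic-kernel decision function of the kind used above can be sketched compactly. Since the paper's strain data are not available, the code below uses synthetic 3-feature data, and fits a kernel ridge model as a dependency-free stand-in for a full SVM solver; only the kernel form and the 3-feature input are taken from the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 3 selected strain features
X = rng.normal(size=(40, 3))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5, 1.0, -1.0)  # nonlinear rule

def quad_kernel(A, B):
    """Quadratic polynomial kernel (1 + <a, b>)^2."""
    return (1.0 + A @ B.T) ** 2

# Kernel ridge classifier: a simple stand-in for an SVM with quadratic kernel
lam = 1e-2
K = quad_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(Xnew):
    return np.sign(quad_kernel(Xnew, X) @ alpha)

acc = (predict(X) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

    The quadratic kernel lifts the 3 input features into a space containing all pairwise products, which is what lets a linear decision rule in that space capture curved boundaries such as the one above.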

  7. The generalization ability of SVM classification based on Markov sampling.

    PubMed

    Xu, Jie; Tang, Yuan Yan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang; Zhang, Baochang

    2015-06-01

    Previous works studying the generalization ability of the support vector machine classification (SVMC) algorithm are usually based on the assumption of independent and identically distributed samples. In this paper, we go beyond this classical framework by studying the generalization ability of SVMC based on uniformly ergodic Markov chain (u.e.M.c.) samples. We analyze the excess misclassification error of SVMC based on u.e.M.c. samples, and obtain the optimal learning rate of SVMC for u.e.M.c. samples. We also introduce a new Markov sampling algorithm for SVMC to generate u.e.M.c. samples from a given dataset, and present numerical studies on the learning performance of SVMC based on Markov sampling for benchmark datasets. The numerical studies show that SVMC based on Markov sampling not only has better generalization ability as the number of training samples grows, but also yields sparser classifiers when the size of the dataset is large relative to the input dimension.

  8. AdaBoost-based algorithm for network intrusion detection.

    PubMed

    Hu, Weiming; Hu, Wei; Maybank, Steve

    2008-04-01

    Network intrusion detection aims at distinguishing attacks on the Internet from normal use of the Internet. It is an indispensable part of the information security system. Due to the variety of network behaviors and the rapid evolution of attack methods, it is necessary to develop fast machine-learning-based intrusion detection algorithms with high detection rates and low false-alarm rates. In this correspondence, we propose an intrusion detection algorithm based on the AdaBoost algorithm. In the algorithm, decision stumps are used as weak classifiers. The decision rules are provided for both categorical and continuous features. By combining the weak classifiers for continuous features and the weak classifiers for categorical features into a strong classifier, the relations between these two different types of features are handled naturally, without any forced conversions between continuous and categorical features. Adaptable initial weights and a simple strategy for avoiding overfitting are adopted to improve the performance of the algorithm. Experimental results show that our algorithm has low computational complexity and error rates, as compared with algorithms of higher computational complexity, as tested on the benchmark sample data.
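
    The core loop of AdaBoost with decision stumps is compact enough to sketch. The self-contained numpy version below handles continuous features only (the paper additionally defines stump rules for categorical features, which are omitted here); the data and round count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-feature data with a diagonal class boundary (illustrative only)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

def fit_stump(X, y, w):
    """Best decision stump (feature, threshold, polarity) under weights w."""
    best = (None, None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost(X, y, rounds=20):
    w = np.full(len(y), 1 / len(y))
    model = []
    for _ in range(rounds):
        j, t, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)
        a = 0.5 * np.log((1 - err) / err)   # weak-classifier weight
        pred = np.where(pol * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(-a * y * pred)          # up-weight misclassified samples
        w /= w.sum()
        model.append((a, j, t, pol))
    return model

def predict(model, X):
    score = sum(a * np.where(pol * (X[:, j] - t) > 0, 1, -1)
                for a, j, t, pol in model)
    return np.sign(score)

model = adaboost(X, y)
acc = (predict(model, X) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

    Each round the stump that minimizes the weighted error is added with weight `a`, and samples it misclassifies gain weight, so later stumps concentrate on the hard cases.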

  9. Automated in vivo identification of fungal infection on human scalp using optical coherence tomography and machine learning

    NASA Astrophysics Data System (ADS)

    Dubey, Kavita; Srivastava, Vishal; Singh Mehta, Dalip

    2018-04-01

    Early identification of fungal infection on the human scalp is crucial for avoiding hair loss. The diagnosis of fungal infection on the human scalp is typically based on a visual assessment by trained experts or doctors. Optical coherence tomography (OCT) has the ability to capture fungal infection information from the human scalp at high resolution. In this study, we present a fully automated, non-contact, non-invasive optical method for rapid detection of fungal infections based on features extracted from A-line and B-scan OCT images. A multilevel ensemble machine model is designed to perform automated classification, and it outperforms the best single classifier based on the features extracted from the OCT images. In this study, 60 samples (30 fungal, 30 normal) were imaged by OCT and eight features were extracted. The classification algorithm had an average sensitivity, specificity and accuracy of 92.30, 90.90 and 91.66%, respectively, for distinguishing fungal from normal human scalps. This remarkable classifying ability makes the proposed model readily applicable to human scalp classification.

  10. Using Chou's pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location.

    PubMed

    Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang

    2008-05-01

    Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science owing to the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of a time series. A novel ensemble classifier is designed incorporating three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical, and it might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are available freely by contacting the corresponding author.

  11. Comparison of several chemometric methods of libraries and classifiers for the analysis of expired drugs based on Raman spectra.

    PubMed

    Gao, Qun; Liu, Yan; Li, Hao; Chen, Hui; Chai, Yifeng; Lu, Feng

    2014-06-01

    Some expired drugs are difficult to detect by conventional means. If they are repackaged and sold back into the market, they constitute a new public health challenge. For the detection of repackaged expired drugs that are still within specification, paracetamol tablets from one manufacturer were used as a model drug in this study to compare Raman spectra-based library verification and classification methods. Raman spectra of different batches of paracetamol tablets were collected, and a library including standard spectra of unexpired batches of tablets was established. The Raman spectrum of each sample was compared with the standard spectrum using cosine and correlation measures, and the average hit quality index (HQI) between the suspicious samples and the standard spectrum was calculated. The optimum threshold values were 0.997 and 0.998, respectively, as determined by ROC analysis and four evaluation measures, for which the accuracy was up to 97%. Three supervised classifiers, PLS-DA, SVM and k-NN, were chosen to establish two-class classification models and were compared subsequently. They were used to separate expired batches from an unexpired batch and to predict the suspect samples; the average accuracy was 90.12%, 96.80% and 89.37%, respectively. Different pre-processing techniques were evaluated: first derivative was optimal for the library methods and max-min normalization was optimal for the classifiers. The results obtained from these studies indicated that both library and classifier methods can detect expired drugs effectively, and they should be used complementarily in fast screening. Copyright © 2014 Elsevier B.V. All rights reserved.
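
    The library check reduces to computing a hit quality index between a sample spectrum and the library's standard spectrum and comparing it with the optimized threshold. A minimal sketch with synthetic "spectra" (the curves and deviation sizes are invented for illustration; only the 0.997 threshold value comes from the abstract, and HQI is computed here as the squared correlation, one common definition):

```python
import numpy as np

def hqi(a, b):
    """Hit Quality Index as squared correlation between two spectra."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) ** 2 / ((a @ a) * (b @ b))

# Synthetic "spectra": a reference, a near-identical batch, and a drifted one
x = np.linspace(0, 4 * np.pi, 500)
ref = np.sin(x) + 0.5 * np.sin(3 * x)
fresh = ref + 0.01 * np.cos(x)          # unexpired: tiny deviation
expired = ref + 0.2 * np.cos(2 * x)     # expired: larger spectral change

THRESHOLD = 0.997  # the paper's optimum threshold for the cosine library
print(hqi(ref, fresh) >= THRESHOLD, hqi(ref, expired) >= THRESHOLD)
```

    A sample whose HQI against the unexpired library falls below the threshold is flagged as suspect and passed to the classifier stage.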

  12. Comparison of four approaches to a rock facies classification problem

    USGS Publications Warehouse

    Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.

    2007-01-01

    In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and a feed-forward, back-propagating artificial neural network. The objective was to determine the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field in southwest Kansas. Study data include 3600 samples with known rock facies class (from core), each sample having either four or five measured properties (wire-line log curves) and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated) and the feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. © 2006 Elsevier Ltd. All rights reserved.

  13. Classifier-Guided Sampling for Complex Energy System Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Backlund, Peter B.; Eddy, John P.

    2015-09-01

    This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization" that was conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for performing search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can result in significant time savings, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of CGS were developed and tested on a set of benchmark problems. As a domain-specific case study, CGS was used to design a microgrid for use in islanded mode during an extended bulk power grid outage.
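
    The essence of CGS is to screen cheap-to-generate candidate designs with a classifier trained on already-evaluated designs, spending expensive objective evaluations only on the promising ones. The sketch below uses a Bernoulli naive-Bayes filter as a simplification of the report's Bayesian network classifier; the OneMax objective and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16  # number of binary design variables (illustrative)

def objective(x):
    """Stand-in for an expensive simulation: count of ones (OneMax)."""
    return int(x.sum())

# Previously evaluated designs, labelled good/bad by median fitness
history = rng.integers(0, 2, size=(200, N))
fitness = history.sum(1)
labels = fitness >= np.median(fitness)

# Bernoulli naive-Bayes classifier fitted on the evaluated designs
p_good = history[labels].mean(0) * 0.98 + 0.01   # smoothed P(bit=1 | good)
p_bad = history[~labels].mean(0) * 0.98 + 0.01

def log_odds(c):
    """Posterior log-odds that each candidate row is a 'good' design."""
    return (c * np.log(p_good / p_bad)
            + (1 - c) * np.log((1 - p_good) / (1 - p_bad))).sum(1)

# New candidate designs: evaluate only the top quarter by predicted promise
cand = rng.integers(0, 2, size=(400, N))
promising = cand[np.argsort(log_odds(cand))[-100:]]

mean_all = np.mean([objective(c) for c in cand])
mean_kept = np.mean([objective(c) for c in promising])
print(f"mean fitness, all candidates: {mean_all:.2f}, kept: {mean_kept:.2f}")
```

    The filtered subset has a visibly higher mean fitness than the raw candidate pool, which is the mechanism by which CGS saves objective evaluations inside each generation of the evolutionary loop.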

  14. Nonparametric Coupled Bayesian Dictionary and Classifier Learning for Hyperspectral Classification.

    PubMed

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learning a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra, which are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with state-of-the-art dictionary learning-based classification methods.

  15. Oregon ground-water quality and its relation to hydrogeological factors; a statistical approach

    USGS Publications Warehouse

    Miller, T.L.; Gonthier, J.B.

    1984-01-01

    An appraisal of Oregon ground-water quality was made using existing data accessible through the U.S. Geological Survey computer system. The data available for about 1,000 sites were separated by aquifer units and hydrologic units. Selected statistical moments were described for 19 constituents including major ions. About 96 percent of all sites in the data base were sampled only once. The sample data were classified by aquifer unit and hydrologic unit and analysis of variance was run to determine if significant differences exist between the units within each of these two classifications for the same 19 constituents on which statistical moments were determined. Results of the analysis of variance indicated both classification variables performed about the same, but aquifer unit did provide more separation for some constituents. Samples from the Rogue River basin were classified by location within the flow system and type of flow system. The samples were then analyzed using analysis of variance on 14 constituents to determine if there were significant differences between subsets classified by flow path. Results of this analysis were not definitive, but classification as to the type of flow system did indicate potential for segregating water-quality data into distinct subsets. (USGS)

  16. Walking Objectively Measured: Classifying Accelerometer Data with GPS and Travel Diaries

    PubMed Central

    Kang, Bumjoon; Moudon, Anne V.; Hurvitz, Philip M.; Reichley, Lucas; Saelens, Brian E.

    2013-01-01

    Purpose This study developed and tested an algorithm to classify accelerometer data as walking or non-walking using either GPS or travel diary data within a large sample of adults under free-living conditions. Methods Participants wore an accelerometer and a GPS unit, and concurrently completed a travel diary, for 7 consecutive days. Physical activity (PA) bouts were identified using accelerometry count sequences. PA bouts were then classified as walking or non-walking based on a decision-tree algorithm consisting of 7 classification scenarios. Algorithm reliability was examined relative to two independent analysts' classification of a 100-bout verification sample. The algorithm was then applied to the entire set of PA bouts. Results The 706 participants (mean age 51 years, 62% female, 80% non-Hispanic white, 70% college graduates or higher) yielded 4,702 person-days of data and a total of 13,971 PA bouts. The algorithm showed a mean agreement of 95% with the independent analysts. It classified physical activity into 8,170 (58.5%) walking bouts and 5,337 (38.2%) non-walking bouts; 464 (3.3%) bouts were not classified for lack of GPS and diary data. Nearly 70% of the walking bouts and 68% of the non-walking bouts were classified using only the objective accelerometer and GPS data. Travel diary data helped classify 30% of all bouts with no GPS data. The mean duration of PA bouts classified as walking was 15.2 min (SD=12.9). On average, participants had 1.7 walking bouts and 25.4 total walking minutes per day. Conclusions GPS and travel diary information can be helpful in classifying most accelerometer-derived PA bouts into walking or non-walking behavior. PMID:23439414

  17. A machine learning approach for classification of anatomical coverage in CT

    NASA Astrophysics Data System (ADS)

    Wang, Xiaoyong; Lo, Pechin; Ramakrishna, Bharath; Goldin, Johnathan; Brown, Matthew

    2016-03-01

    Automatic classification of the anatomical coverage of medical images is critical for big data mining and as a pre-processing step to automatically trigger specific computer-aided diagnosis systems. The traditional way of identifying scans through DICOM headers has various limitations due to manual entry of series descriptions and non-standardized naming conventions. In this study, we present a machine learning approach in which multiple binary classifiers are used to classify different anatomical coverages of CT scans. A one-vs-rest strategy was applied. For a given training set, a template scan was selected from the positive samples and all other scans were registered to it. Each registered scan was then evenly split into k × k × k non-overlapping blocks, and for each block the mean intensity was computed. This resulted in a 1 × k³ feature vector for each scan. The feature vectors were then used to train an SVM-based classifier. In this feasibility study, four classifiers were built to identify anatomic coverages of brain, chest, abdomen-pelvis, and chest-abdomen-pelvis CT scans. Each classifier was trained and tested using a set of 300 scans from different subjects, composed of 150 positive samples and 150 negative samples. The area under the ROC curve (AUC) on the testing set was measured to evaluate performance in a two-fold cross-validation setting. Our results showed good classification performance, with an average AUC of 0.96.
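
    The block-mean feature extraction above is a simple mean-pooling step over a registered volume. A minimal numpy sketch (the volume size and k are illustrative; the real pipeline operates on registered CT volumes):

```python
import numpy as np

def block_mean_features(vol, k=4):
    """Split a volume into k^3 equal blocks; return their mean intensities."""
    z, y, x = (d - d % k for d in vol.shape)       # crop to divisible size
    v = vol[:z, :y, :x].reshape(k, z // k, k, y // k, k, x // k)
    return v.mean(axis=(1, 3, 5)).ravel()          # 1 x k^3 feature vector

vol = np.zeros((32, 32, 32))
vol[:16] = 1.0                                     # bright upper half
feats = block_mean_features(vol, k=4)
print(feats.shape, feats[:8].mean(), feats[-8:].mean())  # (64,) 1.0 0.0
```

    Because scans are first registered to a common template, block `i` samples roughly the same anatomical location in every scan, which is what makes this coarse 64-dimensional descriptor sufficient for an SVM to separate coverage classes.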

  18. Vibration Sensor-Based Bearing Fault Diagnosis Using Ellipsoid-ARTMAP and Differential Evolution Algorithms

    PubMed Central

    Liu, Chang; Wang, Guofeng; Xie, Qinglu; Zhang, Yanchao

    2014-01-01

    Effective fault classification of rolling element bearings provides an important basis for ensuring the safe operation of rotating machinery. In this paper, a novel vibration sensor-based fault diagnosis method using an Ellipsoid-ARTMAP network (EAM) and a differential evolution (DE) algorithm is proposed. The original features are first extracted from vibration signals based on wavelet packet decomposition. Then, a minimum-redundancy maximum-relevancy algorithm is introduced to select the most prominent features so as to decrease feature dimensions. Finally, a DE-based EAM (DE-EAM) classifier is constructed to realize the fault diagnosis. The major characteristic of EAM is that the sample distribution of each category is modeled by a hyper-ellipsoid node and a smoothing operation algorithm. Therefore, it can depict the decision boundary of disperse samples accurately and effectively avoid overfitting. To optimize the EAM network parameters, the DE algorithm is presented and two objectives, classification accuracy and node number, are simultaneously introduced as the fitness functions. Meanwhile, an exponential criterion is proposed to realize the final selection of the optimal parameters. To prove the effectiveness of the proposed method, the vibration signals of four types of rolling element bearings under different loads were collected. Moreover, to improve the robustness of the classifier evaluation, a two-fold cross validation scheme was adopted and the order of feature samples was randomly arranged ten times within each fold. The results show that the DE-EAM classifier can recognize the fault categories of the rolling element bearings reliably and accurately. PMID:24936949

  19. A Novel Acoustic Sensor Approach to Classify Seeds Based on Sound Absorption Spectra

    PubMed Central

    Gasso-Tortajada, Vicent; Ward, Alastair J.; Mansur, Hasib; Brøchner, Torben; Sørensen, Claus G.; Green, Ole

    2010-01-01

    A non-destructive and novel in situ acoustic sensor approach based on sound absorption spectra was developed for identifying and classifying different seed types. The absorption coefficient spectra were determined using the impedance tube measurement method. Subsequently, a multivariate statistical analysis, i.e., principal component analysis (PCA), was performed to generate a classification of the seeds based on the soft independent modelling of class analogy (SIMCA) method. The results show that the sound absorption coefficient spectra of different seed types present characteristic patterns which are highly dependent on seed size and shape. In general, seed particle size and sphericity were inversely related to the absorption coefficient. PCA presented reliable grouping capabilities within the diverse seed types, since 95% of the total spectral variance was described by the first two principal components. Furthermore, the SIMCA classification model based on the absorption spectra achieved optimal results, as 100% of the evaluation samples were correctly classified. This study presents the initial structure of an innovative method that opens new possibilities in agriculture and industry for classifying and determining physical properties of seeds and other materials. PMID:22163455

  20. Sample size determination for disease prevalence studies with partially validated data.

    PubMed

    Qiu, Shi-Fang; Poon, Wai-Yin; Tang, Man-Lai

    2016-02-01

    Disease prevalence is an important topic in medical research, and its study is based on data that are obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold-standard tests or low-cost screening tests, but the latter are subject to the misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example. © The Author(s) 2012.

  1. Classification of ductal carcinoma in situ by gene expression profiling.

    PubMed

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes B G; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight into the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Using supervised classification, we identified a gene expression classifier of 35 genes that differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified that separates well- from poorly differentiated DCIS samples.

  2. Classification of ductal carcinoma in situ by gene expression profiling

    PubMed Central

    Hannemann, Juliane; Velds, Arno; Halfwerk, Johannes BG; Kreike, Bas; Peterse, Johannes L; van de Vijver, Marc J

    2006-01-01

    Introduction Ductal carcinoma in situ (DCIS) is characterised by the intraductal proliferation of malignant epithelial cells. Several histological classification systems have been developed, but assessing the histological type/grade of DCIS lesions is still challenging, making treatment decisions based on these features difficult. To obtain insight into the molecular basis of the development of different types of DCIS and its progression to invasive breast cancer, we have studied differences in gene expression between different types of DCIS and between DCIS and invasive breast carcinomas. Methods Gene expression profiling using microarray analysis has been performed on 40 in situ and 40 invasive breast cancer cases. Results DCIS cases were classified as well- (n = 6), intermediately (n = 18), and poorly (n = 14) differentiated type. Of the 40 invasive breast cancer samples, five samples were grade I, 11 samples were grade II, and 24 samples were grade III. Using two-dimensional hierarchical clustering, the basal-like type, ERB-B2 type, and the luminal-type tumours originally described for invasive breast cancer could also be identified in DCIS. Conclusion Using supervised classification, we identified a gene expression classifier of 35 genes that differed between DCIS and invasive breast cancer; a classifier of 43 genes could be identified that separates well- from poorly differentiated DCIS samples. PMID:17069663

  3. The use of wavelength dispersive X-ray fluorescence in the identification of the elemental composition of vanilla samples and the determination of the geographic origin by discriminant function analysis.

    PubMed

    Hondrogiannis, Ellen; Rotta, Kathryn; Zapf, Charles M

    2013-03-01

    Sixteen elements found in 37 vanilla samples from Madagascar, Uganda, India, Indonesia (all Vanilla planifolia species), and Papua New Guinea (Vanilla tahitensis species) were measured by wavelength dispersive X-ray fluorescence (WDXRF) spectroscopy for the purpose of determining the elemental concentrations and discriminating among the origins. Pellets were prepared from the samples, and elemental concentrations were calculated based on calibration curves created using 4 Natl. Inst. of Standards and Technology (NIST) standards. Discriminant analysis was used to successfully classify the vanilla samples by species and geographical region. Our method allows for higher throughput in the rapid screening of vanilla samples in less time than currently available analytical methods. Wavelength dispersive X-ray fluorescence spectroscopy and discriminant function analysis were used to classify vanilla from different origins, resulting in a model that could potentially serve to rapidly validate these samples before purchase from a producer. © 2013 Institute of Food Technologists®

  4. Generating virtual training samples for sparse representation of face images and face recognition

    NASA Astrophysics Data System (ADS)

    Du, Yong; Wang, Yu

    2016-03-01

    There are many challenges in face recognition. In real-world scenes, images of the same face vary with changing illuminations, different expressions and poses, multiform ornaments, or even altered mental status. Limited available training samples cannot convey these possible changes in the training phase sufficiently, and this has become one of the obstacles to improving face recognition accuracy. In this article, we view the multiplication of two images of the face as a virtual face image to expand the training set and devise a representation-based method to perform face recognition. The generated virtual samples reflect possible appearance and pose variations of the face. By multiplying a training sample with another sample from the same subject, we can strengthen the facial contour feature and greatly suppress the noise. Thus, more essential facial information is retained. Also, uncertainty of the training data is reduced as the number of training samples increases, which is beneficial for the training phase. The devised representation-based classifier uses both the original and the newly generated samples to perform the classification. In the classification phase, we first determine the K nearest training samples for the current test sample by calculating the Euclidean distances between the test sample and the training samples. Then, a linear combination of these selected training samples is used to represent the test sample, and the representation result is used to classify the test sample. The experimental results show that the proposed method outperforms some state-of-the-art face recognition methods.
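    The virtual-sample and representation-based steps described above can be sketched as follows. The toy vectors stand in for flattened face images and are not from the paper; the exact weighting and residual rules of the authors' classifier may differ:

```python
import numpy as np

def virtual_sample(img_a, img_b):
    # Element-wise product of two images of the same subject: the
    # "virtual" training sample described in the abstract.
    return img_a * img_b

def represent_and_classify(test, train, labels, k=2):
    # 1) Pick the K nearest training samples by Euclidean distance.
    order = np.argsort(np.linalg.norm(train - test, axis=1))[:k]
    A = train[order].T                       # columns = selected samples
    yk = [labels[i] for i in order]
    # 2) Represent the test sample as a least-squares linear combination.
    coef, *_ = np.linalg.lstsq(A, test, rcond=None)
    # 3) Assign the class whose weighted samples best reconstruct it.
    best, best_res = None, np.inf
    for c in set(yk):
        mask = np.array([l == c for l in yk])
        res = np.linalg.norm(test - A[:, mask] @ coef[mask])
        if res < best_res:
            best, best_res = c, res
    return best

train = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
labels = [0, 0, 1, 1]
print(represent_and_classify(np.array([0.95, 0.05, 0.0]), train, labels))
```

    In practice the training matrix would also include the virtual samples produced by `virtual_sample` for every same-subject pair.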

  5. Genetic programming based ensemble system for microarray data classification.

    PubMed

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved.
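    The three node operators named above (Min, Max, Average) combine the class-probability outputs of the base classifiers. A minimal sketch of that combination step, with invented probability vectors and no GP evolution, might look like:

```python
import numpy as np

def combine(probs, op):
    # Combine per-classifier class-probability vectors with one of the
    # three GPES node operators (illustrative only; the GP that evolves
    # trees of such operators is not reproduced here).
    P = np.asarray(probs)          # shape: (n_classifiers, n_classes)
    if op == "min":
        return P.min(axis=0)
    if op == "max":
        return P.max(axis=0)
    if op == "average":
        return P.mean(axis=0)
    raise ValueError(op)

# Three base classifiers, two classes.
p = [[0.2, 0.8], [0.6, 0.4], [0.4, 0.6]]
print(combine(p, "average"))
```

    A GP individual would arrange many such operator nodes into a tree whose leaves are base-classifier outputs, and evolution would search over tree structures.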

  6. Genetic Programming Based Ensemble System for Microarray Data Classification

    PubMed Central

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748

  7. Bayesian Integration and Classification of Composition C-4 Plastic Explosives Based on Time-of-Flight-Secondary Ion Mass Spectrometry and Laser Ablation-Inductively Coupled Plasma Mass Spectrometry.

    PubMed

    Mahoney, Christine M; Kelly, Ryan T; Alexander, Liz; Newburn, Matt; Bader, Sydney; Ewing, Robert G; Fahey, Albert J; Atkinson, David A; Beagley, Nathaniel

    2016-04-05

    Time-of-flight-secondary ion mass spectrometry (TOF-SIMS) and laser ablation-inductively coupled plasma mass spectrometry (LA-ICPMS) were used for characterization and identification of unique signatures from a series of 18 Composition C-4 plastic explosives. The samples were obtained from various commercial and military sources around the country. Positive and negative ion TOF-SIMS data were acquired directly from the C-4 residue on Si surfaces, where the positive ion mass spectra obtained were consistent with the major composition of organic additives, and the negative ion mass spectra were more consistent with explosive content in the C-4 samples. Each series of mass spectra was subjected to partial least squares-discriminant analysis (PLS-DA), a multivariate statistical analysis approach which serves to first find the areas of maximum variance within different classes of C-4 and subsequently to classify unknown samples based on correlations between the unknown data set and the original data set (often referred to as a training data set). This method was able to successfully classify test samples of C-4, though with a limited degree of certainty. The classification accuracy of the method was further improved by integrating the positive and negative ion data using a Bayesian approach. The TOF-SIMS data was combined with a second analytical method, LA-ICPMS, which was used to analyze elemental signatures in the C-4. The integrated data were able to classify test samples with a high degree of certainty. Results indicate that this Bayesian integrated approach constitutes a robust classification method that should be employable even in dirty samples collected in the field.
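    The integration idea, fusing class evidence from two independent measurements, can be sketched with a naive-Bayes combination of the two modalities' class posteriors. This assumes conditionally independent classifiers and a uniform prior; the paper's actual Bayesian model is more elaborate:

```python
import numpy as np

def bayes_fuse(posteriors_a, posteriors_b, prior=None):
    # Fuse two classifiers' class posteriors under a conditional-
    # independence assumption: multiply, divide out the prior counted
    # twice, and renormalize.
    a, b = np.asarray(posteriors_a, float), np.asarray(posteriors_b, float)
    prior = np.ones_like(a) / a.size if prior is None else np.asarray(prior, float)
    unnorm = a * b / prior
    return unnorm / unnorm.sum()

# Hypothetical posteriors for two C-4 source classes from TOF-SIMS (left)
# and LA-ICPMS (right); agreement sharpens the fused posterior.
print(bayes_fuse([0.6, 0.4], [0.7, 0.3]))
```

    When both modalities favour the same class, the fused posterior is more confident than either alone, which mirrors the reported gain from integrating the positive- and negative-ion data and the elemental signatures.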

  8. Classification of Malaysia aromatic rice using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A.; Omar, O.

    2015-05-01

    Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. Its varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice due to its special growth requirements, for instance specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks: training is lengthy, panellists are prone to fatigue as the number of samples increases, and results can be inconsistent. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the varieties of aromatic rice based on the samples' odour. Samples were taken from each variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual observation of the PCA and LDA plots shows that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.
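    The PCA-then-KNN pipeline with leave-one-out validation described above can be sketched in a few lines. The synthetic 8-dimensional "sensor readings" below are invented stand-ins for e-nose responses:

```python
import numpy as np

def pca(X, k):
    # Project onto the first k principal components via SVD.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def knn_predict(X_train, y_train, x, k=3):
    # Majority vote among the k nearest training samples.
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    votes = [y_train[i] for i in idx]
    return max(set(votes), key=votes.count)

def loo_error(X, y, k=3):
    # Leave-one-out misclassification rate.
    wrong = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        y_rest = [y[j] for j in range(len(y)) if j != i]
        wrong += knn_predict(X[mask], y_rest, X[i], k) != y[i]
    return wrong / len(X)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (15, 8)),    # variety 0 sensor readings
               rng.normal(1, 0.3, (15, 8))])   # variety 1 sensor readings
y = [0] * 15 + [1] * 15
Z = pca(X, 2)                                  # reduce to 2 components
print(loo_error(Z, y))                         # LOO misclassification rate
```

    LDA would replace the unsupervised PCA projection with one that maximises between-class separation; the KNN and LOO machinery is unchanged.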

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md

    Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. Its varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice due to its special growth requirements, for instance specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks: training is lengthy, panellists are prone to fatigue as the number of samples increases, and results can be inconsistent. GC-MS analysis, on the other hand, requires detailed procedures and lengthy analysis and is quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the varieties of aromatic rice based on the samples' odour. Samples were taken from each variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual observation of the PCA and LDA plots shows that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose was successfully applied to the classification of the aromatic rice varieties.

  10. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, issues like high data sparsity, small sample size, and the application of simple linear classifiers are major obstacles to improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies.
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
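    The indexed sparsity reduction step, replacing a long, mostly-zero mutation vector with the indices of its non-zero entries, can be sketched as below. The fixed output length and 1-based indexing are assumptions of this sketch, not details confirmed by the abstract:

```python
import numpy as np

def indexed_sparsity_reduction(gene_vec, max_len):
    # Keep only the positions of mutated genes, 1-based so that 0 can
    # serve as padding, truncated/zero-padded to a fixed length.
    idx = (np.flatnonzero(gene_vec) + 1)[:max_len]
    out = np.zeros(max_len, dtype=int)
    out[:len(idx)] = idx
    return out

# A 10,000-gene binary mutation vector with three mutated genes.
v = np.zeros(10000, dtype=int)
v[[3, 250, 9000]] = 1
print(indexed_sparsity_reduction(v, 5))
```

    The transformed vector is dense and tiny compared with the original, which is what lets the downstream DNN sidestep the sparsity problem.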

  11. Ensemble of classifiers for confidence-rated classification of NDE signal

    NASA Astrophysics Data System (ADS)

    Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish

    2016-02-01

    An ensemble of classifiers, in general, aims to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of classifier ensembles generate self-rated confidence scores, which estimate the reliability of each prediction, and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. Although ensembles of classifiers have been widely used in computational intelligence, existing works largely overlook the effect of other sources of unreliability on the confidence of classification. In NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in classification performance on defect and non-defect indications.
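    The confidence-weighted majority voting that underlies such ensembles can be sketched in a few lines; the labels and confidence values below are invented for illustration:

```python
def weighted_vote(predictions, confidences):
    # Each component classifier's vote counts in proportion to its
    # self-rated confidence; the label with the largest total wins.
    scores = {}
    for label, w in zip(predictions, confidences):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Two hesitant "defect" votes (0.4 + 0.45 = 0.85) lose to one confident
# "non-defect" vote (0.9), unlike in unweighted majority voting.
print(weighted_vote(["defect", "defect", "non-defect"], [0.4, 0.45, 0.9]))
```

    The paper's contribution is in how those confidence weights are derived, accounting for sources of unreliability beyond the raw correct-classification rate, rather than in the voting mechanism itself.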

  12. A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

    PubMed Central

    Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

    2013-01-01

    Background Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed paraffin embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin fixed tumor. Results This produced a three-gene classifier that can predict the ER-status of a novel tumor, with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER-status. Conclusions Our efficient and parsimonious classifier lends itself to high throughput, highly accurate and low-cost RNA-based assessments of ER-status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637

  13. Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier

    PubMed Central

    2011-01-01

    Background Swallowing accelerometry has been suggested as a potential non-invasive tool for bedside dysphagia screening. Various vibratory signal features and complementary measurement modalities have been put forth in the literature for the potential discrimination between safe and unsafe swallowing. To date, automatic classification of swallowing accelerometry has exclusively involved a single-axis of vibration although a second axis is known to contain additional information about the nature of the swallow. Furthermore, the only published attempt at automatic classification in adult patients has been based on a small sample of swallowing vibrations. Methods In this paper, a large corpus of dual-axis accelerometric signals were collected from 30 older adults (aged 65.47 ± 13.4 years, 15 male) referred to videofluoroscopic examination on the suspicion of dysphagia. We invoked a reputation-based classifier combination to automatically categorize the dual-axis accelerometric signals into safe and unsafe swallows, as labeled via videofluoroscopic review. From these participants, a total of 224 swallowing samples were obtained, 164 of which were labeled as unsafe swallows (swallows where the bolus entered the airway) and 60 as safe swallows. Three separate support vector machine (SVM) classifiers and eight different features were selected for classification. Results With selected time, frequency and information theoretic features, the reputation-based algorithm distinguished between safe and unsafe swallowing with promising accuracy (80.48 ± 5.0%), high sensitivity (97.1 ± 2%) and modest specificity (64 ± 8.8%). Interpretation of the most discriminatory features revealed that in general, unsafe swallows had lower mean vibration amplitude and faster autocorrelation decay, suggestive of decreased hyoid excursion and compromised coordination, respectively. 
Further, owing to its performance-based weighting of component classifiers, the static reputation-based algorithm outperformed the democratic majority voting algorithm on this clinical data set. Conclusion Given its computational efficiency and high sensitivity, reputation-based classification of dual-axis accelerometry ought to be considered in future developments of a point-of-care swallow assessment where clinical informatics are desired. PMID:22085802

  14. Father Absence in Infancy.

    ERIC Educational Resources Information Center

    Pedersen, Frank A.; And Others

    This document reports a study investigating the effects of father absence on measures of cognitive, social, and motivational development in infancy. The sample included 54 black infants, 27 of whom were classified "father-absent." This classification was based on two indices, (1) a dichotomy of father-absent or father-present based on…

  15. Predictors of Eligibility for ESY. Final Report.

    ERIC Educational Resources Information Center

    Browder, Diane M.; And Others

    Evaluation of eligibility for extended school year (ESY) services was made based on information contained in school files in a stratified sampling across Pennsylvania. Subjects had been classified as severely and profoundly mentally retarded and were divided into groups based on eligibility for programming in excess of 180 days or ineligibility for…

  16. Identification and classification of similar looking food grains

    NASA Astrophysics Data System (ADS)

    Anami, B. S.; Biradar, Sunanda D.; Savakar, D. G.; Kulkarni, P. V.

    2013-01-01

    This paper describes a comparative study of Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers, taking as a case study the identification and classification of four pairs of similar looking food grains, namely Finger Millet, Mustard, Soyabean, Pigeon Pea, Aniseed, Cumin-seeds, Split Greengram and Split Blackgram. Algorithms are developed to acquire and process color images of these grain samples. The developed algorithms are used to extract 18 color (Hue, Saturation, Value; HSV) features and 42 wavelet-based texture features. A Back Propagation Neural Network (BPNN) based classifier is designed using three feature sets, namely color-HSV, wavelet texture and their combined model. An SVM model is designed for the color-HSV features of the same set of samples. For the ANN-based models, classification accuracies ranging from 93% to 96% for color-HSV, from 78% to 94% for the wavelet texture model and from 92% to 97% for the combined model are obtained. Classification accuracy ranging from 80% to 90% is obtained for the color-HSV based SVM model. The training time required for the SVM-based model is substantially less than that for the ANN on the same set of images.

  17. An Inmate Classification System Based on PCL: SV Factor Scores in a Sample of Prison Inmates

    ERIC Educational Resources Information Center

    Wogan, Michael; Mackenzie, Marci

    2007-01-01

    Psychopaths represent a significant management challenge in a prison population. A sample of ninety-five male inmates from three medium security prisons was tested using the Hare Psychopathy Checklist: Screening Version (PCL:SV). Using traditional criteria, 22% of the inmates were classified as psychopaths. Scores on the two factor dimensions of…

  18. Automated Classification of ROSAT Sources Using Heterogeneous Multiwavelength Source Catalogs

    NASA Technical Reports Server (NTRS)

    McGlynn, Thomas; Suchkov, A. A.; Winter, E. L.; Hanisch, R. J.; White, R. L.; Ochsenbein, F.; Derriere, S.; Voges, W.; Corcoran, M. F.

    2004-01-01

    We describe an on-line system for automated classification of X-ray sources, ClassX, and present preliminary results of classification of the three major catalogs of ROSAT sources, RASS BSC, RASS FSC, and WGACAT, into six class categories: stars, white dwarfs, X-ray binaries, galaxies, AGNs, and clusters of galaxies. ClassX is based on a machine learning technology. It represents a system of classifiers, each classifier consisting of a considerable number of oblique decision trees. These trees are built as the classifier is 'trained' to recognize various classes of objects using a training sample of sources of known object types. Each source is characterized by a preselected set of parameters, or attributes; the same set is then used as the classifier conducts classification of sources of unknown identity. The ClassX pipeline features an automatic search for X-ray source counterparts among heterogeneous data sets in on-line data archives using Virtual Observatory protocols; it retrieves from those archives all the attributes required by the selected classifier and inputs them to the classifier. The user input to ClassX is typically a file with target coordinates, optionally complemented with target IDs. The output contains the class name, attributes, and class probabilities for all classified targets. We discuss ways to characterize and assess the classifier quality and performance and present the respective validation procedures. Based on both internal and external validation, we conclude that the ClassX classifiers yield reasonable and reliable classifications for ROSAT sources and have the potential to broaden class representation significantly for rare object types.

  19. Analysis of spatial distribution of land cover maps accuracy

    NASA Astrophysics Data System (ADS)

    Khatami, R.; Mountrakis, G.; Stehman, S. V.

    2017-12-01

    Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. This research is the first to incorporate the spectral domain as an explanatory feature space for interpolating classification accuracy. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. 
Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.
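    The core idea, interpolating the 0/1 correctness of test samples over a predictive domain to get a wall-to-wall accuracy surface, can be sketched with Gaussian-kernel smoothing in the spatial domain. The coordinates and correctness labels below are invented; the paper's actual interpolation functions and evaluation differ in detail:

```python
import numpy as np

def gaussian_accuracy_surface(test_xy, test_correct, grid_xy, bandwidth=1.0):
    # Predict per-pixel accuracy as a Gaussian-kernel weighted average of
    # the 0/1 correctness of nearby test samples.
    preds = []
    for g in grid_xy:
        d2 = np.sum((test_xy - g) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        preds.append(np.sum(w * test_correct) / np.sum(w))
    return np.array(preds)

# Correct classifications clustered on the left, errors on the right.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
correct = np.array([1.0, 1.0, 0.0, 0.0])
surface = gaussian_accuracy_surface(xy, correct,
                                    np.array([[0.5, 0.0], [9.5, 0.0]]))
print(surface)
```

    Swapping the spatial coordinates for per-pixel spectral values gives the spectral-domain variant compared in the study, and evaluating the predicted surface against held-out correctness labels yields the AUC scores reported.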

  20. Fatty acid profiles as a potential lipidomic biomarker of exposure to brevetoxin for endangered Florida manatees (Trichechus manatus latirostris).

    PubMed

    Wetzel, Dana L; Reynolds, John E; Sprinkel, Jay M; Schwacke, Lori; Mercurio, Philip; Rommel, Sentiel A

    2010-11-15

    Fatty acid signature analysis (FASA) is an important tool by which marine mammal scientists gain insight into foraging ecology. Fatty acid profiles (resulting from FASA) represent a potential biomarker to assess exposure to natural and anthropogenic stressors. Florida manatees are well studied, and an excellent necropsy program provides a basis against which to assess this budding tool. Results using samples from 54 manatees assigned to four cause-of-death categories indicated that those animals exposed to or that died due to brevetoxin exposure (red tide, or RT samples) demonstrate a distinctive hepatic fatty acid profile. Discriminant function analysis indicated that hepatic fatty acids could be used to classify RT versus non-RT liver samples with reasonable certainty. A discriminant function was derived based on 8 fatty acids which correctly classified 100% of samples from a training dataset (10 RT and 25 non-RT) and 85% of samples in a cross-validation dataset (5 RT and 13 non-RT). Of the latter dataset, all RT samples were correctly classified, but two of thirteen non-RT samples were incorrectly classified. However, the "incorrect" samples came from manatees that died due to other causes during documented red tide outbreaks; thus although the proximal cause of death was due to watercraft collisions, exposure to brevetoxin may have affected these individuals in ways that increased their vulnerability. This use of FASA could: a) provide an additional forensic tool to help scientists and managers to understand cause of death or debilitation due to exposure to red tide in manatees; b) serve as a model that could be applied to studies to improve assessments of cause of death in other marine mammals; and c) be used, as in humans, to help diagnose metabolic disorders or disease states in manatees and other species. Copyright © 2010 Elsevier B.V. All rights reserved.

  1. Partner Violence Before and After Couples-Based Alcoholism Treatment for Female Alcoholic Patients

    PubMed Central

    Schumm, Jeremiah A.; O'Farrell, Timothy J.; Murphy, Christopher M.; Fals-Stewart, William

    2010-01-01

    This study examined partner violence before and in the first and second year after behavioral couples therapy (BCT) for 103 married or cohabiting women seeking alcohol dependence treatment and their male partners, and used a demographically matched non-alcoholic comparison sample. The treatment sample received M = 16.7 BCT sessions over 5-6 months. Follow-up rates for the treatment sample at years 1 and 2 were 88% and 83%, respectively. In the year before BCT, 68% of female alcoholic patients had been violent toward their male partner, nearly five times the comparison sample rate of 15%. In the year after BCT, violence prevalence decreased significantly to 31% of the treatment sample. Women were classified as remitted after treatment if they demonstrated abstinence or minimal substance use and no serious consequences related to substance use. In year 1 following BCT, 45% were classified as remitted, and 49% were classified as remitted in year 2. Among remitted patients in the year after BCT, violence prevalence of 22% did not differ from the comparison sample and was significantly lower than the rate among relapsed patients (38%). Results for male-perpetrated violence and for the second year after BCT were similar to the first year. Results supported predictions that partner violence would decrease after BCT, and that clinically significant violence reductions to the level of a non-alcoholic comparison sample would occur for patients whose alcoholism was remitted after BCT. These findings replicate previous research among men with alcoholism. PMID:19968389

  2. Chemical data as markers of the geographical origins of sugarcane spirits.

    PubMed

    Serafim, F A T; Pereira-Filho, Edenir R; Franco, D W

    2016-04-01

    In an attempt to classify sugarcane spirits according to their geographic region of origin, chemical data for 24 analytes were evaluated in 50 cachaças produced using a similar procedure in selected regions of Brazil: São Paulo - SP (15), Minas Gerais - MG (11), Rio de Janeiro - RJ (11), Paraiba -PB (9), and Ceará - CE (4). Multivariate analysis was applied to the analytical results, and the predictive abilities of different classification methods were evaluated. Principal component analysis identified five groups, and chemical similarities were observed between MG and SP samples and between RJ and PB samples. CE samples presented a distinct chemical profile. Partial least squares discriminant analysis (PLS-DA) classified 50.2% of the samples correctly, K-nearest neighbor (KNN) 86%, and soft independent modeling of class analogy (SIMCA) 56.2%. Therefore, in this proof of concept demonstration, the proposed approach based on chemical data satisfactorily predicted the cachaças' geographic origins. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Mapping urban impervious surface using object-based image analysis with WorldView-3 satellite imagery

    NASA Astrophysics Data System (ADS)

    Iabchoon, Sanwit; Wongsai, Sangdao; Chankon, Kanoksuk

    2017-10-01

    Land use and land cover (LULC) data are important to monitor and assess environmental change. LULC classification using satellite images is a method widely used on a global and local scale. In particular, urban areas that have various LULC types are important components of the urban landscape and ecosystem. This study aims to classify urban LULC using WorldView-3 (WV-3) very high-spatial resolution satellite imagery and the object-based image analysis method. A decision rule set was applied to classify the WV-3 images in Kathu subdistrict, Phuket province, Thailand. The main steps were as follows: (1) the image was ortho-rectified with ground control points and using the digital elevation model, (2) multiscale image segmentation was applied to divide the image pixel level into the image object level, (3) the decision rule set for LULC classification was developed using spectral bands, spectral indices, and spatial and contextual information, and (4) accuracy assessment was computed using testing data, which were sampled by statistical random sampling. The results show that seven LULC classes (water, vegetation, open space, road, residential, building, and bare soil) were successfully classified with an overall classification accuracy of 94.14% and a kappa coefficient of 92.91%.

  4. MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
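
    The backward stepwise generation of the ensemble collection can be sketched in pure Python: starting from an ensemble, repeatedly drop the member whose removal hurts in-sample fitness least, recording each intermediate ensemble. This is a simplified sketch of that one step, not the full MSEBAG algorithm (which also searches for the minimum-sufficient ensemble and aggregates via EAA):

```python
def fitness(ensemble, X, y):
    """In-sample accuracy of majority voting over the ensemble."""
    correct = 0
    for x, target in zip(X, y):
        votes = [clf(x) for clf in ensemble]
        if max(set(votes), key=votes.count) == target:
            correct += 1
    return correct / len(y)

def backward_collection(ensemble, X, y):
    """Backward stepwise pass: drop the least useful member at each step,
    yielding ensembles of descending complexity (and typically fitness)."""
    collection = [list(ensemble)]
    current = list(ensemble)
    while len(current) > 1:
        # Remove the member whose removal degrades fitness the least.
        best = max(range(len(current)),
                   key=lambda i: fitness(current[:i] + current[i + 1:], X, y))
        current = current[:best] + current[best + 1:]
        collection.append(list(current))
    return collection
```

    Each classifier here is just a callable mapping a sample to a label, so any base learner can be plugged in.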

  5. Issues and challenges associated with classifying neoplasms in percutaneous needle biopsies of incidentally found small renal masses.

    PubMed

    Evans, Andrew J; Delahunt, Brett; Srigley, John R

    2015-03-01

    Percutaneous needle core biopsy has become acceptable for classifying renal tumours and guiding patient management in the setting of an incidentally detected small renal mass (SRM), defined as an asymptomatic, non-palpable mass <4 cm in maximum dimension. Long-held concerns preventing the incorporation of biopsies into routine patient care, including the perception of poor diagnostic yield and risks of complications such as bleeding or biopsy tract seeding, have largely been disproven. While needle biopsies for SRMs have traditionally been performed in academic centres, pathologists based in non-academic centres can expect to encounter these specimens as urology and/or interventional radiology trainees complete their training programs and begin work in non-academic centres. This review covers the rationale for performing these biopsies, the expected diagnostic yield, relevant differential diagnoses and an approach to classifying SRMs based on limited samples, as well as the use of immunohistochemical (IHC) staining panels to aid in this process. There is also an undeniable learning curve for pathologists faced with reporting these biopsies, and a number of issues and potential pitfalls attributable to sampling must be kept in mind by pathologists and clinicians alike. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Mapping Phonetic Features for Voice-Driven Sound Synthesis

    NASA Astrophysics Data System (ADS)

    Janer, Jordi; Maestre, Esteban

    In applications where the human voice controls the synthesis of musical instrument sounds, phonetics convey musical information that might be related to the sound of the imitated musical instrument. Our initial hypothesis is that phonetics are user- and instrument-dependent, but remain constant for a single subject and instrument. We propose a user-adapted system, where mappings from voice features to synthesis parameters depend on how subjects sing musical articulations, i.e., note-to-note transitions. The system consists of two components: first, a voice signal segmentation module that automatically determines note-to-note transitions; second, a classifier that determines the type of musical articulation for each transition based on a set of phonetic features. To validate our hypothesis, we ran an experiment where subjects imitated real instrument recordings with their voice. Performance recordings consisted of short phrases of saxophone and violin performed in three grades of musical articulation, labeled staccato, normal, and legato. The results of a classifier obtained by supervised training (user-dependent) are compared to a classifier based on heuristic rules (user-independent). Finally, from the previous results we show how to control articulation in a sample-concatenation synthesizer by selecting the most appropriate samples.

  7. Lagrangian methods of cosmic web classification

    NASA Astrophysics Data System (ADS)

    Fisher, J. D.; Faltenbacher, A.; Johnson, M. S. T.

    2016-05-01

    The cosmic web defines the large-scale distribution of matter we see in the Universe today. Classifying the cosmic web into voids, sheets, filaments and nodes allows one to explore structure formation and the role environmental factors have on halo and galaxy properties. While existing studies of cosmic web classification concentrate on grid-based methods, this work explores a Lagrangian approach where the V-web algorithm proposed by Hoffman et al. is implemented with techniques borrowed from smoothed particle hydrodynamics. The Lagrangian approach allows one to classify individual objects (e.g. particles or haloes) based on properties of their nearest neighbours in an adaptive manner. It can be applied directly to a halo sample which dramatically reduces computational cost and potentially allows an application of this classification scheme to observed galaxy samples. Finally, the Lagrangian nature admits a straightforward inclusion of the Hubble flow negating the necessity of a visually defined threshold value which is commonly employed by grid-based classification methods.

  8. Anytime query-tuned kernel machine classifiers via Cholesky factorization

    NASA Technical Reports Server (NTRS)

    DeCoste, D.

    2002-01-01

    We recently demonstrated 2- to 64-fold query-time speedups of Support Vector Machine and Kernel Fisher classifiers via a new computational geometry method for anytime output bounds (DeCoste, 2002). This new paper refines our approach in two key ways. First, we introduce a simple linear algebra formulation based on Cholesky factorization, yielding simpler equations and lower computational overhead. Second, this new formulation suggests new methods for achieving additional speedups, including tuning on query samples. We demonstrate effectiveness on benchmark datasets.
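
    The Cholesky factorization at the heart of the formulation (K = L Lᵀ for a symmetric positive-definite kernel matrix) can be sketched in pure Python; this is the textbook algorithm, not the authors' query-bound machinery built on top of it:

```python
import math

def cholesky(K):
    """Return lower-triangular L with K = L L^T.
    K must be symmetric positive definite (e.g. a kernel Gram matrix)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)      # diagonal entry
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]     # below-diagonal entry
    return L
```

    Once L is available, triangular solves against it replace repeated work with the full kernel matrix, which is where the computational savings come from.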

  9. Nondestructive detection of pork comprehensive quality based on spectroscopy and support vector machine

    NASA Astrophysics Data System (ADS)

    Liu, Yuanyuan; Peng, Yankun; Zhang, Leilei; Dhakal, Sagar; Wang, Caiping

    2014-05-01

    Pork is one of the most widely consumed meats in the world. With growing improvements in living standards, concerned stakeholders, including consumers and regulatory bodies, pay increasing attention to the comprehensive quality of fresh pork. Different laboratory-based analytical technologies exist to determine the quality attributes of pork; however, none of them meets the industry's need for rapid, non-destructive measurement. The current study used an optical instrument as a rapid and non-destructive tool to classify 24 h-aged pork longissimus dorsi samples into three kinds of meat (PSE, normal and DFD) on the basis of color L* and pH24. A total of 66 samples were used in the experiment. An optical system based on Vis/NIR spectral acquisition (300-1100 nm) was developed in the laboratory to acquire spectral signals of the pork samples. A median smoothing filter (M-filter) and multiplicative scatter correction (MSC) were used to remove spectral noise and signal drift. A support vector machine (SVM) prediction model was developed to classify the samples based on their comprehensive qualities. The results showed that the classification model is highly correlated with the actual quality parameters, with classification accuracy of more than 85%. Being simple and easy to use, with promising results, the system can be applied in the meat processing industry for real-time, non-destructive and rapid detection of pork quality in the future.
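
    The MSC preprocessing step can be sketched as follows: each spectrum is regressed on the mean (reference) spectrum, and the fitted offset and slope are removed. This is the standard MSC formulation in pure Python, not the study's exact implementation:

```python
def msc(spectra):
    """Multiplicative scatter correction: fit each spectrum as
    s = a + b * ref (ref = mean spectrum), then return (s - a) / b."""
    n = len(spectra[0])
    ref = [sum(s[i] for s in spectra) / len(spectra) for i in range(n)]
    ref_mean = sum(ref) / n
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        # Least-squares slope and intercept of s against the reference.
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_var
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected
```

    After correction, spectra that differ only by additive and multiplicative scatter effects collapse onto the same curve, which is exactly the drift the study needed to remove before SVM classification.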

  10. A Novel Bearing Multi-Fault Diagnosis Approach Based on Weighted Permutation Entropy and an Improved SVM Ensemble Classifier.

    PubMed

    Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang

    2018-06-14

    Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
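
    Weighted permutation entropy can be sketched in pure Python: the ordinal pattern of each embedding window is weighted by the window's variance, so high-energy fluctuations count more than flat noise. This follows the common WPE definition (embedding order 3, unit delay); the paper's exact parameters may differ:

```python
import math
from collections import defaultdict

def weighted_permutation_entropy(x, order=3):
    """WPE: permutation entropy with each ordinal pattern weighted by the
    variance (energy) of its embedding window; normalized to [0, 1]."""
    weights = defaultdict(float)
    total = 0.0
    for i in range(len(x) - order + 1):
        window = x[i:i + order]
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        mean = sum(window) / order
        w = sum((v - mean) ** 2 for v in window) / order  # window variance
        weights[pattern] += w
        total += w
    if total == 0:
        return 0.0  # constant signal carries no ordinal information
    probs = [w / total for w in weights.values()]
    return -sum(p * math.log(p) for p in probs) / math.log(math.factorial(order))
```

    In the paper this statistic is computed on the raw vibration signal for fault detection, and on the first several IMFs from EEMD to build the fault feature vectors.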

  11. Ethnozoology in Brazil: analysis of the methodological risks in published studies.

    PubMed

    Lyra-Neves, R M; Santos, E M; Medeiros, P M; Alves, R R N; Albuquerque, U P

    2015-11-01

    There has been growth in the field of Ethnozoology throughout the years, especially in Brazil, where a considerable number of scientific articles pertaining to this subject have been published in recent decades. With this increase in publications comes the opportunity to assess their quality, as there are no known studies assessing the methodological risks in this area. Based on this observation, our objectives were to compile the papers published on the subject of ethnozoology and to answer the following questions: 1) Do Brazilian ethnozoological studies use sound sampling methods? 2) Is the sampling quality influenced by characteristics of the studies/publications? The studies found in databases and using web search engines were compiled to answer these questions. The studies were assessed based on their nature, sampling methods, use of hypotheses and tests, journal's impact factor, and animal group studied. The majority of the studies analyzed exhibited problems associated with the samples, as 144 (66.98%) studies were classified as having a high risk of bias. With regard to the characteristics analyzed, we determined that a quantitative nature and the use of tests are essential components of good sampling. Most studies classified as moderate and low risk either did not provide these data or provided data that were not clear; therefore, these studies were classified as being of a quali-quantitative nature. Studies performed with vertebrate groups were classified as high risk. Most of the papers analyzed here focused on fish, insects, and/or mollusks, thus highlighting the difficulties associated with conducting interviews regarding tetrapod vertebrates. Such difficulties are largely related to the extremely strict Brazilian laws on the use of wild tetrapod vertebrates, justified by the decline and extinction of some species.

  12. Online clustering algorithms for radar emitter classification.

    PubMed

    Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

    2005-08-01

    Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
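
    The competitive-learning variant can be sketched as an online update rule: the nearest cluster centre moves toward each incoming pulse sample, and a new cluster is spawned when no centre is close enough. The learning rate and distance threshold below are illustrative assumptions, not the paper's tuned values:

```python
import math

def competitive_update(centers, x, lr=0.1, new_threshold=2.0):
    """One online competitive-learning step: move the winning centre toward
    sample x, or spawn a new cluster if x is far from every centre."""
    if not centers:
        centers.append(list(x))
        return centers
    dists = [math.dist(c, x) for c in centers]
    winner = dists.index(min(dists))
    if dists[winner] > new_threshold:
        centers.append(list(x))  # sample looks like a new emitter
    else:
        # Pull the winning centre a fraction lr of the way toward x.
        centers[winner] = [c + lr * (v - c) for c, v in zip(centers[winner], x)]
    return centers
```

    Processing pulses one at a time like this is what makes the method "online": no batch of samples ever needs to be stored.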

  13. A hybrid sensing approach for pure and adulterated honey classification.

    PubMed

    Subari, Norazian; Mohamad Saleh, Junita; Md Shakaff, Ali Yeon; Zakaria, Ammar

    2012-10-17

    This paper presents a comparison between data from single-modality and fusion methods to classify Tualang honey as pure or adulterated using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) statistical classification approaches. Ten different brands of certified pure Tualang honey were obtained throughout peninsular Malaysia and Sumatera, Indonesia. Various concentrations of two types of sugar solution (beet and cane sugar) were used in this investigation to create honey samples of 20%, 40%, 60% and 80% adulteration concentrations. Honey data extracted from an electronic nose (e-nose) and Fourier Transform Infrared Spectroscopy (FTIR) were gathered, analyzed and compared based on fusion methods. Visual observation of classification plots revealed that the PCA approach was able to distinguish pure and adulterated honey samples better than the LDA technique. Overall, the validated classification results based on FTIR data (88.0%) gave higher classification accuracy than e-nose data (76.5%) using the LDA technique. Honey classification based on normalized low-level and intermediate-level FTIR and e-nose fusion data scored classification accuracies of 92.2% and 88.7%, respectively, using the Stepwise LDA method. The results suggested that pure and adulterated honey samples are better classified using FTIR and e-nose fusion data than single-modality data.

  14. Motor Oil Classification Based on Time-Resolved Fluorescence

    PubMed Central

    Mu, Taotao; Chen, Siying; Zhang, Yinchao; Guo, Pan; Chen, He; Meng, Fandong

    2014-01-01

    A time-resolved fluorescence (TRF) technique is presented for classifying motor oils. The system is constructed with a third-harmonic Nd:YAG laser, a spectrometer, and an intensified charge-coupled device (ICCD) camera. Steady-state and time-resolved fluorescence measurements are reported for several motor oils. It is found that steady-state fluorescence is insufficient to distinguish the motor oil samples. Contour diagrams of TRF intensities (CDTRFIs) are therefore acquired to serve as unique fingerprints for identifying motor oils. CDTRFIs are preferable to steady-state fluorescence spectra for classifying different motor oils, making them a particularly suitable choice for the development of fluorescence-based methods for the discrimination and characterization of motor oils. The two-dimensional fluorescence contour diagrams contain more information: not only the changing shapes of the LIF spectra but also the relative intensity. The results indicate that motor oils can be differentiated based on the proposed method, which provides a reliable approach for analyzing and classifying motor oils. PMID:24988439

  15. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

    PubMed Central

    2012-01-01

    Background Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. 
One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway. PMID:23216969
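
    The two aggregation methods can be sketched directly; the 0.5 cutoff and one-vote threshold below are illustrative defaults, not the values tuned in the study:

```python
def average_probability(probs, cutoff=0.5):
    """Call acute rejection if the mean of the members' rejection
    probabilities reaches the cutoff."""
    return sum(probs) / len(probs) >= cutoff

def vote_threshold(probs, cutoff=0.5, min_votes=1):
    """Call acute rejection if at least min_votes members individually
    report a probability at or above the cutoff."""
    return sum(p >= cutoff for p in probs) >= min_votes
```

    The sketch makes the sensitivity/specificity trade-off visible: with min_votes = 1, a single confident member is enough for Vote Threshold to call rejection (higher sensitivity), while Average Probability requires agreement across members (higher specificity).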

  16. A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers.

    PubMed

    Günther, Oliver P; Chen, Virginia; Freue, Gabriela Cohen; Balshaw, Robert F; Tebbutt, Scott J; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W Robert; McManus, Bruce M; Keown, Paul A; Ng, Raymond T

    2012-12-08

    Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. 
The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.

  17. A NAIVE BAYES SOURCE CLASSIFIER FOR X-RAY SOURCES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Broos, Patrick S.; Getman, Konstantin V.; Townsley, Leisa K.

    2011-05-01

    The Chandra Carina Complex Project (CCCP) provides a sensitive X-ray survey of a nearby starburst region over >1 deg² in extent. Thousands of faint X-ray sources are found, many concentrated into rich young stellar clusters. However, significant contamination from unrelated Galactic and extragalactic sources is present in the X-ray catalog. We describe the use of a naive Bayes classifier to assign membership probabilities to individual sources, based on source location, X-ray properties, and visual/infrared properties. For the particular membership decision rule adopted, 75% of CCCP sources are classified as members, 11% are classified as contaminants, and 14% remain unclassified. The resulting sample of stars likely to be Carina members is used in several other studies, which appear in this special issue devoted to the CCCP.
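
    A naive Bayes membership classifier of this kind can be sketched with independent Gaussian features; the class priors and per-feature parameters below are illustrative placeholders, not the CCCP values:

```python
import math

def gaussian_naive_bayes(priors, params, x):
    """Posterior class probabilities under naive Bayes with independent
    Gaussian features; params[c] is a list of (mean, std) per feature."""
    log_posts = {}
    for c, prior in priors.items():
        log_p = math.log(prior)
        for v, (mu, sd) in zip(x, params[c]):
            # Log of the Gaussian likelihood for this feature.
            log_p += -0.5 * ((v - mu) / sd) ** 2 \
                     - math.log(sd * math.sqrt(2 * math.pi))
        log_posts[c] = log_p
    # Normalize in log space for numerical stability.
    m = max(log_posts.values())
    z = sum(math.exp(p - m) for p in log_posts.values())
    return {c: math.exp(p - m) / z for c, p in log_posts.items()}
```

    A membership decision rule then thresholds these posteriors, which is how sources end up labeled member, contaminant, or unclassified.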

  18. Anomaly detection for medical images based on a one-class classification

    NASA Astrophysics Data System (ADS)

    Wei, Qi; Ren, Yinhao; Hou, Rui; Shi, Bibo; Lo, Joseph Y.; Carin, Lawrence

    2018-02-01

    Detecting an anomaly such as a malignant tumor or a nodule from medical images including mammogram, CT or PET images is still an ongoing research problem drawing a lot of attention with applications in medical diagnosis. A conventional way to address this is to learn a discriminative model using training datasets of negative and positive samples. The learned model can be used to classify a testing sample into a positive or negative class. However, in medical applications, the high imbalance between negative and positive samples poses a difficulty for learning algorithms, as they will be biased towards the majority group, i.e., the negative one. To address this imbalanced data issue as well as leverage the huge amount of negative samples, i.e., normal medical images, we propose to learn an unsupervised model to characterize the negative class. To make the learned model more flexible and extendable for medical images of different scales, we have designed an autoencoder based on a deep neural network to characterize the negative patches decomposed from large medical images. A testing image is decomposed into patches and then fed into the learned autoencoder to reconstruct these patches themselves. The reconstruction error of one patch is used to classify this patch into a binary class, i.e., a positive or a negative one, leading to a one-class classifier. The positive patches highlight the suspicious areas containing anomalies in a large medical image. The proposed method has been tested on the INbreast dataset and achieves an AUC of 0.84. The main contribution of our work can be summarized as follows. 1) The proposed one-class learning requires only data from one class, i.e., the negative data; 2) The patch-based learning makes the proposed method scalable to images of different sizes and helps avoid the large-scale problem for medical images; 3) The training of the proposed deep convolutional neural network (DCNN) based auto-encoder is fast and stable.

  19. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery

    PubMed Central

    Thanh Noi, Phan; Kappas, Martin

    2017-01-01

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909

  20. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery.

    PubMed

    Thanh Noi, Phan; Kappas, Martin

    2017-12-22

    In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

  1. An expert support system for breast cancer diagnosis using color wavelet features.

    PubMed

    Issac Niwas, S; Palanisamy, P; Chibbar, Rajni; Zhang, W J

    2012-10-01

    Breast cancer diagnosis can be done through pathologic assessment of breast tissue samples obtained, for example, by the core needle biopsy technique. The result of the pathologist's analysis of this sample is crucial for the breast cancer patient. In this paper, nuclei of tissue samples are investigated after decomposition by means of the Log-Gabor wavelet in the HSV color domain, and an algorithm is developed to compute the color wavelet features. These features are used for breast cancer diagnosis using the Support Vector Machine (SVM) classifier algorithm. A properly trained SVM can correctly classify patterns, making it particularly suitable for use in an expert system that aids in the diagnosis of cancer tissue samples. The results are compared with other multivariate classifiers such as the Naïve Bayes classifier and an Artificial Neural Network. The overall accuracy of the proposed method using the SVM classifier will be further useful for automation in cancer diagnosis.

  2. SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

    PubMed

    Pirooznia, Mehdi; Deng, Youping

    2006-12-12

    Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the java interface using standard swing libraries. We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the BRCA1-BRCA2 samples with the RBF kernel of SVM. We have developed a java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

  3. Design of partially supervised classifiers for multispectral image data

    NASA Technical Reports Server (NTRS)

    Jeon, Byeungwoo; Landgrebe, David

    1993-01-01

    A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.
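
    The first approach, based on significance testing, can be sketched for a single Gaussian feature: a sample is accepted into the one known class only if its standardized distance from the class mean is within a threshold. The fixed 2-sigma cutoff below is an illustrative stand-in for the optimal acceptance probability that the paper estimates directly from the data:

```python
def accept_as_class(x, mean, std, z_threshold=2.0):
    """Accept sample x into the known class when its standardized distance
    from the class mean is within z_threshold standard deviations."""
    return abs(x - mean) / std <= z_threshold
```

    Samples that fail the test are left to the unknown classes, which is exactly the partially supervised setting: only one class needs prior statistics.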

  4. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update

    NASA Astrophysics Data System (ADS)

    Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F.

    2018-06-01

    Objective. Most current electroencephalography (EEG)-based brain–computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. Approach. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs and what the outcomes were, and to identify their pros and cons. Main results. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful, although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training sample settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. Significance. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. It also identifies a number of challenges to further advance EEG classification in BCI.
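    Shrinkage linear discriminant analysis, which the review singles out for small training-sample settings, can be sketched as follows. This is an illustrative example (not from the review): synthetic high-dimensional data stands in for EEG features, and the dataset sizes are made up:

```python
# Sketch of shrinkage LDA in the few-trials, many-features regime where
# plain LDA overfits. Synthetic data stands in for EEG feature vectors.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=60, n_features=100, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=20, random_state=0)

# Ledoit-Wolf shrinkage ('auto') regularizes the covariance estimate,
# which is otherwise poorly conditioned with 40 samples in 100 dimensions.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
lda.fit(X_tr, y_tr)
accuracy = lda.score(X_te, y_te)
```

    The shrinkage option requires the "lsqr" or "eigen" solver; the default "svd" solver does not support it.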

  5. A Framework for Final Drive Simultaneous Failure Diagnosis Based on Fuzzy Entropy and Sparse Bayesian Extreme Learning Machine

    PubMed Central

    Ye, Qing; Pan, Hao; Liu, Changhua

    2015-01-01

    This research proposes a novel framework for final drive simultaneous failure diagnosis comprising feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. In the feature extraction module, wavelet packet transform and fuzzy entropy are adopted to reduce noise interference and extract representative features of each failure mode. Single-failure samples are used to construct probability classifiers based on paired sparse Bayesian extreme learning machines, which are trained only on single failure modes and inherit the high generalization and sparsity of the sparse Bayesian learning approach. To generate the optimal decision threshold, which converts the probability outputs of the classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes together with a grid search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machines and probabilistic neural networks, experimental results based on the F1-measure verify that the diagnostic accuracy and efficiency of the proposed framework, which are crucial for simultaneous failure diagnosis, are superior to those of the existing approaches. PMID:25722717

  6. Specific primers design based on the superoxide dismutase b gene for Trypanosoma cruzi as a screening tool: Validation method using strains from Colombia classified according to their discrete typing unit.

    PubMed

    Olmo, Francisco; Escobedo-Orteg, Javier; Palma, Patricia; Sánchez-Moreno, Manuel; Mejía-Jaramillo, Ana; Triana, Omar; Marín, Clotilde

    2014-11-01

    To classify 21 new isolates of Trypanosoma cruzi (T. cruzi) according to the Discrete Typing Unit (DTU) to which they belong, as well as to validate a new pair of primers designed to detect the parasite in biological samples. Strains were isolated, DNA was extracted, and the isolates were classified by using three Polymerase Chain Reactions (PCR). Subsequently, this DNA, along with that of other isolates from various biological samples, was used in a new PCR with the designed primers. Finally, the amplified fragments were sequenced. A predominance of DTU I was observed in Colombia, as well as the specificity of our primers for the detection of T. cruzi, while no band was obtained when other species were used. This work reveals the genetic variability of 21 new isolates of T. cruzi in Colombia. Our primers confirmed their specificity for detecting the presence of T. cruzi. Copyright © 2014 Hainan Medical College. Published by Elsevier B.V. All rights reserved.

  7. Segmentation of thalamus from MR images via task-driven dictionary learning

    NASA Astrophysics Data System (ADS)

    Liu, Luoluo; Glaister, Jeffrey; Sun, Xiaoxia; Carass, Aaron; Tran, Trac D.; Prince, Jerry L.

    2016-03-01

    Automatic thalamus segmentation is useful to track changes in thalamic volume over time. In this work, we introduce a task-driven dictionary learning framework to find the optimal dictionary given a set of eleven features obtained from T1-weighted MRI and diffusion tensor imaging. In this dictionary learning framework, a linear classifier is designed concurrently to classify voxels as belonging to the thalamus or non-thalamus class. Morphological post-processing is applied to produce the final thalamus segmentation. Due to the uneven size of the training data samples for the non-thalamus and thalamus classes, a non-uniform sampling scheme is proposed to train the classifier to better discriminate between the two classes around the boundary of the thalamus. Experiments are conducted on data collected from 22 subjects with manually delineated ground truth. The experimental results are promising in terms of improvements in the Dice coefficient of the thalamus segmentation over state-of-the-art atlas-based thalamus segmentation algorithms.

  8. Segmentation of Thalamus from MR images via Task-Driven Dictionary Learning.

    PubMed

    Liu, Luoluo; Glaister, Jeffrey; Sun, Xiaoxia; Carass, Aaron; Tran, Trac D; Prince, Jerry L

    2016-02-27

    Automatic thalamus segmentation is useful to track changes in thalamic volume over time. In this work, we introduce a task-driven dictionary learning framework to find the optimal dictionary given a set of eleven features obtained from T1-weighted MRI and diffusion tensor imaging. In this dictionary learning framework, a linear classifier is designed concurrently to classify voxels as belonging to the thalamus or non-thalamus class. Morphological post-processing is applied to produce the final thalamus segmentation. Due to the uneven size of the training data samples for the non-thalamus and thalamus classes, a non-uniform sampling scheme is proposed to train the classifier to better discriminate between the two classes around the boundary of the thalamus. Experiments are conducted on data collected from 22 subjects with manually delineated ground truth. The experimental results are promising in terms of improvements in the Dice coefficient of the thalamus segmentation over state-of-the-art atlas-based thalamus segmentation algorithms.
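    The Dice coefficient used above to evaluate segmentation quality is a standard overlap measure; a minimal sketch of the metric itself (not the segmentation pipeline):

```python
import numpy as np

# Dice coefficient: 2|A ∩ B| / (|A| + |B|) for two binary masks.
def dice_coefficient(seg, truth):
    seg = np.asarray(seg, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = seg.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(seg, truth).sum() / denom
```

    A value of 1.0 means the automatic segmentation and the manual delineation overlap exactly; 0.0 means no overlap.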

  9. Interface Prostheses With Classifier-Feedback-Based User Training.

    PubMed

    Fang, Yinfeng; Zhou, Dalin; Li, Kairu; Liu, Honghai

    2017-11-01

    It is evident that user training significantly affects performance of pattern-recognition-based myoelectric prosthetic device control. Despite plausible classification accuracy on offline datasets, online accuracy usually suffers from the changes in physiological conditions and electrode displacement. The user ability in generating consistent electromyographic (EMG) patterns can be enhanced via proper user training strategies in order to improve online performance. This study proposes a clustering-feedback strategy that provides real-time feedback to users by means of a visualized online EMG signal input as well as the centroids of the training samples, whose dimensionality is reduced to minimal number by dimension reduction. Clustering feedback provides a criterion that guides users to adjust motion gestures and muscle contraction forces intentionally. The experiment results have demonstrated that hand motion recognition accuracy increases steadily along the progress of the clustering-feedback-based user training, while conventional classifier-feedback methods, i.e., label feedback, hardly achieve any improvement. The result concludes that the use of proper classifier feedback can accelerate the process of user training, and implies prosperous future for the amputees with limited or no experience in pattern-recognition-based prosthetic device manipulation.

  10. Improved semi-supervised online boosting for object tracking

    NASA Astrophysics Data System (ADS)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    The advantage of an online semi-supervised boosting method, which treats object tracking as a classification problem, is that it trains a binary classifier from labeled and unlabeled examples. Appropriate object features are selected based on real-time changes in the object. However, the online semi-supervised boosting method faces one key problem: traditional self-training, which uses the classification results to update the classifier itself, often leads to drifting or tracking failure due to the error accumulated during each update of the tracker. To overcome the disadvantages of semi-supervised online boosting-based object tracking methods, the contribution of this paper is an improved online semi-supervised boosting method, in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, its output is analyzed by the P-N constraints, which are used to verify whether the labels assigned to unlabeled data by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking of our tracker on several challenging test sequences, where our tracker outperforms other related online tracking methods and achieves promising tracking performance.

  11. Towards exaggerated emphysema stereotypes

    NASA Astrophysics Data System (ADS)

    Chen, C.; Sørensen, L.; Lauze, F.; Igel, C.; Loog, M.; Feragen, A.; de Bruijne, M.; Nielsen, M.

    2012-03-01

    Classification is widely used in the context of medical image analysis and in order to illustrate the mechanism of a classifier, we introduce the notion of an exaggerated image stereotype based on training data and trained classifier. The stereotype of some image class of interest should emphasize/exaggerate the characteristic patterns in an image class and visualize the information the employed classifier relies on. This is useful for gaining insight into the classification and serves for comparison with the biological models of disease. In this work, we build exaggerated image stereotypes by optimizing an objective function which consists of a discriminative term based on the classification accuracy, and a generative term based on the class distributions. A gradient descent method based on iterated conditional modes (ICM) is employed for optimization. We use this idea with Fisher's linear discriminant rule and assume a multivariate normal distribution for samples within a class. The proposed framework is applied to computed tomography (CT) images of lung tissue with emphysema. The synthesized stereotypes illustrate the exaggerated patterns of lung tissue with emphysema, which is underpinned by three different quantitative evaluation methods.

  12. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    PubMed

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of the collected data are essential parts of detecting concealed information in P-QRS-T waves in long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and sampling load. These drawbacks motivated us in developing a novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm, based on PCA features in combination with a K-NN classifier, performs better than the other methods. The proposed algorithm outperforms existing algorithms by increasing classification accuracy by 11%. In addition, the proposed algorithm achieves classification accuracies for the K-NN and PNN classifiers, and a Receiver Operating Characteristic (ROC) area, of 99.98%, 99.83%, and 99.75%, respectively.
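    The dimensionality-reduction-then-classification stage (PCA features fed to a K-NN classifier) can be sketched as a scikit-learn pipeline. This is illustrative only: the digits dataset stands in for the ECG feature vectors, and the component/neighbor counts are assumptions:

```python
# Sketch of the PCA + K-NN stage: reduce dimensionality with PCA, then
# classify with K-nearest neighbours. The digits dataset stands in for
# the ECG features, which are not available here.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```

    Swapping `KNeighborsClassifier` for another estimator (e.g. a probabilistic neural network equivalent) only changes the last pipeline step.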

  13. Computer-aided Prognosis of Neuroblastoma on Whole-slide Images: Classification of Stromal Development

    PubMed Central

    Sertel, O.; Kong, J.; Shimada, H.; Catalyurek, U.V.; Saltz, J.H.; Gurcan, M.N.

    2009-01-01

    We are developing a computer-aided prognosis system for neuroblastoma (NB), a cancer of the nervous system and one of the most malignant tumors affecting children. Histopathological examination is an important stage for further treatment planning in routine clinical diagnosis of NB. According to the International Neuroblastoma Pathology Classification (the Shimada system), NB patients are classified into favorable and unfavorable histology based on the tissue morphology. In this study, we propose an image analysis system that operates on digitized H&E stained whole-slide NB tissue samples and classifies each slide as either stroma-rich or stroma-poor based on the degree of Schwannian stromal development. Our statistical framework performs the classification based on texture features extracted using co-occurrence statistics and local binary patterns. Due to the high resolution of digitized whole-slide images, we propose a multi-resolution approach that mimics the evaluation of a pathologist such that the image analysis starts from the lowest resolution and switches to higher resolutions when necessary. We employ an offline feature selection step, which determines the most discriminative features at each resolution level during the training step. A modified k-nearest neighbor classifier is used to determine the confidence level of the classification to make the decision at a particular resolution level. The proposed approach was independently tested on 43 whole-slide samples and provided an overall classification accuracy of 88.4%. PMID:20161324

  14. Residential scene classification for gridded population sampling in developing countries using deep convolutional neural networks on satellite imagery.

    PubMed

    Chew, Robert F; Amer, Safaa; Jones, Kasey; Unangst, Jennifer; Cajka, James; Allpress, Justine; Bruhn, Mark

    2018-05-09

    Conducting surveys in low- and middle-income countries is often challenging because many areas lack a complete sampling frame, have outdated census information, or have limited data available for designing and selecting a representative sample. Geosampling is a probability-based, gridded population sampling method that addresses some of these issues by using geographic information system (GIS) tools to create logistically manageable area units for sampling. GIS grid cells are overlaid to partition a country's existing administrative boundaries into area units that vary in size from 50 m × 50 m to 150 m × 150 m. To avoid sending interviewers to unoccupied areas, researchers manually classify grid cells as "residential" or "nonresidential" through visual inspection of aerial images. "Nonresidential" units are then excluded from sampling and data collection. This process of manually classifying sampling units has drawbacks since it is labor intensive, prone to human error, and creates the need for simplifying assumptions during calculation of design-based sampling weights. In this paper, we discuss the development of a deep learning classification model to predict whether aerial images are residential or nonresidential, thus reducing manual labor and eliminating the need for simplifying assumptions. On our test sets, the model performs comparably to a human-level baseline in both Nigeria (94.5% accuracy) and Guatemala (96.4% accuracy), and outperforms baseline machine learning models trained on crowdsourced or remote-sensed geospatial features. Additionally, our findings suggest that this approach can work well in new areas with relatively modest amounts of training data. Gridded population sampling methods like geosampling are becoming increasingly popular in countries with outdated or inaccurate census data because of their timeliness, flexibility, and cost.
Using deep learning models directly on satellite images, we provide a novel method for sample frame construction that identifies residential gridded aerial units. In cases where manual classification of satellite images is used to (1) correct for errors in gridded population data sets or (2) classify grids where population estimates are unavailable, this methodology can help reduce annotation burden with comparable quality to human analysts.

  15. Low Prevalence of Substandard and Falsified Antimalarial and Antibiotic Medicines in Public and Faith-Based Health Facilities of Southern Malawi

    PubMed Central

    Khuluza, Felix; Kigera, Stephen; Heide, Lutz

    2017-01-01

    Substandard and falsified antimalarial and antibiotic medicines represent a serious problem for public health, especially in low- and middle-income countries. However, information on the prevalence of poor-quality medicines is limited. In the present study, samples of six antimalarial and six antibiotic medicines were collected from 31 health facilities and drug outlets in southern Malawi. Random sampling was used in the selection of health facilities. For sample collection, an overt approach was used in licensed facilities, and a mystery shopper approach in nonlicensed outlets. One hundred and fifty-five samples were analyzed by visual and physical examination and by rapid prescreening tests, that is, disintegration testing and thin-layer chromatography using the GPHF-Minilab. Fifty-six of the samples were analyzed according to pharmacopeial monographs in a World Health Organization-prequalified quality control laboratory. Seven out-of-specification medicines were identified. One sample was classified as falsified, lacking the declared active ingredients, and containing other active ingredients instead. Three samples were classified as substandard with extreme deviations from the pharmacopeial standards, and three further samples as substandard with nonextreme deviations. Of the substandard medicines, three failed in dissolution testing, two in the assay for the content of the active pharmaceutical ingredient, and one failed in both dissolution testing and assay. Six of the seven out-of-specification medicines were from private facilities. Only one out-of-specification medicine was found within the samples from public and faith-based health facilities. Although the observed presence of substandard and falsified medicines in Malawi requires action, their low prevalence in public and faith-based health facilities is encouraging. PMID:28219993

  16. Classification of Salmonella serotypes with hyperspectral microscope imagery

    USDA-ARS?s Scientific Manuscript database

    Previous research has demonstrated an optical method with acousto-optic tunable filter (AOTF) based hyperspectral microscope imaging (HMI) had potential for classifying gram-negative from gram-positive foodborne pathogenic bacteria rapidly and nondestructively with a minimum sample preparation. In t...

  17. Classification of 'Chemlali' accessions according to the geographical area using chemometric methods of phenolic profiles analysed by HPLC-ESI-TOF-MS.

    PubMed

    Taamalli, Amani; Arráez Román, David; Zarrouk, Mokhtar; Segura-Carretero, Antonio; Fernández-Gutiérrez, Alberto

    2012-05-01

    The present work describes a classification method for Tunisian 'Chemlali' olive oils based on their phenolic composition and geographical area. For this purpose, the data obtained by HPLC-ESI-TOF-MS from 13 samples of extra virgin olive oils, obtained from different production areas throughout the country, were used, focusing on the 23 phenolic compounds detected. The quantitative results showed a significant variability among the analysed oil samples. A factor analysis method using principal components was applied to the data in order to reduce the number of factors which explain the variability of the selected compounds. The data matrix constructed was subjected to a canonical discriminant analysis (CDA) in order to classify the oil samples. These results showed that 100% of cross-validated original group cases were correctly classified, which proves the usefulness of the selected variables. Copyright © 2011 Elsevier Ltd. All rights reserved.
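    A cross-validated discriminant classification like the one described can be sketched as follows. This is an illustrative re-creation, not the study's analysis: synthetic clusters stand in for the 13 phenolic profiles, and scikit-learn's linear discriminant analysis stands in for canonical discriminant analysis:

```python
# Sketch: leave-one-out cross-validated discriminant classification of a
# small sample set. Synthetic blobs stand in for the 13 olive-oil
# phenolic profiles (23 features, 3 notional production areas).
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_blobs(n_samples=13, centers=3, n_features=23, random_state=0)

# With only 13 samples, leave-one-out is the natural cross-validation scheme.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
mean_accuracy = scores.mean()
```

    With well-separated groups, as in the paper's 100% cross-validated result, the leave-one-out accuracy approaches 1.0.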

  18. Confidence Preserving Machine for Facial Action Unit Detection

    PubMed Central

    Zeng, Jiabei; Chu, Wen-Sheng; De la Torre, Fernando; Cohn, Jeffrey F.; Xiong, Zhang

    2016-01-01

    Facial action unit (AU) detection from video has been a long-standing problem in automated facial expression analysis. While progress has been made, accurate detection of facial AUs remains challenging due to ubiquitous sources of errors, such as inter-personal variability, pose, and low-intensity AUs. In this paper, we refer to samples causing such errors as hard samples, and the remaining as easy samples. To address learning with the hard samples, we propose the Confidence Preserving Machine (CPM), a novel two-stage learning framework that combines multiple classifiers following an “easy-to-hard” strategy. During the training stage, CPM learns two confident classifiers. Each classifier focuses on separating easy samples of one class from all else, and thus preserves confidence on predicting each class. During the testing stage, the confident classifiers provide “virtual labels” for easy test samples. Given the virtual labels, we propose a quasi-semi-supervised (QSS) learning strategy to learn a person-specific (PS) classifier. The QSS strategy employs a spatio-temporal smoothness that encourages similar predictions for samples within a spatio-temporal neighborhood. In addition, to further improve detection performance, we introduce two CPM extensions: iCPM that iteratively augments training samples to train the confident classifiers, and kCPM that kernelizes the original CPM model to promote nonlinearity. Experiments on four spontaneous datasets GFT [15], BP4D [56], DISFA [42], and RU-FACS [3] illustrate the benefits of the proposed CPM models over baseline methods and state-of-the-art semisupervised learning and transfer learning methods. PMID:27479964

  19. Recent developments in detection and enumeration of waterborne bacteria: a retrospective minireview.

    PubMed

    Deshmukh, Rehan A; Joshi, Kopal; Bhand, Sunil; Roy, Utpal

    2016-12-01

    Waterborne diseases have emerged as global health problems and their rapid and sensitive detection in environmental water samples is of great importance. Bacterial identification and enumeration in water samples is significant as it helps to maintain safe drinking water for public consumption. Culture-based methods are laborious and time-consuming, can yield false-positive results, and cannot recover viable but nonculturable (VBNC) microorganisms. Hence, numerous methods have been developed for rapid detection and quantification of waterborne pathogenic bacteria in water. These rapid methods can be classified into nucleic acid-based, immunology-based, and biosensor-based detection methods. This review summarizes the principle and current state of rapid methods for the monitoring and detection of waterborne bacterial pathogens. Rapid methods outlined are polymerase chain reaction (PCR), digital droplet PCR, real-time PCR, multiplex PCR, DNA microarray, Next-generation sequencing (pyrosequencing, Illumina technology and genomics), and fluorescence in situ hybridization, which are categorized as nucleic acid-based methods. Enzyme-linked immunosorbent assay (ELISA) and immunofluorescence are classified into immunology-based methods. Optical, electrochemical, and mass-based biosensors are grouped into biosensor-based methods. Overall, these methods are sensitive, specific, time-effective, and important in prevention and diagnosis of waterborne bacterial diseases. © 2016 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  20. Super resolution reconstruction of infrared images based on classified dictionary learning

    NASA Astrophysics Data System (ADS)

    Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

    2018-05-01

    Infrared images always suffer from low-resolution problems resulting from limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction and, therefore, more satisfactory results are achieved without increasing computational complexity and time cost. Experiments and results demonstrated that it is a viable method for infrared image reconstruction, since it improves image resolution and recovers detailed information of targets.

  1. Detecting and classifying method based on similarity matching of Android malware behavior with profile.

    PubMed

    Jang, Jae-Wook; Yun, Jaesung; Mohaisen, Aziz; Woo, Jiyoung; Kim, Huy Kang

    2016-01-01

    Mass-market mobile security threats have increased recently due to the growth of mobile technologies and the popularity of mobile devices. Accordingly, techniques have been introduced for identifying, classifying, and defending against mobile threats utilizing static, dynamic, on-device, and off-device techniques. Static techniques are easy to evade, while dynamic techniques are expensive. On-device techniques consume device resources, while off-device techniques require an always-online connection. To address some of those shortcomings, we introduce Andro-profiler, a hybrid behavior-based analysis and classification system for mobile malware. Andro-profiler's main goals are efficiency, scalability, and accuracy. To that end, Andro-profiler classifies malware by exploiting behavior profiles extracted from integrated system logs, including system calls. Andro-profiler executes a malicious application on an emulator in order to generate the integrated system logs, and creates human-readable behavior profiles by analyzing them. By comparing the behavior profile of a malicious application with the representative behavior profile of each malware family using a weighted similarity matching technique, Andro-profiler detects and classifies it into malware families. The experiment results demonstrate that Andro-profiler is scalable, performs well in detecting and classifying malware with accuracy greater than 98%, outperforms the existing state-of-the-art work, and is capable of identifying 0-day mobile malware samples.
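    The abstract does not fully specify the weighted similarity matching; one plausible sketch treats each behavior profile as a vector of system-call counts and compares profiles by weighted cosine similarity. The call names, weights, and family profiles below are made up for illustration:

```python
import math

# Assumed representation: behavior profiles as {system_call: count} dicts,
# compared by cosine similarity with an optional per-feature weight.
def weighted_cosine(profile_a, profile_b, weights):
    keys = set(profile_a) | set(profile_b)
    dot = sum(weights.get(k, 1.0) * profile_a.get(k, 0) * profile_b.get(k, 0)
              for k in keys)
    norm_a = math.sqrt(sum(weights.get(k, 1.0) * profile_a.get(k, 0) ** 2
                           for k in keys))
    norm_b = math.sqrt(sum(weights.get(k, 1.0) * profile_b.get(k, 0) ** 2
                           for k in keys))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_family(sample, family_profiles, weights):
    """Assign the sample to the family with the most similar representative profile."""
    return max(family_profiles,
               key=lambda fam: weighted_cosine(sample, family_profiles[fam], weights))
```

    A detection threshold on the best similarity score would separate known-family malware from unknown (possibly 0-day) samples.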

  2. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.
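    The kernel-matrix inversion at the core of this algorithm has a classical counterpart in least-squares SVM training, where the offset and coefficients come from solving a linear system built from the kernel matrix. A NumPy sketch with a linear kernel on toy data (the dataset and the gamma value are illustrative):

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0):
    """Least-squares SVM: solve F (b, alpha)^T = (0, y)^T with
    F = [[0, 1^T], [1, K + I/gamma]], K the kernel (inner-product) matrix."""
    n = len(y)
    K = X @ X.T                        # linear kernel matrix
    F = np.zeros((n + 1, n + 1))
    F[0, 1:] = 1.0
    F[1:, 0] = 1.0
    F[1:, 1:] = K + np.eye(n) / gamma  # gamma regularizes the system
    sol = np.linalg.solve(F, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]             # offset b, coefficients alpha

def lssvm_predict(X_train, b, alpha, X_new):
    return np.sign(X_new @ X_train.T @ alpha + b)

# Toy, linearly separable data (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
b, alpha = lssvm_train(X, y)
preds = lssvm_predict(X, b, alpha, X)
```

    The quantum speedup claimed in the paper comes from performing the analogue of `np.linalg.solve` on the kernel matrix in time logarithmic in its dimension; the classical solve above is cubic.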

  3. Moving beyond the van Krevelen Diagram: A New Stoichiometric Approach for Compound Classification in Organisms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rivas-Ubach, Albert; Liu, Yina; Bianchi, Thomas S.

    van Krevelen diagrams (O:C vs H:C ratios of elemental formulas) have been widely used in studies to obtain an estimation of the main compound categories present in environmental samples. However, the limits defining a specific compound category based solely on O:C and H:C ratios of elemental formulas have never been accurately listed or proposed to classify metabolites in biological samples. Furthermore, while O:C vs. H:C ratios of elemental formulas can provide an overview of the compound categories, such classification is inefficient because of the large overlap among different compound categories along both axes. We propose a more accurate compound classification for biological samples analyzed by high-resolution mass spectrometry, based on an assessment of the C:H:O:N:P stoichiometric ratios of over 130,000 elemental formulas of compounds classified in 6 main categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides and phytochemical compounds (oxy-aromatic compounds). Our multidimensional stoichiometric compound classification (MSCC) constraints showed a highly accurate categorization of elemental formulas into the main compound categories in biological samples, with over 98% accuracy, representing a substantial improvement over any classification based on the classic van Krevelen diagram. This method represents a significant step forward in environmental research, especially ecological stoichiometry and eco-metabolomics studies, by providing a novel and robust tool to further our understanding of ecosystem structure and function through the chemical characterization of different biological samples.
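    The classification rests on stoichiometric ratios computed from elemental formulas; the ratio computation itself is mechanical and can be sketched as below. The paper's actual MSCC category thresholds are not reproduced here:

```python
import re

def element_counts(formula):
    """Count atoms per element in a simple molecular formula like 'C6H12O6'."""
    counts = {}
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] = counts.get(elem, 0) + (int(num) if num else 1)
    return counts

def stoichiometric_ratios(formula):
    """O:C, H:C and N:C ratios, the axes of van Krevelen-style diagrams."""
    c = element_counts(formula)
    carbons = c.get("C", 0)
    return {
        "O:C": c.get("O", 0) / carbons,
        "H:C": c.get("H", 0) / carbons,
        "N:C": c.get("N", 0) / carbons,
    }
```

    For glucose (C6H12O6) this gives O:C = 1.0 and H:C = 2.0, placing it in the carbohydrate region of the classic van Krevelen diagram.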

  4. Japanese Wolves are Genetically Divided into Two Groups Based on an 8-Nucleotide Insertion/Deletion within the mtDNA Control Region.

    PubMed

    Ishiguro, Naotaka; Inoshima, Yasuo; Yanai, Tokuma; Sasaki, Motoki; Matsui, Akira; Kikuchi, Hiroki; Maruyama, Masashi; Hongo, Hitomi; Vostretsov, Yuri E; Gasilin, Viatcheslav; Kosintsev, Pavel A; Quanjia, Chen; Chunxue, Wang

    2016-02-01

    The mitochondrial DNA (mtDNA) control region (198- to 598-bp) of four ancient Canis specimens (two Canis mandibles, a cranium, and a first phalanx) was examined, and each specimen was genetically identified as Japanese wolf. Two unique nucleotide substitutions, the 78-C insertion and the 482-G deletion, both of which are specific for Japanese wolf, were observed in each sample. Based on the mtDNA sequences analyzed, these four specimens and 10 additional Japanese wolf samples could be classified into two groups, Group A (10 samples) and Group B (4 samples), which respectively contain or lack an 8-bp insertion/deletion (indel). Interestingly, three dogs (Akita-b, Kishu 25, and S-husky 102) that each contained Japanese wolf-specific features were also classified into Group A or B based on the 8-bp indel. To determine the origin or ancestor of the Japanese wolf, mtDNA control regions of ancient continental Canis specimens were examined; 84 specimens were from Russia, and 29 were from China. However, none of these 113 specimens contained Japanese wolf-specific sequences. Moreover, none of 426 Japanese modern hunting dogs examined contained these Japanese wolf-specific mtDNA sequences. The mtDNA control region sequences of Groups A and B appeared to be unique to grey wolf and dog populations.
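A presence/absence grouping like this reduces to a substring test once sequences are aligned. A toy sketch, in which the 8-nucleotide motif and the sequences are invented placeholders (the paper does not give the indel sequence here):

```python
# Hypothetical 8-bp insertion motif; the real indel sequence is not
# reproduced in this abstract, so this is purely illustrative.
INDEL_8BP = "ACGTACGT"

def assign_group(control_region_seq):
    """Group A = carries the 8-bp insertion; Group B = lacks it."""
    return "A" if INDEL_8BP in control_region_seq else "B"
```

In practice the test would be applied to a fixed, aligned position of the control region rather than anywhere in the sequence.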

  5. Assessing clinical significance of treatment outcomes using the DASS-21.

    PubMed

    Ronk, Fiona R; Korman, James R; Hooke, Geoffrey R; Page, Andrew C

    2013-12-01

    Standard clinical significance classifications are based on movement between the "dysfunctional" and "functional" distributions; however, this dichotomy ignores heterogeneity within the "dysfunctional" population. Based on the methodology described by Tingey, Lambert, Burlingame, and Hansen (1996), the present study sought to present a 3-distribution clinical significance model for the 21-item version of the Depression Anxiety Stress Scales (DASS-21; P. F. Lovibond & Lovibond, 1995) using data from a normative sample (n = 2,914), an outpatient sample (n = 1,000), and an inpatient sample (n = 3,964). DASS-21 scores were collected at pre- and post-treatment for both clinical samples, and patients were classified into 1 of 5 categories based on whether they had made a reliable change and whether they had moved into a different functional range. Evidence supported the validity of the 3-distribution model for the DASS-21, since inpatients who were classified as making a clinically significant change showed lower symptom severity, higher perceived quality of life, and higher clinician-rated functioning than those who did not make a clinically significant change. Importantly, results suggest that the new category of recovering is an intermediate point between recovered and making no clinically significant change. Inpatients and outpatients have different treatment goals and therefore use of the concept of clinical significance needs to acknowledge differences in what constitutes a meaningful change. (c) 2013 APA, all rights reserved.
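The five-category scheme combines two checks: whether the pre-to-post change exceeds a reliable change index (RCI), and which of the three distributions the post-treatment score falls in. A sketch of that logic, with cutoffs and RCI values that are invented for illustration (the actual DASS-21 values are derived in the paper):

```python
# Hypothetical range boundaries and reliable-change threshold, NOT the
# published DASS-21 values; higher scores = more severe symptoms.
CUTOFF_INPATIENT = 25
CUTOFF_NORMATIVE = 10
RCI = 5

def functional_range(score):
    if score >= CUTOFF_INPATIENT:
        return "inpatient"
    if score >= CUTOFF_NORMATIVE:
        return "outpatient"
    return "normative"

def classify_change(pre, post):
    reliable_improvement = (pre - post) >= RCI
    moved_down = functional_range(post) != functional_range(pre) and post < pre
    if reliable_improvement and functional_range(post) == "normative":
        return "recovered"
    if reliable_improvement and moved_down:
        return "recovering"        # the intermediate category the study adds
    if reliable_improvement:
        return "improved"
    if (post - pre) >= RCI:
        return "deteriorated"
    return "unchanged"
```

The "recovering" branch is what the three-distribution model adds over the classic two-distribution scheme: a reliable change into a less severe, but still clinical, range.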

  6. The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set

    PubMed Central

    Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply an ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes on assigning correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied on the list of 50 genes from the PAM50 method. Conclusions The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of current clinical markers ER, PR and HER2, and survival curves in the METABRIC and ROCK data sets. 
Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing a single-sample classifier to predict breast cancer intrinsic subtypes. PMID:26132585
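The ensemble step can be sketched as a simple pooled vote over the labels proposed by the individual classifiers, together with an agreement score; the subtype names below are standard intrinsic-subtype labels used only as example data:

```python
from collections import Counter

def ensemble_label(predictions):
    """predictions: one subtype label per classifier in the ensemble.
    Returns the majority label and the fraction of classifiers agreeing."""
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)
```

Samples with low agreement can then be flagged as unstable assignments, which is the kind of inconsistency the study reports for the original single-sample PAM50 labels.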

  7. RBoost: Label Noise-Robust Boosting Algorithm Based on a Nonconvex Loss Function and the Numerically Stable Base Learners.

    PubMed

    Miao, Qiguang; Cao, Ying; Xia, Ge; Gong, Maoguo; Liu, Jiachen; Song, Jianfeng

    2016-11-01

    AdaBoost has attracted much attention in the machine learning community because of its excellent performance in combining weak classifiers into strong classifiers. However, AdaBoost tends to overfit to noisy data in many applications, so improving its noise robustness plays an important role. AdaBoost's sensitivity to noisy data stems from its exponential loss function, which imposes unrestricted penalties on misclassified samples with very large margins. In this paper, we propose two boosting algorithms, referred to as RBoost1 and RBoost2, which are more robust to noisy data than AdaBoost. RBoost1 and RBoost2 optimize a nonconvex loss function of the classification margin. Because the penalties for misclassified samples are restricted to an amount less than one, RBoost1 and RBoost2 do not over-focus on samples that are always misclassified by the previous base learners. Besides the loss function, at each boosting iteration, RBoost1 and RBoost2 use numerically stable ways to compute the base learners. These two improvements contribute to the robustness of the proposed algorithms to noisy training and testing samples. Experimental results on the synthetic Gaussian data set, the UCI data sets, and a real malware behavior data set illustrate that the proposed RBoost1 and RBoost2 algorithms perform better when the training data sets contain noisy data.
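The contrast between an unbounded and a bounded margin loss is easy to show numerically. The tanh-based loss below is one common bounded nonconvex choice used purely to illustrate the idea; it is not the exact RBoost loss function:

```python
import math

def exp_loss(margin):
    """AdaBoost's exponential loss: unbounded as the margin -> -inf,
    so one badly mislabeled sample can dominate the reweighting."""
    return math.exp(-margin)

def bounded_loss(margin):
    """A bounded nonconvex alternative (illustrative, not RBoost's):
    the penalty saturates at 1 no matter how negative the margin is."""
    return 0.5 * (1.0 - math.tanh(margin))

# For a sample misclassified with a large margin of -5:
# exp_loss(-5) is about 148.4, while bounded_loss(-5) stays just below 1.
```

Saturation is exactly what prevents the booster from concentrating its weight on a few persistently misclassified (likely noisy) samples.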

  8. A clustering algorithm for sample data based on environmental pollution characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to similarities in pollution characteristics, such as pollution sources and concentrations, but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of the similarity function in each iteration, and finally modifying the clusters using a method similar to k-means. The validity and accuracy of the algorithm were tested using both real and synthetic datasets; the results show that the EPC algorithm is practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
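The core loop described above (first unlabelled point becomes a centre, similar points join it, repeat) can be sketched in a few lines. The distance-based similarity function and the omission of the final k-means-style refinement are simplifications of the published algorithm:

```python
import math

def similarity(a, b):
    """Illustrative similarity: 1 at distance 0, decaying with distance.
    The paper's actual similarity function may differ."""
    return 1.0 / (1.0 + math.dist(a, b))

def epc_cluster(samples, threshold):
    """Threshold clustering sketch: points never pulled above the
    threshold by any centre end up as singleton clusters (outliers)."""
    labels = [None] * len(samples)
    cluster_id = 0
    for i, centre in enumerate(samples):
        if labels[i] is not None:
            continue
        labels[i] = cluster_id            # first unlabelled point = new centre
        for j in range(i + 1, len(samples)):
            if labels[j] is None and similarity(centre, samples[j]) >= threshold:
                labels[j] = cluster_id
        cluster_id += 1
    return labels
```

A final refinement pass (recomputing centres and reassigning points, as in k-means) would follow in the full algorithm.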

  9. Sample classification for improved performance of PLS models applied to the quality control of deep-frying oils of different botanic origins analyzed using ATR-FTIR spectroscopy.

    PubMed

    Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel

    2011-01-01

    The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the partial least squares (PLS) regression model performance has been discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff, was less evident. The combined use of double-cross model validation with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To discuss the usefulness of the selection of an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the selected calibration sets using PLS-DA, ranging between 1.06 and 2.91% (w/w).

  10. Neurons from the adult human dentate nucleus: neural networks in the neuron classification.

    PubMed

    Grbatinić, Ivan; Marić, Dušica L; Milošević, Nebojša T

    2015-04-07

    This study performs topological (central vs. border neuron type) and morphological classification of adult human dentate nucleus neurons according to their quantified histomorphological properties, using neural networks on real and virtual neuron samples. In the real sample, 53.1% and 14.1% of central and border neurons, respectively, were classified correctly, with a total of 32.8% of neurons misclassified. The most important result is the 62.2% of misclassified neurons in the border neuron group, which exceeds the proportion of correctly classified neurons in that group (37.8%), showing a clear failure of the network to classify neurons correctly based on the computational parameters used in our study. On the virtual sample, 97.3% of border neurons were misclassified, far exceeding the proportion classified correctly (2.7%), again confirming this failure. Statistical analysis shows no statistically significant difference between central and border neurons for any measured parameter (p>0.05). In total, 96.74% of neurons were morphologically classified correctly by the neural networks, each belonging to one of four histomorphological types: (a) neurons with small soma and short dendrites, (b) neurons with small soma and long dendrites, (c) neurons with large soma and short dendrites, and (d) neurons with large soma and long dendrites. Statistical analysis supports these results (p<0.05). Human dentate nucleus neurons can therefore be classified into four types according to their quantitative histomorphological properties. These types comprise two sets, small and large, with respect to their perikarya, with subtypes differing in dendrite length, i.e. neurons with short vs. long dendrites. Besides confirming the classification into small and large neurons already reported in the literature, we found two new subtypes, i.e. 
neurons with small soma and long dendrites and neurons with large soma and short dendrites. These neurons are most probably equally distributed throughout the dentate nucleus, as no significant difference in their topological distribution was observed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Beyond the swab: ecosystem sampling to understand the persistence of an amphibian pathogen.

    PubMed

    Mosher, Brittany A; Huyvaert, Kathryn P; Bailey, Larissa L

    2018-06-02

    Understanding the ecosystem-level persistence of pathogens is essential for predicting and measuring host-pathogen dynamics. However, this process is often masked, in part due to a reliance on host-based pathogen detection methods. The amphibian pathogens Batrachochytrium dendrobatidis (Bd) and B. salamandrivorans (Bsal) are pathogens of global conservation concern. Despite having free-living life stages, little is known about the distribution and persistence of these pathogens outside of their amphibian hosts. We combine historic amphibian monitoring data with contemporary host- and environment-based pathogen detection data to obtain estimates of Bd occurrence independent of amphibian host distributions. We also evaluate differences in filter- and swab-based detection probability and assess inferential differences arising from using different decision criteria used to classify samples as positive or negative. Water filtration-based detection probabilities were lower than those from swabs but were > 10%, and swab-based detection probabilities varied seasonally, declining in the early fall. The decision criterion used to classify samples as positive or negative was important; using a more liberal criterion yielded higher estimates of Bd occurrence than when a conservative criterion was used. Different covariates were important when using the liberal or conservative criterion in modeling Bd detection. We found evidence of long-term Bd persistence for several years after an amphibian host species of conservation concern, the boreal toad (Anaxyrus boreas boreas), was last detected. Our work provides evidence of long-term Bd persistence in the ecosystem, and underscores the importance of environmental samples for understanding and mitigating disease-related threats to amphibian biodiversity.
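The sensitivity of occurrence estimates to the decision criterion comes down to how replicate-level results are aggregated into a sample-level call. A sketch of the two rules, where the "any replicate" vs "majority of replicates" framing is an illustrative reading of liberal vs conservative criteria, not the paper's exact definitions:

```python
def call_sample(replicates, criterion="conservative"):
    """replicates: list of 0/1 results from repeated assays of one sample.
    'liberal': positive if any replicate amplifies.
    'conservative': positive only if a majority do.
    These particular rules are illustrative assumptions."""
    positives = sum(replicates)
    if criterion == "liberal":
        return positives >= 1
    return positives > len(replicates) / 2
```

A sample with one positive out of three replicates is called positive under the liberal rule and negative under the conservative one, which is how the two criteria yield different occurrence estimates from identical data.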

  12. Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (danio rerio)

    PubMed Central

    2012-01-01

    Background Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack rigorous statistical and experimental validation. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results The multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. 
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515

  13. Target recognition of ladar range images using even-order Zernike moments.

    PubMed

    Liu, Zheng-Jun; Li, Qi; Xia, Zhi-Wei; Wang, Qi

    2012-11-01

    Ladar range images have attracted considerable attention in automatic target recognition fields. In this paper, Zernike moments (ZMs) are applied to classify the target of the range image from an arbitrary azimuth angle. However, ZMs suffer from high computational costs. To improve the performance of target recognition based on small samples, even-order ZMs with serial-parallel backpropagation neural networks (BPNNs) are applied to recognize the target of the range image. It is found that both the rotation invariance and the classification performance of the even-order ZMs are better than those of odd-order moments and of moments compressed by principal component analysis. The experimental results demonstrate that combining the even-order ZMs with serial-parallel BPNNs can significantly improve the recognition rate for small samples.

  14. Breast tissue classification in digital tomosynthesis images based on global gradient minimization and texture features

    NASA Astrophysics Data System (ADS)

    Qin, Xulei; Lu, Guolan; Sechopoulos, Ioannis; Fei, Baowei

    2014-03-01

    Digital breast tomosynthesis (DBT) is a pseudo-three-dimensional x-ray imaging modality proposed to decrease the effect of tissue superposition present in mammography, potentially resulting in an increase in clinical performance for the detection and diagnosis of breast cancer. Tissue classification in DBT images can be useful in risk assessment, computer-aided detection and radiation dosimetry, among other aspects. However, classifying breast tissue in DBT is a challenging problem because DBT images include complicated structures, image noise, and out-of-plane artifacts due to limited angular tomographic sampling. In this project, we propose an automatic method to classify fatty and glandular tissue in DBT images. First, the DBT images are pre-processed to enhance the tissue structures and to decrease image noise and artifacts. Second, a global smooth filter based on L0 gradient minimization is applied to eliminate detailed structures and enhance large-scale ones. Third, the similar structure regions are extracted and labeled by fuzzy C-means (FCM) classification. At the same time, the texture features are also calculated. Finally, each region is classified into different tissue types based on both intensity and texture features. The proposed method is validated using five patient DBT images using manual segmentation as the gold standard. The Dice scores and the confusion matrix are utilized to evaluate the classified results. The evaluation results demonstrated the feasibility of the proposed method for classifying breast glandular and fat tissue on DBT images.

  15. Noninvasive Dissection of Mouse Sleep Using a Piezoelectric Motion Sensor

    PubMed Central

    Yaghouby, Farid; Donohue, Kevin D.; O’Hara, Bruce F.; Sunderam, Sridhar

    2015-01-01

    Background Changes in autonomic control cause regular breathing during NREM sleep to fluctuate during REM. Piezoelectric cage-floor sensors have been used to successfully discriminate sleep and wake states in mice based on signal features related to respiration and other movements. This study presents a classifier for noninvasively classifying REM and NREM using a piezoelectric sensor. New Method Vigilance state was scored manually in 4-second epochs for 24-hour EEG/EMG recordings in twenty mice. An unsupervised classifier clustered piezoelectric signal features quantifying movement and respiration into three states: one active; and two inactive with regular and irregular breathing respectively. These states were hypothesized to correspond to Wake, NREM, and REM respectively. States predicted by the classifier were compared against manual EEG/EMG scores to test this hypothesis. Results Using only piezoelectric signal features, an unsupervised classifier distinguished Wake with high (89% sensitivity, 96% specificity) and REM with moderate (73% sensitivity, 75% specificity) accuracy, but NREM with poor sensitivity (51%) and high specificity (96%). The classifier sometimes confused light NREM sleep—characterized by irregular breathing and moderate delta EEG power—with REM. A supervised classifier improved sensitivities to 90, 81, and 67% and all specificities to over 90% for Wake, NREM, and REM respectively. Comparison with Existing Methods Unlike most actigraphic techniques, which only differentiate sleep from wake, the proposed piezoelectric method further dissects sleep based on breathing regularity into states strongly correlated with REM and NREM. Conclusions This approach could facilitate large-sample screening for genes influencing different sleep traits, besides drug studies or other manipulations. PMID:26582569
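The three-way split described above (active vs inactive, then regular vs irregular breathing within inactive epochs) can be caricatured as two threshold rules. Feature names and thresholds here are invented stand-ins for the paper's clustered piezoelectric features:

```python
def score_epoch(movement_power, breath_regularity,
                move_thresh=0.5, regular_thresh=0.7):
    """Toy epoch scorer. Both features are assumed normalized to [0, 1];
    the thresholds are hypothetical, unlike the paper's unsupervised
    clustering, which learns the state boundaries from the data."""
    if movement_power > move_thresh:
        return "Wake"                  # active epoch
    if breath_regularity >= regular_thresh:
        return "NREM"                  # inactive, regular breathing
    return "REM"                       # inactive, irregular breathing
```

The failure mode the study reports (light NREM with irregular breathing scored as REM) corresponds to inactive epochs falling just below the regularity threshold.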

  16. EurEAs_Gplex--A new SNaPshot assay for continental population discrimination and gender identification.

    PubMed

    Daca-Roszak, P; Pfeifer, A; Żebracka-Gala, J; Jarząb, B; Witt, M; Ziętkiewicz, E

    2016-01-01

    Assays that allow analysis of the biogeographic origin of biological samples in a standard forensic laboratory have to target a small number of highly differentiating markers. Such markers should be easy to multiplex and the assay must perform well in the degraded and scarce biological material. SNPs localized in the genome regions, which in the past were subjected to differential selective pressure in various populations, are the most widely used markers in the studies of biogeographic affiliation. SNPs reflecting biogeographic differences not related to any phenotypic traits are not sufficiently explored. The goal of our study was to identify a small set of SNPs not related to any known pigmentation/phenotype-specific genes, which would allow efficient discrimination between populations of Europe and East Asia. The selection of SNPs was based on the comparative analysis of representative European and Chinese/Japanese samples (B-lymphocyte cell lines), genotyped using the Infinium HumanOmniExpressExome microarray (Illumina). The classifier, consisting of 24 unlinked SNPs (24-SNP classifier), was selected. The performance of a 14-SNP subset of this classifier (14-SNP subclassifier) was tested using genotype data from several populations. The 14-SNP subclassifier differentiated East Asians, Europeans and Africans with ∼100% accuracy; Palestinians, representative of the Middle East, clustered with Europeans, while Amerindians and Pakistani were placed between East Asian and European populations. Based on these results, we have developed a SNaPshot assay (EurEAs_Gplex) for genotyping SNPs from the 14-SNP subclassifier, combined with an additional marker for gender identification. Forensic utility of the EurEAs_Gplex was verified using degraded and low quantity DNA samples. 
The performance of the EurEAs_Gplex was satisfactory when using degraded DNA; tests using low quantity DNA samples revealed a previously undescribed source of genotyping errors, potentially important for any SNaPshot-based assay. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  17. Efficient method of image edge detection based on FSVM

    NASA Astrophysics Data System (ADS)

    Cai, Aiping; Xiong, Xiaomei

    2013-07-01

    For efficient object edge detection in digital images, this paper reviews traditional methods and SVM-based algorithms, noting that the Canny edge detection algorithm produces pseudo-edges and has poor noise robustness. To provide a reliable edge extraction method, a new detection algorithm based on a fuzzy support vector machine (FSVM) is proposed. It consists of several steps: first, the training samples are classified and different membership functions are assigned to different samples. Then, a new training set is formed by increasing the penalty on misclassified sub-samples, and the new FSVM classification model is trained and tested on it. Finally, the edges of the object image are extracted using this model. Experimental results show that good edge detection images are obtained, and noise-addition experiments show that the method has good noise robustness.

  18. A Hybrid Sensing Approach for Pure and Adulterated Honey Classification

    PubMed Central

    Subari, Norazian; Saleh, Junita Mohamad; Shakaff, Ali Yeon Md; Zakaria, Ammar

    2012-01-01

    This paper presents a comparison between data from single modality and fusion methods to classify Tualang honey as pure or adulterated using Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) statistical classification approaches. Ten different brands of certified pure Tualang honey were obtained throughout peninsular Malaysia and Sumatera, Indonesia. Various concentrations of two types of sugar solution (beet and cane sugar) were used in this investigation to create honey samples of 20%, 40%, 60% and 80% adulteration concentrations. Honey data extracted from an electronic nose (e-nose) and Fourier Transform Infrared Spectroscopy (FTIR) were gathered, analyzed and compared based on fusion methods. Visual observation of the classification plots revealed that the PCA approach was able to distinguish pure and adulterated honey samples better than the LDA technique. Overall, the validated classification results based on FTIR data (88.0%) gave higher classification accuracy than e-nose data (76.5%) using the LDA technique. Honey classification based on normalized low-level and intermediate-level FTIR and e-nose fusion data scored classification accuracies of 92.2% and 88.7%, respectively, using the Stepwise LDA method. The results suggested that pure and adulterated honey samples were better classified using FTIR and e-nose fusion data than single modality data. PMID:23202033

  19. Method of Menu Selection by Gaze Movement Using AC EOG Signals

    NASA Astrophysics Data System (ADS)

    Kanoh, Shin'ichiro; Futami, Ryoko; Yoshinobu, Tatsuo; Hoshimiya, Nozomu

    A method to detect the direction and the distance of voluntary eye gaze movement from EOG (electrooculogram) signals was proposed and tested. In this method, AC-amplified vertical and horizontal transient EOG signals were classified into 8-class directions and 2-class distances of voluntary eye gaze movements. The horizontal and vertical EOG signals at each sampling time during an eye gaze movement were treated as a two-dimensional vector, and the centre of gravity of the sample vectors whose norms were more than 80% of the maximum norm was used as the feature vector to be classified. Using the k-nearest neighbor algorithm for classification, the averaged correct detection rates for the three subjects were 98.9%, 98.7%, and 94.4%, respectively. This method avoids strict EOG-based eye tracking, which requires DC amplification of a very small signal. It would be useful for developing robust, menu-selection-based human interfacing systems for severely paralyzed patients.
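The feature-extraction step described above (keep only samples whose vector norm reaches 80% of the maximum, then take their centre of gravity) is compact enough to sketch directly; the signal values in the test are invented:

```python
import math

def gaze_feature(h_samples, v_samples):
    """Treat each (horizontal, vertical) EOG sample as a 2-D vector,
    keep samples with norm >= 80% of the maximum norm, and return the
    centre of gravity of the kept vectors as the classification feature."""
    vecs = list(zip(h_samples, v_samples))
    norms = [math.hypot(h, v) for h, v in vecs]
    cutoff = 0.8 * max(norms)
    kept = [(h, v) for (h, v), n in zip(vecs, norms) if n >= cutoff]
    cx = sum(h for h, _ in kept) / len(kept)
    cy = sum(v for _, v in kept) / len(kept)
    return (cx, cy)
```

The resulting 2-D feature vector would then be fed to a k-nearest-neighbor classifier over the 8 directions and 2 distances.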

  20. Pollen analysis of natural honeys from the central region of Shanxi, North China.

    PubMed

    Song, Xiao-Yan; Yao, Yi-Feng; Yang, Wu-De

    2012-01-01

    Based on qualitative and quantitative melissopalynological analyses, 19 Chinese honeys were classified by botanical origin to determine their floral sources. The honey samples were collected during 2010-2011 from the central region of Shanxi Province, North China. A diverse spectrum of 61 pollen types from 37 families was identified. Fourteen samples were classified as unifloral, whereas the remaining samples were multifloral. Bee-favoured families (occurring in more than 50% of the samples) included Caprifoliaceae (found in 10 samples), Lamiaceae (10), Brassicaceae (12), Rosaceae (12), Moraceae (13), Rhamnaceae (15), Asteraceae (17), and Fabaceae (19). In the unifloral honeys, the predominant pollen types were Ziziphus jujuba (in 5 samples), Robinia pseudoacacia (3), Vitex negundo var. heterophylla (2), Sophora japonica (1), Ailanthus altissima (1), Asteraceae type (1), and Fabaceae type (1). The absolute pollen count (i.e., the number of pollen grains per 10 g honey sample) suggested that 13 samples belonged to Group I (<20,000 pollen grains), 4 to Group II (20,000-100,000), and 2 to Group III (100,000-500,000). The dominance of unifloral honeys without toxic pollen grains and the low value of the HDE/P ratio (i.e., honey dew elements/pollen grains from nectariferous plants) indicated that the honey samples are of good quality and suitable for human consumption.
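The absolute-pollen-count grouping used in the abstract maps directly onto threshold rules (grains per 10 g of honey); only Groups I-III occur among these samples, so higher groups are omitted from this sketch:

```python
def pollen_group(grains_per_10g):
    """Group assignment by absolute pollen count per 10 g honey,
    using the boundaries reported in the abstract."""
    if grains_per_10g < 20_000:
        return "I"
    if grains_per_10g <= 100_000:
        return "II"
    return "III"   # this study's samples all stayed below 500,000
```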

  1. Information theoretic partitioning and confidence based weight assignment for multi-classifier decision level fusion in hyperspectral target recognition applications

    NASA Astrophysics Data System (ADS)

    Prasad, S.; Bruce, L. M.

    2007-04-01

    There is a growing interest in using multiple sources for automatic target recognition (ATR) applications. One approach is to take multiple, independent observations of a phenomenon and perform a feature level or a decision level fusion for ATR. This paper proposes a method to utilize these types of multi-source fusion techniques to exploit hyperspectral data when only a small number of training pixels are available. Conventional hyperspectral image based ATR techniques project the high dimensional reflectance signature onto a lower dimensional subspace using techniques such as Principal Components Analysis (PCA), Fisher's linear discriminant analysis (LDA), subspace LDA and stepwise LDA. While some of these techniques attempt to solve the curse of dimensionality, or small sample size problem, these are not necessarily optimal projections. In this paper, we present a divide and conquer approach to address the small sample size problem. The hyperspectral space is partitioned into contiguous subspaces such that the discriminative information within each subspace is maximized, and the statistical dependence between subspaces is minimized. We then treat each subspace as a separate source in a multi-source multi-classifier setup and test various decision fusion schemes to determine their efficacy. Unlike previous approaches which use correlation between variables for band grouping, we study the efficacy of higher order statistical information (using average mutual information) for a bottom up band grouping. We also propose a confidence measure based decision fusion technique, where the weights associated with various classifiers are based on their confidence in recognizing the training data. To this end, training accuracies of all classifiers are used for weight assignment in the fusion process of test pixels. 
The proposed methods are tested using hyperspectral data with known ground truth, such that the efficacy can be quantitatively measured in terms of target recognition accuracies.
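The confidence-measure-based fusion step described above can be sketched in a few lines: each source classifier contributes its class-probability output, weighted by its accuracy on the training data. The following is a minimal numpy sketch; the function name, weights and probabilities are illustrative, not the authors' implementation.

```python
import numpy as np

def fuse_decisions(probas, train_accuracies):
    """Confidence-weighted decision fusion (illustrative sketch).

    probas: (n_classifiers, n_samples, n_classes) class probabilities
    train_accuracies: per-classifier training accuracies used as weights
    """
    w = np.asarray(train_accuracies, dtype=float)
    w = w / w.sum()                              # normalise the weights
    fused = np.tensordot(w, probas, axes=1)      # (n_samples, n_classes)
    return fused.argmax(axis=1)

# Two toy "subspace" classifiers disagree on sample 0; the one with the
# higher training accuracy dominates the fused decision.
probas = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # classifier A, training accuracy 0.95
    [[0.4, 0.6], [0.3, 0.7]],   # classifier B, training accuracy 0.55
])
labels = fuse_decisions(probas, [0.95, 0.55])
```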

  2. VizieR Online Data Catalog: Fermi/non-Fermi blazars jet power and accretion (Chen+, 2015)

    NASA Astrophysics Data System (ADS)

    Chen, Y. Y.; Zhang, X.; Zhang, H. J.; Yu, X. L.

    2017-11-01

    We selected the sample using radio catalogues to get the widest possible sample of blazars based on their radio properties. We split them into Fermi-detected sources and non-Fermi detections. Massaro et al. (2009, J/A+A/495/691) created the "Multifrequency Catalogue of Blazars" (Roma-BZCAT), which classifies blazars into three main groups based on their spectral properties. In total, we have a sample containing 177 clean Fermi blazars (96 Fermi FSRQs and 81 Fermi BL Lacs) and 133 non-Fermi blazars (105 non-Fermi FSRQs and 28 non-Fermi BL Lacs). (2 data files).

  3. A systematic comparison of different object-based classification techniques using high spatial resolution imagery in agricultural environments

    NASA Astrophysics Data System (ADS)

    Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk

    2016-07-01

    Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have evaluated the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to lower accuracy due to broken objects at fine scales. In contrast to some previous studies, the RF classifier yielded the best results and the k-nearest neighbor classifier the worst, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The training sample analyses indicated that RF and AdaBoost.M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it is suggested that RF should be considered in most cases for agricultural mapping.
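A loose illustration of this kind of comparison can be set up with scikit-learn: synthetic "object features" stand in for the paper's imagery, and RF and SVM are scored on a held-out set as the training set grows. All data and sizes below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for per-object features; last 200 samples held out.
X, y = make_classification(n_samples=1200, n_features=10, n_informative=6,
                           random_state=0)
X_test, y_test = X[1000:], y[1000:]

results = {}
for n_train in (50, 200, 1000):
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    svm = SVC(kernel="rbf", gamma="scale")
    rf.fit(X[:n_train], y[:n_train])
    svm.fit(X[:n_train], y[:n_train])
    # accuracy of each classifier on the fixed held-out set
    results[n_train] = (rf.score(X_test, y_test), svm.score(X_test, y_test))
```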

  4. The edge-preservation multi-classifier relearning framework for the classification of high-resolution remotely sensed imagery

    NASA Astrophysics Data System (ADS)

    Han, Xiaopeng; Huang, Xin; Li, Jiayi; Li, Yansheng; Yang, Michael Ying; Gong, Jianya

    2018-04-01

    In recent years, the availability of high-resolution imagery has enabled more detailed observation of the Earth. However, it is imperative to simultaneously achieve accurate interpretation and preserve the spatial details for the classification of such high-resolution data. To this end, we propose the edge-preservation multi-classifier relearning framework (EMRF). This multi-classifier framework is made up of support vector machine (SVM), random forest (RF), and sparse multinomial logistic regression via variable splitting and augmented Lagrangian (LORSAL) classifiers, considering their complementary characteristics. To better characterize complex scenes of remote sensing images, relearning based on landscape metrics is proposed, which iteratively quantizes both the landscape composition and spatial configuration by the use of the initial classification results. In addition, a novel tri-training strategy is proposed to solve the over-smoothing effect of relearning by means of automatic selection of training samples with low classification certainties, which are mostly distributed in or near the edge areas. Finally, EMRF flexibly combines the strengths of relearning and tri-training via the classification certainties calculated from the probabilistic output of the respective classifiers. It should be noted that, in order to achieve an unbiased evaluation, we assessed the classification accuracy of the proposed framework using both edge and non-edge test samples. The experimental results obtained with four multispectral high-resolution images confirm the efficacy of the proposed framework, in terms of both edge and non-edge accuracy.
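The certainty-based selection at the heart of the tri-training step can be sketched as follows; the probabilities and the 0.6 threshold below are invented for illustration and are not taken from EMRF.

```python
import numpy as np

def low_certainty_samples(proba, threshold=0.6):
    """Return indices of samples whose classification certainty
    (maximum class probability) falls below the threshold -- such
    samples tend to lie in or near edge areas."""
    certainty = proba.max(axis=1)
    return np.where(certainty < threshold)[0]

# Toy probabilistic outputs for four samples over three classes.
proba = np.array([
    [0.95, 0.03, 0.02],   # confident
    [0.40, 0.35, 0.25],   # ambiguous -> selected for relearning
    [0.55, 0.30, 0.15],   # ambiguous -> selected for relearning
    [0.10, 0.85, 0.05],   # confident
])
picked = low_certainty_samples(proba)
```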

  5. Issues and considerations in the use of serologic biomarkers for classifying vaccination history in household surveys.

    PubMed

    MacNeil, Adam; Lee, Chung-Won; Dietz, Vance

    2014-09-03

    Accurate estimates of vaccination coverage are crucial for assessing routine immunization program performance. Community based household surveys are frequently used to assess coverage within a country. In household surveys to assess routine immunization coverage, a child's vaccination history is classified on the basis of observation of the immunization card, parental recall of receipt of vaccination, or both; each of these methods has been shown to commonly be inaccurate. The use of serologic data as a biomarker of vaccination history is a potential additional approach to improve accuracy in classifying vaccination history. However, potential challenges, including the accuracy of serologic methods in classifying vaccination history, varying vaccine types and dosing schedules, and logistical and financial implications must be considered. We provide historic and scientific context for the potential use of serologic data to assess vaccination history and discuss in detail key areas of importance for consideration in the context of using serologic data for classifying vaccination history in household surveys. Further studies are needed to directly evaluate the performance of serologic data compared with use of immunization cards or parental recall for classification of vaccination history in household surveys, as well as to assess the impact of age at the time of sample collection on serologic titers, the predictive value of serology to identify a fully vaccinated child for multi-dose vaccines, and the cost impact and logistical issues associated with different types of biological samples for serologic testing. Published by Elsevier Ltd.

  6. Researches of fruit quality prediction model based on near infrared spectrum

    NASA Astrophysics Data System (ADS)

    Shen, Yulin; Li, Lian

    2018-04-01

    With the improvement in standards for food quality and safety, people pay more attention to the internal quality of fruits; measuring fruit internal quality is therefore increasingly imperative. Nondestructive analysis of soluble solid content (SSC) and total acid content (TAC) is vital and effective for quality measurement in global fresh produce markets, so in this paper we aim at establishing a novel fruit internal quality prediction model based on SSC and TAC from near infrared spectra. Firstly, fruit quality prediction models based on PCA + BP neural network, PCA + GRNN network, PCA + BP AdaBoost strong classifier, PCA + ELM and PCA + LS_SVM classifiers are designed and implemented. Then, in the NSCT domain, the median filter and the Savitzky-Golay filter are used to preprocess the spectral signal, and the Kennard-Stone algorithm is used to automatically select the training and test samples. Thirdly, we obtain the optimal models by comparing 15 prediction models under a multi-classifier competition mechanism; specifically, non-parametric estimation is introduced to measure the effectiveness of each model, with the reliability and variance of the non-parametric estimate used to evaluate each model's predictions and the estimated value and confidence interval serving as a reference. The experimental results demonstrate that this approach achieves an optimal evaluation of the internal quality of fruit. Finally, we employ cat swarm optimization to optimize the two best models obtained from the non-parametric estimation; empirical testing indicates that the proposed method provides more accurate and effective results than other forecasting methods.
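The Kennard-Stone algorithm mentioned above is a standard chemometrics routine for spreading calibration samples evenly over feature space: it starts from the two most distant samples and repeatedly adds the sample farthest from those already selected. A minimal numpy sketch (not the authors' code) follows; the toy points are illustrative.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Select n_select sample indices by the Kennard-Stone procedure."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # start with the two mutually most distant samples
    i, j = np.unravel_index(dist.argmax(), dist.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(len(X)) if k not in selected]
        # each candidate's distance to its nearest already-selected sample;
        # pick the candidate for which this distance is largest
        d = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(d.argmax())])
    return selected

# Toy 1-D spectra embedded in 2-D: the midpoint is picked third.
X = [[0, 0], [10, 0], [5, 0], [1, 0]]
selected = kennard_stone(X, 3)
```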

  7. A Classification Method for Seed Viability Assessment with Infrared Thermography.

    PubMed

    Men, Sen; Yan, Lei; Liu, Jiaxin; Qian, Hua; Luo, Qinjuan

    2017-04-12

    This paper presents a viability assessment method for Pisum sativum L. seeds based on the infrared thermography technique. In this work, different artificial treatments were conducted to prepare seed samples with different viability. Thermal images and visible images were recorded every five minutes during the standard five-day germination test. After the test, the root length of each sample was measured, which serves as the viability index of that seed. Each individual seed area in the visible images was segmented with an edge detection method, and the average temperature of the corresponding area in the infrared images was calculated as the representative temperature for that seed at that time. The temperature curve of each seed during germination was plotted. Thirteen characteristic parameters extracted from the temperature curve were analyzed to show the difference in temperature fluctuations between seed samples with different viability. With the above parameters, a support vector machine (SVM) was used to classify the seed samples into three categories according to root length: viable, aged and dead; the classification accuracy was 95%. On this basis, using the temperature data of only the first three hours of germination, another SVM model was proposed to classify the seed samples, and the accuracy was about 91.67%. These experimental results show that infrared thermography, combined with the SVM algorithm, can be applied to the prediction of seed viability.
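The idea of extracting characteristic parameters from a seed's temperature curve can be illustrated with a toy example. The four features below are hypothetical stand-ins for the paper's thirteen parameters, computed on a synthetic, slowly warming curve.

```python
import numpy as np

def curve_features(t, temp):
    """A few illustrative summary features of a temperature curve."""
    temp = np.asarray(temp, dtype=float)
    slope = np.polyfit(t, temp, 1)[0]   # overall warming/cooling trend
    return {
        "mean": temp.mean(),            # average temperature
        "range": temp.max() - temp.min(),
        "std": temp.std(),              # size of the fluctuations
        "slope": slope,
    }

t = np.arange(0, 60, 5)                 # one hour, 5-minute sampling
temp = 20.0 + 0.02 * t                  # synthetic slowly warming seed
feats = curve_features(t, temp)
```

Feature vectors like this, one per seed, would then be fed to the SVM classifier.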

  8. Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel

    PubMed Central

    2017-01-01

    Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue has major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were deployed for the analysis of the original EEG signal and compared across ten state-of-the-art classifiers. Results indicate that optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and the Random Forest (RF) classifier. The highest accuracy reaches 96.6%, which is sufficient for real applications. The best combination of channel + feature + classifier is subject-specific. In this work, the accuracy obtained with FE as the feature is far greater than that of the other features. The accuracy using the RF classifier is the best, while that of the SVM classifier with a linear kernel is the worst. Channel selection has a large impact on accuracy, as performance varies considerably across channels. PMID:28255330

  9. Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel.

    PubMed

    Hu, Jianfeng

    2017-01-01

    Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue has major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were deployed for the analysis of the original EEG signal and compared across ten state-of-the-art classifiers. Results indicate that optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and the Random Forest (RF) classifier. The highest accuracy reaches 96.6%, which is sufficient for real applications. The best combination of channel + feature + classifier is subject-specific. In this work, the accuracy obtained with FE as the feature is far greater than that of the other features. The accuracy using the RF classifier is the best, while that of the SVM classifier with a linear kernel is the worst. Channel selection has a large impact on accuracy, as performance varies considerably across channels.
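Of the entropy features above, sample entropy (SE) is the easiest to sketch directly. The compact numpy version below is a slight simplification of the standard definition (not the study's code), shown separating a regular signal from noise.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Compact (slightly simplified) sample entropy of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()                   # tolerance as a fraction of the SD

    def match_pairs(mm):
        # all length-mm templates and their pairwise Chebyshev distances
        t = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        d = np.abs(t[:, None, :] - t[None, :, :]).max(axis=-1)
        return ((d <= tol).sum() - len(t)) / 2   # exclude self-matches

    b, a = match_pairs(m), match_pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(0)
se_sine = sample_entropy(np.sin(np.linspace(0, 8 * np.pi, 200)))  # regular
se_noise = sample_entropy(rng.random(200))                         # irregular
```

A fatigued EEG segment would, in the same spirit, yield a different entropy value than an alert one; here the sine/noise pair just demonstrates the ordering.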

  10. Multimodal manifold-regularized transfer learning for MCI conversion prediction.

    PubMed

    Cheng, Bo; Liu, Mingxia; Suk, Heung-Il; Shen, Dinggang; Zhang, Daoqiang

    2015-12-01

    As the early stage of Alzheimer's disease (AD), mild cognitive impairment (MCI) has a high chance of converting to AD. Effective prediction of such conversion from MCI to AD is of great importance for early diagnosis of AD and for evaluating AD risk pre-symptomatically. Unlike most previous methods that used only samples from a target domain to train a classifier, in this paper we propose a novel multimodal manifold-regularized transfer learning (M2TL) method that jointly utilizes samples from another domain (e.g., AD vs. normal controls (NC)) as well as unlabeled samples to boost the performance of MCI conversion prediction. Specifically, the proposed M2TL method includes two key components. The first is a kernel-based maximum mean discrepancy criterion, which helps eliminate the potential negative effect induced by the distributional difference between the auxiliary domain (i.e., AD and NC) and the target domain (i.e., MCI converters (MCI-C) and MCI non-converters (MCI-NC)). The second is a semi-supervised multimodal manifold-regularized least squares classification method, in which the target-domain samples, the auxiliary-domain samples, and the unlabeled samples are jointly used for training the classifier. Furthermore, with the integration of a group sparsity constraint into the objective function, M2TL is capable of selecting informative samples to build a robust classifier. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database validate the effectiveness of the proposed method, which achieves a classification accuracy of 80.1% for MCI conversion prediction and outperforms state-of-the-art methods.
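The kernel-based maximum mean discrepancy (MMD) criterion has a short closed form: the mean within-set kernel similarities minus twice the cross-set similarity. A numpy sketch with an RBF kernel follows; the toy one-dimensional data are illustrative stand-ins for ADNI feature vectors.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared MMD between sample sets X and Y under an RBF kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

X = np.array([[0.0], [0.1], [0.2]])
same = mmd_rbf(X, X)            # identical distributions -> zero
shifted = mmd_rbf(X, X + 5.0)   # distant distributions -> clearly positive
```

In transfer learning settings like M2TL, minimizing a quantity of this form pulls the auxiliary-domain and target-domain feature distributions together.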

  11. Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms

    PubMed Central

    Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan

    2017-01-01

    Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909

  12. Classifying MCI Subtypes in Community-Dwelling Elderly Using Cross-Sectional and Longitudinal MRI-Based Biomarkers

    PubMed Central

    Guan, Hao; Liu, Tao; Jiang, Jiyang; Tao, Dacheng; Zhang, Jicong; Niu, Haijun; Zhu, Wanlin; Wang, Yilong; Cheng, Jian; Kochan, Nicole A.; Brodaty, Henry; Sachdev, Perminder; Wen, Wei

    2017-01-01

    Amnestic MCI (aMCI) and non-amnestic MCI (naMCI) are considered to differ in etiology and outcome. Accurately classifying MCI into meaningful subtypes would enable early intervention with targeted treatment. In this study, we employed structural magnetic resonance imaging (MRI) for MCI subtype classification. This was carried out in a sample of 184 community-dwelling individuals (aged 73–85 years). Cortical surface based measurements were computed from longitudinal and cross-sectional scans. By introducing a feature selection algorithm, we identified a set of discriminative features, and further investigated the temporal patterns of these features. A voting classifier was trained and evaluated via 10 iterations of cross-validation. The best classification accuracies achieved were: 77% (naMCI vs. aMCI), 81% (aMCI vs. cognitively normal (CN)) and 70% (naMCI vs. CN). The best results for differentiating aMCI from naMCI were achieved with baseline features. Hippocampus, amygdala and frontal pole were found to be most discriminative for classifying MCI subtypes. Additionally, we observed the dynamics of classification of several MRI biomarkers. Learning the dynamics of atrophy may aid in the development of better biomarkers, as it may track the progression of cognitive impairment. PMID:29085292
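A generic version of the voting-classifier-with-cross-validation setup can be put together with scikit-learn. The iris data and the three base estimators below are illustrative stand-ins for the MRI features and the paper's actual ensemble.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # stand-in for cortical surface features

# Soft voting averages the base estimators' class probabilities.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)
scores = cross_val_score(voter, X, y, cv=10)   # 10-fold cross-validation
```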

  13. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    NASA Technical Reports Server (NTRS)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.
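The core idea, improving parameter estimates with unlabeled samples, can be sketched with a small semi-supervised EM procedure for two one-dimensional Gaussian classes. Unit variance, equal priors, and all data below are assumptions made for brevity; this is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
# two classes with true means -2 and 2, unit variance
labeled = {0: np.array([-2.6, -1.2]), 1: np.array([1.1, 2.7])}  # few labels
unlabeled = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

# initial mean estimates from the (noisy) labeled samples only
mu = np.array([labeled[0].mean(), labeled[1].mean()])

for _ in range(20):
    # E-step: soft class responsibilities for the unlabeled samples
    d = np.exp(-0.5 * (unlabeled[:, None] - mu[None, :]) ** 2)
    resp = d / d.sum(axis=1, keepdims=True)
    # M-step: re-estimate each mean from labeled + soft-assigned unlabeled
    for c in (0, 1):
        num = labeled[c].sum() + (resp[:, c] * unlabeled).sum()
        den = len(labeled[c]) + resp[:, c].sum()
        mu[c] = num / den
```

Starting from labeled-only estimates of -1.9 and 1.9, the unlabeled pool pulls the means toward the true values, illustrating how extra unlabeled data can stabilize a statistical classifier's parameters.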

  14. Anomaly and signature filtering improve classifier performance for detection of suspicious access to EHRs.

    PubMed

    Kim, Jihoon; Grillo, Janice M; Boxwala, Aziz A; Jiang, Xiaoqian; Mandelbaum, Rose B; Patel, Bhakti A; Mikels, Debra; Vinterbo, Staal A; Ohno-Machado, Lucila

    2011-01-01

    Our objective is to facilitate semi-automated detection of suspicious access to EHRs. Previously we have shown that a machine learning method can play a role in identifying potentially inappropriate access to EHRs. However, the problem of sampling informative instances to build a classifier still remained. We developed an integrated filtering method leveraging both anomaly detection based on symbolic clustering and signature detection, a rule-based technique. We applied the integrated filtering to 25.5 million access records in an intervention arm, and compared this with 8.6 million access records in a control arm where no filtering was applied. On the training set with cross-validation, the AUC was 0.960 in the control arm and 0.998 in the intervention arm. The difference in false negative rates on the independent test set was significant, P=1.6×10−6. Our study suggests that utilization of integrated filtering strategies to facilitate the construction of classifiers can be helpful.

  15. Anomaly and Signature Filtering Improve Classifier Performance For Detection Of Suspicious Access To EHRs

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Boxwala, Aziz A; Jiang, Xiaoqian; Mandelbaum, Rose B; Patel, Bhakti A; Mikels, Debra; Vinterbo, Staal A; Ohno-Machado, Lucila

    2011-01-01

    Our objective is to facilitate semi-automated detection of suspicious access to EHRs. Previously we have shown that a machine learning method can play a role in identifying potentially inappropriate access to EHRs. However, the problem of sampling informative instances to build a classifier still remained. We developed an integrated filtering method leveraging both anomaly detection based on symbolic clustering and signature detection, a rule-based technique. We applied the integrated filtering to 25.5 million access records in an intervention arm, and compared this with 8.6 million access records in a control arm where no filtering was applied. On the training set with cross-validation, the AUC was 0.960 in the control arm and 0.998 in the intervention arm. The difference in false negative rates on the independent test set was significant, P=1.6×10−6. Our study suggests that utilization of integrated filtering strategies to facilitate the construction of classifiers can be helpful. PMID:22195129

  16. Naive scoring of human sleep based on a hidden Markov model of the electroencephalogram.

    PubMed

    Yaghouby, Farid; Modur, Pradeep; Sunderam, Sridhar

    2014-01-01

    Clinical sleep scoring involves tedious visual review of overnight polysomnograms by a human expert. Many attempts have been made to automate the process by training computer algorithms such as support vector machines and hidden Markov models (HMMs) to replicate human scoring. Such supervised classifiers are typically trained on scored data and then validated on scored out-of-sample data. Here we describe a methodology based on HMMs for scoring an overnight sleep recording without the benefit of a trained initial model. The number of states in the data is not known a priori and is optimized using a Bayes information criterion. When tested on a 22-subject database, this unsupervised classifier agreed well with human scores (mean of Cohen's kappa > 0.7). The HMM also outperformed other unsupervised classifiers (Gaussian mixture models, k-means, and linkage trees) that are capable of naive classification but do not model dynamics, by a significant margin (p < 0.05).
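Selecting the state count with an information criterion can be illustrated with scikit-learn. As a simplification, a Gaussian mixture with BIC stands in below for the paper's HMM, fitted to synthetic one-dimensional "features" drawn from three well-separated states.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic 1-D features from three well-separated "states"
X = np.concatenate([rng.normal(m, 0.5, 150) for m in (0.0, 5.0, 10.0)])
X = X.reshape(-1, 1)

# fit mixtures with 1..6 components and keep the BIC of each
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)   # lowest BIC wins
```

BIC penalizes extra parameters, so the criterion recovers the generating number of states rather than the largest model tried.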

  17. Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences.

    PubMed

    ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin

    2014-05-06

    MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow, since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Computational identification of miRNAs from genomic sequences therefore provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet SVM, Mipred, Virgo and EumiR) with non-identical features, which have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% f-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo-sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier was used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 sequences have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
    The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for validation by cloning. The ensemble predictions include pre-miRNA candidates that were validated using miRBase but not recognized by some of the base classifiers.
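The combination step, feeding base classifier scores into a single-hidden-layer neural network, can be sketched as follows. The base scores are simulated here as noisy views of a synthetic label; they are not outputs of Triplet SVM, Mipred, Virgo or EumiR.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 600)                      # synthetic hairpin labels
# each column mimics one base classifier's score: correct most of the time
scores = np.column_stack([y + rng.normal(0, 0.6, 600) for _ in range(4)])

X_tr, X_te, y_tr, y_te = train_test_split(scores, y, random_state=0)
# single hidden layer combines the four base scores into one decision
combiner = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                         random_state=0)
combiner.fit(X_tr, y_tr)
acc = combiner.score(X_te, y_te)
```

The combiner outperforms any single noisy column because the hidden layer can weight and mix the four scores, which is the rationale for the meta-classifier design.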

  18. Movement activity based classification of animal behaviour with an application to data from cheetah (Acinonyx jubatus).

    PubMed

    Grünewälder, Steffen; Broekhuis, Femke; Macdonald, David Whyte; Wilson, Alan Martin; McNutt, John Weldon; Shawe-Taylor, John; Hailes, Stephen

    2012-01-01

    We propose a new method, based on machine learning techniques, for the analysis of a combination of continuous data from dataloggers and a sampling of contemporaneous behaviour observations. This data combination provides an opportunity for biologists to study behaviour at a previously unknown level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by applying a Support Vector Machine and a Hidden Markov Model that allow us to classify an animal's behaviour using a small set of field observations to calibrate continuously recorded activity data. Such classified data can be applied quantitatively to the behaviour of animals over extended periods and at times during which observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetahs (Acinonyx jubatus) in the Okavango Delta, Botswana. Cumulative activity data scores were recorded every five minutes by accelerometers embedded in GPS radio-collars for around one year on average. Direct behaviour samples for each of the six cheetahs were collected in the field for comparatively short periods. Using this approach we are able to classify each five-minute activity score into one of three key behaviours (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period for which the collars were deployed. Evaluation of our classifier with cross-validation shows the accuracy to be 83%-94%, but the accuracy for individual classes is reduced with decreasing sample size of direct observations. We demonstrate how these processed data can be used to study behaviour, identifying seasonal and gender differences in daily activity and feeding times. Results given here are unlike any that could be obtained using traditional approaches in both accuracy and detail.

  19. Movement Activity Based Classification of Animal Behaviour with an Application to Data from Cheetah (Acinonyx jubatus)

    PubMed Central

    Grünewälder, Steffen; Broekhuis, Femke; Macdonald, David Whyte; Wilson, Alan Martin; McNutt, John Weldon; Shawe-Taylor, John; Hailes, Stephen

    2012-01-01

    We propose a new method, based on machine learning techniques, for the analysis of a combination of continuous data from dataloggers and a sampling of contemporaneous behaviour observations. This data combination provides an opportunity for biologists to study behaviour at a previously unknown level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by applying a Support Vector Machine and a Hidden Markov Model that allow us to classify an animal's behaviour using a small set of field observations to calibrate continuously recorded activity data. Such classified data can be applied quantitatively to the behaviour of animals over extended periods and at times during which observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetahs (Acinonyx jubatus) in the Okavango Delta, Botswana. Cumulative activity data scores were recorded every five minutes by accelerometers embedded in GPS radio-collars for around one year on average. Direct behaviour samples for each of the six cheetahs were collected in the field for comparatively short periods. Using this approach we are able to classify each five-minute activity score into one of three key behaviours (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period for which the collars were deployed. Evaluation of our classifier with cross-validation shows the accuracy to be 83%-94%, but the accuracy for individual classes is reduced with decreasing sample size of direct observations. We demonstrate how these processed data can be used to study behaviour, identifying seasonal and gender differences in daily activity and feeding times. Results given here are unlike any that could be obtained using traditional approaches in both accuracy and detail. PMID:23185301

  20. A Litmus Test for Performance Assessment.

    ERIC Educational Resources Information Center

    Finson, Kevin D.; Beaver, John B.

    1992-01-01

    Presents 10 guidelines for developing performance-based assessment items. Presents a sample activity developed from the guidelines. The activity tests students' ability to observe, classify, and infer, using red and blue litmus paper, a pH-range finder, vinegar, ammonia, an unknown solution, distilled water, and paper towels. (PR)

  1. Genetic variation and virulence of Autographa californica multiple nucleopolyhedrovirus and Trichoplusia ni single nucleopolyhedrovirus isolates

    USDA-ARS?s Scientific Manuscript database

    To determine the genetic diversity within the baculovirus species Autographa calfornica multiple nucleopolyhedrovirus (AcMNPV; Baculoviridae: Alphabaculovirus), a PCR-based method was used to identify and classify baculoviruses found in virus samples from the lepidopteran host species A. californi...

  2. Surveillance of Endoscopes: Comparison of Different Sampling Techniques.

    PubMed

    Cattoir, Lien; Vanzieleghem, Thomas; Florin, Lisa; Helleputte, Tania; De Vos, Martine; Verhasselt, Bruno; Boelens, Jerina; Leroux-Roels, Isabel

    2017-09-01

    OBJECTIVE To compare different techniques of endoscope sampling to assess residual bacterial contamination. DESIGN Diagnostic study. SETTING The endoscopy unit of a 1,100-bed university hospital performing ~13,000 endoscopic procedures annually. METHODS In total, 4 sampling techniques, combining flushing fluid with or without a commercial endoscope brush, were compared in an endoscope model. Based on these results, sterile physiological saline flushing with or without PULL THRU brush was selected for evaluation on 40 flexible endoscopes by adenosine triphosphate (ATP) measurement and bacterial culture. Acceptance criteria from the French National guideline (<25 colony-forming units [CFU] per endoscope and absence of indicator microorganisms) were used as part of the evaluation. RESULTS On biofilm-coated PTFE tubes, physiological saline in combination with a PULL THRU brush generated higher mean ATP values (2,579 relative light units [RLU]) compared with saline alone (1,436 RLU; P=.047). In the endoscope samples, culture yield using saline plus the PULL THRU (mean, 43 CFU; range, 1-400 CFU) was significantly higher than that of saline alone (mean, 17 CFU; range, 0-500 CFU; P<.001). In samples obtained using the saline+PULL THRU brush method, ATP values of samples classified as unacceptable were significantly higher than those of samples classified as acceptable (P=.001). CONCLUSION Physiological saline flushing combined with PULL THRU brush to sample endoscopes generated higher ATP values and increased the yield of microbial surveillance culture. Consequently, the acceptance rate of endoscopes based on a defined CFU limit was significantly lower when the saline+PULL THRU method was used instead of saline alone. Infect Control Hosp Epidemiol 2017;38:1062-1069.

  3. Leaching characteristics of fly ash from thermal power plants of Soma and Tuncbilek, Turkey.

    PubMed

    Baba, Alper; Kaya, Abidin

    2004-02-01

    Use of lignite in power generation has led to increasing environmental problems associated not only with gaseous emissions but also with the disposal of ash residues. In particular, use of low quality coal with high ash content results in huge quantities of fly ash to be disposed of. The main problem related to fly ash disposal is the heavy metal content of the residue. In this regard, experimental results of numerous studies indicate that toxic trace metals may leach when fly ash contacts water. In this study, fly ash samples obtained from thermal power plants, namely Soma and Tunçbilek, located at the west part of Turkey, were subjected to toxicity tests such as European Committee for standardization (CEN) and toxicity characteristic leaching (TCLP) procedures of the U.S. Environmental Protection Agency (U.S. EPA). The geochemical composition of the tested ash samples from the power plant show variations depending on the coal burned in the plants. Furthermore, the CEN and TCLP extraction results showed variations such that the ash samples were classified as 'toxic waste' based on TCLP result whereas they were classified as 'non-toxic' wastes based on CEN results, indicating test results are pH dependent.

  4. Classification of yeast cells from image features to evaluate pathogen conditions

    NASA Astrophysics Data System (ADS)

    van der Putten, Peter; Bertens, Laura; Liu, Jinshuo; Hagen, Ferry; Boekhout, Teun; Verbeek, Fons J.

    2007-01-01

    Morphometrics from images, image analysis, may reveal differences between classes of objects present in the images. We have performed an image-features-based classification for the pathogenic yeast Cryptococcus neoformans. Building and analyzing image collections from the yeast under different environmental or genetic conditions may help to diagnose a new "unseen" situation. Diagnosis here means that retrieval of the relevant information from the image collection is at hand each time a new "sample" is presented. The basidiomycetous yeast Cryptococcus neoformans can cause infections such as meningitis or pneumonia. The presence of an extra-cellular capsule is known to be related to virulence. This paper reports on the approach towards developing classifiers for detecting potentially more or less virulent cells in a sample, i.e. an image, by using a range of features derived from the shape or density distribution. The classifier can henceforth be used for automating screening and annotating existing image collections. In addition, we present our methods for creating samples, collecting images, preprocessing images, identifying "yeast cells", and extracting features from the images. We compare various expertise-based and fully automated methods of feature selection, benchmark a range of classification algorithms, and illustrate successful application to this particular domain.

  5. Altered enzyme-linked immunosorbent assay immunoglobulin M (IgM)/IgG optical density ratios can correctly classify all primary or secondary dengue virus infections 1 day after the onset of symptoms, when all of the viruses can be isolated.

    PubMed

    Falconar, Andrew K I; de Plata, Elsa; Romero-Vivas, Claudia M E

    2006-09-01

    We compared dengue virus (DV) isolation rates and tested whether acute primary (P) and acute/probable acute secondary (S/PS) DV infections could be correctly classified serologically when the patients' first serum (S1) samples were obtained 1 to 3 days after the onset of symptoms (AOS). DV envelope/membrane protein-specific immunoglobulin M (IgM) capture and IgG capture enzyme-linked immunosorbent assay (ELISA) titrations (1/log(10) 1.7 to 1/log(10) 6.6 dilutions) were performed on 100 paired S1 and S2 samples from suspected DV infections. The serologically confirmed S/PS infections were divided into six subgroups based on their different IgM and IgG responses. Because of their much greater dynamic ranges, IgG/IgM ELISA titer ratios were more accurate and reliable than IgM/IgG optical density (OD) ratios recorded at a single cutoff dilution for discriminating between P and S/PS infections. However, 62% of these patients' S1 samples were DV IgM and IgG titer negative, so higher discriminatory IgM/IgG OD (DOD) ratios (≥2.60 and <2.60) than those published previously were applied to these S1 samples to correctly classify the highest percentage of these P and S/PS infections. The DV isolation rate was highest (12/12; 100%) using IgG and IgM titer-negative S1 samples collected 1 day AOS, when 100% of them were correctly classified as P or S/PS infections using these higher DOD ratios.
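
    The core decision rule described above is a threshold on the IgM/IgG OD ratio. A minimal sketch, with a hypothetical function name and the 2.60 cutoff taken from the abstract (the authors' full protocol also uses titer ratios and collection-day information):

```python
def classify_dengue_infection(igm_od: float, igg_od: float, cutoff: float = 2.60) -> str:
    """Classify a dengue infection as primary (P) or secondary (S/PS) from ELISA
    optical densities using a discriminatory IgM/IgG OD (DOD) ratio: ratios at or
    above the cutoff indicate a primary infection, ratios below it a secondary one."""
    if igg_od <= 0:
        raise ValueError("IgG OD must be positive to form a ratio")
    ratio = igm_od / igg_od
    return "primary" if ratio >= cutoff else "secondary"
```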

  6. A consensus prognostic gene expression classifier for ER positive breast cancer

    PubMed Central

    Teschendorff, Andrew E; Naderi, Ali; Barbosa-Morais, Nuno L; Pinder, Sarah E; Ellis, Ian O; Aparicio, Sam; Brenton, James D; Caldas, Carlos

    2006-01-01

    Background A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer. Results Here we perform a combined analysis of three major breast cancer microarray data sets to hone in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation. Conclusion The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors. PMID:17076897

  7. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many others.

  8. Solution-based circuits enable rapid and multiplexed pathogen detection.

    PubMed

    Lam, Brian; Das, Jagotamoy; Holmes, Richard D; Live, Ludovic; Sage, Andrew; Sargent, Edward H; Kelley, Shana O

    2013-01-01

    Electronic readout of markers of disease provides compelling simplicity, sensitivity and specificity in the detection of small panels of biomarkers in clinical samples; however, the most important emerging tests for disease, such as infectious disease speciation and antibiotic-resistance profiling, will need to interrogate samples for many dozens of biomarkers. Electronic readout of large panels of markers has been hampered by the difficulty of addressing large arrays of electrode-based sensors on inexpensive platforms. Here we report a new concept--solution-based circuits formed on chip--that makes highly multiplexed electrochemical sensing feasible on passive chips. The solution-based circuits switch the information-carrying signal readout channels and eliminate all measurable crosstalk from adjacent, biomolecule-specific microsensors. We build chips that feature this advance and prove that they analyse unpurified samples successfully, and accurately classify pathogens at clinically relevant concentrations. We also show that signature molecules can be accurately read 2 minutes after sample introduction.

  9. A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems.

    PubMed

    Yin, Weiwei; Garimalla, Swetha; Moreno, Alberto; Galinski, Mary R; Styczynski, Mark P

    2015-08-28

    There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously-developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, a representative approach for comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to enable the identification of a viable root for the learned tree-like network, important for cases where the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to the reduction of variable space for the learning of the network. Finally, we demonstrate the utility of this approach via the analysis of a transcriptional dataset of a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis. 
We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small numbers of samples, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.

  10. Using Copula Distributions to Support More Accurate Imaging-Based Diagnostic Classifiers for Neuropsychiatric Disorders

    PubMed Central

    Bansal, Ravi; Hao, Xuejun; Liu, Jun; Peterson, Bradley S.

    2014-01-01

    Many investigators have tried to apply machine learning techniques to magnetic resonance images (MRIs) of the brain in order to diagnose neuropsychiatric disorders. Usually the number of brain imaging measures (such as measures of cortical thickness and measures of local surface morphology) derived from the MRIs (i.e., their dimensionality) has been large (e.g. >10) relative to the number of participants who provide the MRI data (<100). Sparse data in a high dimensional space increases the variability of the classification rules that machine learning algorithms generate, thereby limiting the validity, reproducibility, and generalizability of those classifiers. The accuracy and stability of the classifiers can improve significantly if the multivariate distributions of the imaging measures can be estimated accurately. To accurately estimate the multivariate distributions using sparse data, we propose to estimate first the univariate distributions of imaging data and then combine them using a Copula to generate more accurate estimates of their multivariate distributions. We then sample the estimated Copula distributions to generate dense sets of imaging measures and use those measures to train classifiers. We hypothesize that the dense sets of brain imaging measures will generate classifiers that are stable to variations in brain imaging measures, thereby improving the reproducibility, validity, and generalizability of diagnostic classification algorithms in imaging datasets from clinical populations. In our experiments, we used both computer-generated and real-world brain imaging datasets to assess the accuracy of multivariate Copula distributions in estimating the corresponding multivariate distributions of real-world imaging data. 
Our experiments showed that diagnostic classifiers generated using imaging measures sampled from the Copula were significantly more accurate and more reproducible than were the classifiers generated using either the real-world imaging measures or their multivariate Gaussian distributions. Thus, our findings demonstrate that estimated multivariate Copula distributions can generate dense sets of brain imaging measures that can in turn be used to train classifiers, and those classifiers are significantly more accurate and more reproducible than are those generated using real-world imaging measures alone. PMID:25093634
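
    The copula idea above can be sketched in a few lines: estimate each measure's univariate distribution empirically, couple the measures through a Gaussian copula fitted to rank-based normal scores, then sample the copula to generate a dense synthetic set. This is a simplified illustration, not the authors' implementation:

```python
import numpy as np
from scipy.stats import norm

def sample_gaussian_copula(data, n_samples, seed=0):
    """Generate a dense synthetic sample from a small dataset (rows = subjects,
    columns = imaging measures): estimate each column's distribution empirically,
    fit a Gaussian copula to rank-transformed normal scores, then map copula
    draws back through the empirical quantiles."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Rank-transform each column to uniforms in (0, 1), then to normal scores.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    z = norm.ppf(ranks / (n + 1))
    corr = np.corrcoef(z, rowvar=False)          # copula correlation matrix
    # Draw correlated normals, convert to uniforms, then to data quantiles.
    zs = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    us = norm.cdf(zs)
    return np.column_stack([np.quantile(data[:, j], us[:, j]) for j in range(d)])
```

The dense output preserves both the marginal shapes and the rank correlations of the original measures, which is what lets classifiers trained on it remain faithful to the real data.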

  11. Handling Imbalanced Data Sets in Multistage Classification

    NASA Astrophysics Data System (ADS)

    López, M.

    Multistage classification is a logical approach, based on a divide-and-conquer solution, for dealing with problems with a high number of classes. The classification problem is divided into several sequential steps, each one associated with a single classifier that works with subgroups of the original classes. In each level, the current set of classes is split into smaller subgroups of classes until they (the subgroups) are composed of only one class. The resulting chain of classifiers can be represented as a tree, which (1) simplifies the classification process by using fewer categories in each classifier and (2) makes it possible to combine several algorithms or use different attributes in each stage. Most of the classification algorithms can be biased in the sense of selecting the most populated class in overlapping areas of the input space. This can degrade a multistage classifier's performance if the training set sample frequencies do not reflect the real prevalence in the population. Several techniques such as applying prior probabilities, assigning weights to the classes, or replicating instances have been developed to overcome this handicap. Most of them are designed for two-class (accept-reject) problems. In this article, we evaluate several of these techniques as applied to multistage classification and analyze how they can be useful for astronomy. We compare the results obtained by classifying a data set based on Hipparcos with and without these methods.
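
    The divide-and-conquer scheme can be illustrated with a minimal two-stage tree of classifiers, assuming for simplicity that the first stage peels off a single class; decision trees stand in here for whatever algorithm each stage would use in practice:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class TwoStageClassifier:
    """Sketch of a multistage classifier: stage 1 separates one class from the
    rest; stage 2 resolves the remaining classes. Each stage could use a
    different algorithm or different attributes."""
    def __init__(self, first_class):
        self.first_class = first_class
        self.stage1 = DecisionTreeClassifier(random_state=0)
        self.stage2 = DecisionTreeClassifier(random_state=0)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        is_first = (y == self.first_class)
        self.stage1.fit(X, is_first)                  # first class vs rest
        self.stage2.fit(X[~is_first], y[~is_first])   # resolve remaining classes
        return self

    def predict(self, X):
        X = np.asarray(X)
        top = self.stage1.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        out[top] = self.first_class
        if (~top).any():
            out[~top] = self.stage2.predict(X[~top])
        return out
```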

  12. Chemical classification of iron meteorites. XI. Multi-element studies of 38 new irons and the high abundance of ungrouped irons from Antarctica

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wasson, J.T.; Ouyang, Xinwei; Wang, Jianmin

    1989-03-01

    The authors report concentrations of 14 elements in the metal of 38 iron meteorites and a pallasite. The meteorites are classified based on these data and on structural observations. Three samples are paired with previously classified irons; thus, these additional 35 irons raise the number of well-classified, independent iron meteorites to 598. One Yamato iron contains 342 mg/g Ni, the second highest Ni content in an IAB iron after Oktibbeha County. Two small irons from Western Australia appear to be metal nodules from mesosiderites. Several of the new irons are from Antarctica. Of 24 independent irons from Antarctica, 8 are ungrouped. The fraction, 0.333, is much higher than the fraction 0.161 among all 598 classified irons. Statistical tests show that it is highly improbable (~2.9% probability) that the Antarctic population is a random sample of the larger population. The difference is probably related to the fact that the median mass of Antarctic irons is about two orders of magnitude smaller than that of non-Antarctic irons. It is doubtful that the difference results from fragmentation patterns yielding different size distributions favoring smaller masses among ungrouped irons. More likely is the possibility that smaller meteoroids tend to sample a larger number of asteroidal source regions, perhaps because small meteoroids tend to have higher ejection velocities or because small meteoroids have random-walked a greater increment of orbital semimajor axis away from that of the parent body.
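
    The quoted ~2.9% can be reproduced with a simple binomial model (the authors' actual statistical test may differ): the chance of drawing 8 or more ungrouped irons in a sample of 24, given the overall ungrouped fraction of 0.161 (≈ 96/598):

```python
from math import comb

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 8 or more ungrouped irons among 24 Antarctic samples, if ungrouped
# irons occurred at the overall rate of 96/598 (~0.161):
p_tail = binomial_tail(8, 24, 96 / 598)   # ~0.029, matching the quoted ~2.9%
```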

  13. Assessment of groundwater quality: a fusion of geochemical and geophysical information via Bayesian neural networks.

    PubMed

    Maiti, Saumen; Erram, V C; Gupta, Gautam; Tiwari, Ram Krishna; Kulkarni, U D; Sangpal, R R

    2013-04-01

    Deplorable quality of groundwater arising from saltwater intrusion, natural leaching and anthropogenic activities is one of the major concerns for society. Assessment of groundwater quality is, therefore, a primary objective of scientific research. Here, we propose an artificial neural network-based method set in a Bayesian neural network (BNN) framework and employ it to assess groundwater quality. The approach is based on analyzing 36 water samples and inverting up to 85 Schlumberger vertical electrical sounding data. We constructed an a priori model by suitably parameterizing geochemical and geophysical data collected from the western part of India. The posterior model (post-inversion) was estimated using the BNN learning procedure and a global hybrid Monte Carlo/Markov Chain Monte Carlo optimization scheme. By suitable parameterization of geochemical and geophysical parameters, we simulated 1,500 training samples, of which 50% were used for training and the remaining 50% for validation and testing. We show that the trained model is able to classify validation and test samples with 85% and 80% accuracy, respectively. Based on cross-correlation analysis and the Gibbs diagram of geochemical attributes, the groundwater qualities of the study area were classified into the following three categories: "Very good", "Good", and "Unsuitable". The BNN model-based results suggest that groundwater quality falls mostly in the range of "Good" to "Very good" except for some places near the Arabian Sea. The new modeling results, powered by uncertainty and statistical analyses, would provide useful constraints that could be utilized in monitoring and assessment of groundwater quality.

  14. Ensemble Sparse Classification of Alzheimer’s Disease

    PubMed Central

    Liu, Manhua; Zhang, Daoqiang; Shen, Dinggang

    2012-01-01

    The high-dimensional pattern classification methods, e.g., support vector machines (SVM), have been widely investigated for analysis of structural and functional brain images (such as magnetic resonance imaging (MRI)) to assist the diagnosis of Alzheimer’s disease (AD) including its prodromal stage, i.e., mild cognitive impairment (MCI). Most existing classification methods extract features from neuroimaging data and then construct a single classifier to perform classification. However, due to noise and small sample size of neuroimaging data, it is challenging to train only a global classifier that can be robust enough to achieve good classification performance. In this paper, instead of building a single global classifier, we propose a local patch-based subspace ensemble method which builds multiple individual classifiers based on different subsets of local patches and then combines them for more accurate and robust classification. Specifically, to capture the local spatial consistency, each brain image is partitioned into a number of local patches and a subset of patches is randomly selected from the patch pool to build a weak classifier. Here, the sparse representation-based classification (SRC) method, which has shown effective for classification of image data (e.g., face), is used to construct each weak classifier. Then, multiple weak classifiers are combined to make the final decision. We evaluate our method on 652 subjects (including 198 AD patients, 225 MCI and 229 normal controls) from Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using MR images. The experimental results show that our method achieves an accuracy of 90.8% and an area under the ROC curve (AUC) of 94.86% for AD classification and an accuracy of 87.85% and an AUC of 92.90% for MCI classification, respectively, demonstrating a very promising performance of our method compared with the state-of-the-art methods for AD/MCI classification using MR images. PMID:22270352
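
    The random-patch ensemble idea generalizes to any base learner: train many weak classifiers on random feature subsets and combine them by majority vote. A sketch with a ridge classifier standing in for the paper's sparse-representation classifier (SRC), and labels assumed to be non-negative integers:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

def random_subspace_ensemble(X, y, X_test, n_members=25, subset_frac=0.3, seed=0):
    """Train one weak learner per random feature subset (the "patch" subsets of
    the paper) and majority-vote their predictions. RidgeClassifier is an
    illustrative stand-in for SRC; y must be non-negative integer labels."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subset_frac * d))
    votes = []
    for _ in range(n_members):
        cols = rng.choice(d, size=k, replace=False)   # random feature subset
        clf = RidgeClassifier().fit(X[:, cols], y)
        votes.append(clf.predict(X_test[:, cols]))
    votes = np.asarray(votes, dtype=int)
    # Majority vote across the ensemble for each test sample.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```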

  15. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and Its Application to Tumor Gene Expression Data Analysis.

    PubMed

    Li, Jiangeng; Su, Lei; Pang, Zenan

    2015-12-01

    Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
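
    The redundancy-excluding step can be sketched generically: visit features in descending order of a relevance score and keep a feature only if it is not too correlated with those already kept. The MFA score itself is not reproduced here; any filter score can be plugged in:

```python
import numpy as np

def select_features(X, scores, k, redundancy_threshold=0.9):
    """Filter selection with redundancy excluding: walk features in descending
    relevance-score order and keep a feature only if its absolute Pearson
    correlation with every already-kept feature stays below the threshold."""
    kept = []
    for j in np.argsort(scores)[::-1]:
        if all(abs(np.corrcoef(X[:, j], X[:, i])[0, 1]) < redundancy_threshold
               for i in kept):
            kept.append(int(j))
        if len(kept) == k:
            break
    return kept
```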

  16. A novel application of deep learning for single-lead ECG classification.

    PubMed

    Mathews, Sherin M; Kambhamettu, Chandra; Barner, Kenneth E

    2018-06-04

    Detecting and classifying cardiac arrhythmias is critical to the diagnosis of patients with cardiac abnormalities. In this paper, a novel approach based on deep learning methodology is proposed for the classification of single-lead electrocardiogram (ECG) signals. We demonstrate the application of the Restricted Boltzmann Machine (RBM) and deep belief networks (DBN) for ECG classification following detection of ventricular and supraventricular heartbeats using single-lead ECG. The effectiveness of this proposed algorithm is illustrated using real ECG signals from the widely-used MIT-BIH database. Simulation results demonstrate that with a suitable choice of parameters, RBM and DBN can achieve high average recognition accuracies of ventricular ectopic beats (93.63%) and of supraventricular ectopic beats (95.57%) at a low sampling rate of 114 Hz. Experimental results indicate that classifiers built into this deep learning-based framework achieved state-of-the-art performance at lower sampling rates and with simpler features when compared to traditional methods. Further, features extracted at a sampling rate of 114 Hz, when combined with deep learning, provided enough discriminatory power for the classification task. This performance is comparable to that of traditional methods and uses a much lower sampling rate and simpler features. Thus, our proposed deep neural network algorithm demonstrates that deep learning-based methods offer accurate ECG classification and could potentially be extended to other physiological signal classifications, such as those in arterial blood pressure (ABP), nerve conduction (EMG), and heart rate variability (HRV) studies. Copyright © 2018. Published by Elsevier Ltd.

  17. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    PubMed

    Zhang, Xiang; Guan, Naiyang; Jia, Zhilong; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
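
    For flavor, the classic multiplicative update rules (MUR) for plain NMF are sketched below; the paper derives analogous updates for its projective, label-constrained Semi-PNMF model, which differ in detail from this standard V ≈ WH factorization:

```python
import numpy as np

def nmf_mur(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Classic multiplicative update rules for non-negative matrix
    factorization, V ~= W @ H. Multiplicative updates keep W and H
    non-negative as long as they start non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W
    return W, H
```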

  18. Differentiation and classification of bacteria using vancomycin functionalized silver nanorods array based surface-enhanced Raman spectroscopy and chemometric analysis

    USDA-ARS?s Scientific Manuscript database

    The intrinsic surface-enhanced Raman scattering (SERS) was used for differentiating and classifying bacterial species with chemometric data analysis. Such differentiation has often been conducted with an insufficient sample population and strong interference from the food matrices. To address these ...

  19. Classifying Work-Related and Personal Problems of Troubled Employees.

    ERIC Educational Resources Information Center

    Gomez-Mejia, Luis R.; Balkin, David B.

    1980-01-01

    Summarizes the results of research conducted on the nature of work-related and personal problems afflicting employee assistance program users. Based on a sample of 14,000 cases, the project sought to identify problems that seem to cluster together and the demographic profile of employees experiencing the cluster. (Author/MLF)

  20. Genetic variation and virulence of nucleopolyhedroviruses isolated worldwide from the heliothine pests Helicoverpa armigera, Helicoverpa zea, and Heliothis virescens

    USDA-ARS?s Scientific Manuscript database

    A PCR-based method was used to classify 90 samples of nucleopolyhedrovirus (NPV; Baculoviridae: Alphabaculovirus) obtained worldwide from larvae of Heliothis virescens, Helicoverpa zea, and Helicoverpa armigera. Partial nucleotide sequencing and phylogenetic analysis of three highly conserved genes...

  1. Scientific Investigations of Elementary School Children

    ERIC Educational Resources Information Center

    Valanides, Nicos; Papageorgiou, Maria; Angeli, Charoula

    2014-01-01

    The study provides evidence concerning elementary school children's ability to conduct a scientific investigation. Two hundred and fifty sixth-grade students and 248 fourth-grade students were administered a test, and based on their performance, they were classified into high-ability and low-ability students. The sample of this study was…

  2. Blood-Based Gene Expression Signatures of Infants and Toddlers with Autism

    ERIC Educational Resources Information Center

    Glatt, Stephen J.; Tsuang, Ming T.; Winn, Mary; Chandler, Sharon D.; Collins, Melanie; Lopez, Linda; Weinfeld, Melanie; Carter, Cindy; Schork, Nicholas; Pierce, Karen; Courchesne, Eric

    2012-01-01

    Objective: Autism spectrum disorders (ASDs) are highly heritable neurodevelopmental disorders that onset clinically during the first years of life. ASD risk biomarkers expressed early in life could significantly impact diagnosis and treatment, but no transcriptome-wide biomarker classifiers derived from fresh blood samples from children with…

  3. Colorectal Cancer and Colitis Diagnosis Using Fourier Transform Infrared Spectroscopy and an Improved K-Nearest-Neighbour Classifier.

    PubMed

    Li, Qingbo; Hao, Can; Kang, Xue; Zhang, Jialin; Sun, Xuejun; Wang, Wenbo; Zeng, Haishan

    2017-11-27

    By combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. To avoid the limitations of high dimensionality and of small nearest-neighbor values, the new EWHK method calculates feature weights based on information entropy. Averaged over random classification trials, the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.
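
    The entropy-based feature weighting can be illustrated with the standard entropy-weight method (the abstract does not give EWHK's exact formulation, which may differ): features whose values are spread evenly carry high entropy and receive low weight, and vice versa:

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Standard entropy-weight calculation: normalize each feature column to
    proportions, compute its normalized Shannon entropy, and weight features
    by their (normalized) divergence 1 - E."""
    X = np.asarray(X, dtype=float)
    X = X - X.min(axis=0) + eps          # shift to positive values
    P = X / X.sum(axis=0)                # column-wise proportions
    E = -(P * np.log(P)).sum(axis=0) / np.log(len(X))
    return (1 - E) / (1 - E).sum()
```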

  4. Consensus-Based Attributes for Identifying Patients With Spasmodic Dysphonia and Other Voice Disorders.

    PubMed

    Ludlow, Christy L; Domangue, Rickie; Sharma, Dinesh; Jinnah, H A; Perlmutter, Joel S; Berke, Gerald; Sapienza, Christine; Smith, Marshall E; Blumin, Joel H; Kalata, Carrie E; Blindauer, Karen; Johns, Michael; Hapner, Edie; Harmon, Archie; Paniello, Randal; Adler, Charles H; Crujido, Lisa; Lott, David G; Bansberg, Stephen F; Barone, Nicholas; Drulia, Teresa; Stebbins, Glenn

    2018-06-21

    A roadblock for research on adductor spasmodic dysphonia (ADSD), abductor SD (ABSD), voice tremor (VT), and muscular tension dysphonia (MTD) is the lack of criteria for selecting patients with these disorders. To determine the agreement among experts not using standard guidelines to classify patients with ABSD, ADSD, VT, and MTD, and develop expert consensus attributes for classifying patients for research. From 2011 to 2016, a multicenter observational study examined agreement among blinded experts when classifying patients with ADSD, ABSD, VT or MTD (first study). Subsequently, a 4-stage Delphi method study used reiterative stages of review by an expert panel and 46 community experts to develop consensus on attributes to be used for classifying patients with the 4 disorders (second study). The study used a convenience sample of 178 patients clinically diagnosed with ADSD, ABSD, VT, MTD, vocal fold paresis/paralysis, psychogenic voice disorders, or hypophonia secondary to Parkinson disease. Participants were aged 18 years or older, without laryngeal structural disease or surgery for ADSD and underwent speech and nasolaryngoscopy video recordings following a standard protocol. Specialists at 4 sites classified 178 patients into 11 categories. Four international experts independently classified 75 patients using the same categories without guidelines after viewing speech and nasolaryngoscopy video recordings. Each member from the 4 sites also classified 50 patients from other sites after viewing video clips of voice/laryngeal tasks. Interrater κ less than 0.40 indicated poor classification agreement among rater pairs and across recruiting sites. Consequently, a Delphi panel of 13 experts identified and ranked speech and laryngeal movement attributes for classifying ADSD, ABSD, VT, and MTD, which were reviewed by 46 community specialists. 
Based on the median attribute rankings, a final attribute list was created for each disorder. When classifying patients without guidelines, raters differed in their classification distributions (likelihood ratio, χ2 = 107.66), had poor interrater agreement, and poor agreement with site categories. For 11 categories, the highest agreement was 34%, with no κ values greater than 0.26. In external rater pairs, the highest κ was 0.23 and the highest agreement was 38.5%. Using 6 categories, the highest percent agreement was 73.3% and the highest κ was 0.40. The Delphi method yielded 18 attributes for classifying disorders from speech and nasolaryngoscopic examinations. Specialists without guidelines had poor agreement when classifying patients for research, leading to a Delphi-based development of the Spasmodic Dysphonia Attributes Inventory for classifying patients with ADSD, ABSD, VT, and MTD for research.
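The interrater agreement reported above is measured with Cohen's κ, which discounts the agreement expected by chance. A minimal pure-Python sketch with illustrative labels (not the study's actual ratings):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # observed fraction of exact agreements
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each rater's label frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# illustrative classifications by two raters
a = ["ADSD", "ADSD", "VT", "MTD", "ABSD", "ADSD"]
b = ["ADSD", "VT",   "VT", "MTD", "ADSD", "ADSD"]
print(round(cohens_kappa(a, b), 3))  # → 0.5
```

A κ below 0.40, as in the study, indicates that agreement is only modestly better than chance.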

  5. Online breakage detection of multitooth tools using classifier ensembles for imbalanced data

    NASA Astrophysics Data System (ADS)

    Bustillo, Andrés; Rodríguez, Juan J.

    2014-12-01

    Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis of different data-mining techniques applied to the detection of insert breakage in multitooth tools. The analysis applies only one experimental variable: the electrical power consumption of the tool drive. This restriction reflects real industrial conditions more accurately than other physical variables, such as acoustic or vibration signals, which are not so easily measured. Many efforts have been made to design a method that is able to identify breakages with a high degree of reliability within a short period of time. The solution is based on classifier ensembles for imbalanced data-sets. Classifier ensembles are combinations of classifiers, which in many situations are more accurate than individual classifiers. Six different base classifiers are tested: Decision Trees, Rules, Naïve Bayes, Nearest Neighbour, Multilayer Perceptrons and Logistic Regression. Three different balancing strategies are tested with each of the classifier ensembles and compared to their performance with the original data-set: Synthetic Minority Over-Sampling Technique (SMOTE), undersampling and a combination of SMOTE and undersampling. To identify the most suitable data-mining solution, Receiver Operating Characteristic (ROC) and recall-precision graphs are generated and discussed. Logistic regression ensembles on the data-set balanced with the combination of SMOTE and undersampling turned out to be the most suitable technique. Finally, a comparison using industrial performance measures is presented, which concludes that this technique is also better suited to this industrial problem than the other techniques reported in the literature.
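SMOTE, used above for balancing, synthesizes new minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal pure-Python sketch (toy two-feature power-consumption points, not the study's data):

```python
import random

def smote(minority, n_synthetic, k=2, seed=42):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest minority-class neighbours (SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(x, p)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment between x and nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# toy "breakage" class with only three observed feature vectors
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2)]
new_points = smote(minority, 5)
print(len(new_points))  # → 5
```

Every synthetic point lies on a segment between two real minority samples, which densifies the rare class without duplicating observations.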

  6. Development of a method for the determination of Fusarium fungi on corn using mid-infrared spectroscopy with attenuated total reflection and chemometrics.

    PubMed

    Kos, Gregor; Lohninger, Hans; Krska, Rudolf

    2003-03-01

    A novel method, which enables the determination of fungal infection with Fusarium graminearum on corn within minutes, is presented. The ground sample was sieved and the particle size fraction between 250 and 100 microm was used for mid-infrared/attenuated total reflection (ATR) measurements. The sample was pressed onto the ATR crystal, and reproducible pressure was applied. After the spectra were recorded, they were subjected to principal component analysis (PCA) and classified using cluster analysis. Observed changes in the spectra reflected changes in protein, carbohydrate, and lipid contents. Ergosterol (for the total fungal biomass) and the toxin deoxynivalenol (DON; a secondary metabolite) of Fusarium fungi served as reference parameters, because of their relevance for the examination of corn-based food and feed. Repeatability was greatly improved by sieving prior to recording the spectra, resulting in better clustering in PCA score/score plots. The developed method enabled the separation of samples with a toxin content as low as 310 microg/kg from noncontaminated (blank) samples. Investigated concentration ranges were 880-3600 microg/kg for ergosterol and 310-2596 microg/kg for DON. The percentage of correctly classified samples was up to 100% for individual samples compared with a number of blank samples.
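PCA of the recorded spectra reduces each spectrum to a few scores before clustering. A pure-Python sketch of extracting the first principal component by power iteration on the covariance matrix (toy two-band data; a real analysis would use numpy/scipy on full spectra):

```python
import math, random

def first_principal_component(data, iters=200, seed=0):
    """Dominant PCA direction via power iteration on the covariance
    matrix of mean-centered data (pure-Python illustration)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# toy "spectra": band 2 is roughly twice band 1 plus a little noise,
# so the leading component should point close to direction (1, 2)
data = [(x, 2 * x + 0.1 * ((i % 3) - 1))
        for i, x in enumerate([1.0, 2.0, 3.0, 4.0, 5.0])]
pc = first_principal_component(data)
print(round(abs(pc[0] / pc[1]), 2))
```

Projecting each centered spectrum onto this direction yields the PCA scores used in the score/score plots mentioned above.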

  7. Evaluation of the maximum-likelihood adaptive neural system (MLANS) applications to noncooperative IFF

    NASA Astrophysics Data System (ADS)

    Chernick, Julian A.; Perlovsky, Leonid I.; Tye, David M.

    1994-06-01

    This paper describes applications of the maximum likelihood adaptive neural system (MLANS) to the characterization of clutter in IR images and to the identification of targets. The characterization of image clutter is needed to improve target detection and to enhance the ability to compare the performance of different algorithms using diverse imagery data. Enhanced unambiguous IFF is important for fratricide reduction, while automatic cueing and targeting are becoming an ever-increasing part of operations. We utilized MLANS, which is a parametric neural network that combines optimal statistical techniques with a model-based approach. This paper shows that MLANS outperforms classical classifiers, the quadratic classifier and the nearest neighbor classifier, because on the one hand it is not limited to the usual Gaussian distribution assumption and can adapt in real time to the image clutter distribution, while on the other hand MLANS learns from fewer samples and is more robust than the nearest neighbor classifiers. Future research will address noncooperative IFF using fused IR and MMW data.

  8. Fluorescence intensity positivity classification of Hep-2 cells images using fuzzy logic

    NASA Astrophysics Data System (ADS)

    Sazali, Dayang Farzana Abang; Janier, Josefina Barnachea; May, Zazilah Bt.

    2014-10-01

    Indirect Immunofluorescence (IIF) is a standard method for the antinuclear autoantibody (ANA) test using Hep-2 cells to determine specific diseases. Different classifier algorithms have been proposed in previous works; however, there is still no validated standard for classifying the fluorescence intensity. This paper presents the use of fuzzy logic to classify the fluorescence intensity and to determine the positivity of the Hep-2 cell serum samples. The fuzzy algorithm involves image pre-processing by filtering noise and smoothing the image, and converting the red, green and blue (RGB) color space of the images to the lightness and chromaticity layers "a" and "b" (LAB) color space, where the mean values of the lightness and chromaticity layer "a" were extracted and classified using a fuzzy logic algorithm based on the standard score ranges of ANA fluorescence intensity. Using 100 data sets of positive and intermediate fluorescence intensity to test the performance, the fuzzy logic classifier obtained accuracies of 85% and 87% for the intermediate and positive classes, respectively.
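The fuzzy classification step can be sketched with triangular membership functions over a normalized intensity value; the breakpoints below are illustrative assumptions, not the ANA scoring standard:

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b over the support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify_intensity(mean_value):
    """Fuzzy classification of a normalized mean fluorescence value.
    Breakpoints are illustrative, not the standard ANA score ranges."""
    memberships = {
        "negative":     triangular(mean_value, -0.2, 0.0, 0.4),
        "intermediate": triangular(mean_value,  0.2, 0.5, 0.8),
        "positive":     triangular(mean_value,  0.6, 1.0, 1.2),
    }
    # defuzzify by taking the class with maximum membership
    return max(memberships, key=memberships.get), memberships

label, m = classify_intensity(0.55)
print(label)  # → intermediate
```

Overlapping memberships let borderline intensities carry partial degrees of both classes before the final crisp decision.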

  9. Development and Validation of a Risk Score for Age-Related Macular Degeneration: The STARS Questionnaire.

    PubMed

    Delcourt, Cécile; Souied, Eric; Sanchez, Alice; Bandello, Francesco

    2017-12-01

    To develop and validate a risk score for AMD based on a simple self-administered questionnaire. Risk factors having shown the most consistent associations with AMD were included in the STARS (Simplified Théa AMD Risk-Assessment Scale) questionnaire. Two studies were conducted, one in Italy (127 participating ophthalmologists) and one in France (80 participating ophthalmologists). During 1 week, participating ophthalmologists invited all their patients aged 55 years or older to fill in the STARS questionnaire. Based on fundus examination, early AMD was defined by the presence of soft drusen and/or pigmentary abnormalities, and late AMD by the presence of geographic atrophy and/or neovascular AMD. The Italian and French samples consisted of 12,639 and 6897 patients, respectively. All 13 risk factors included in the STARS questionnaire showed significant associations with AMD in the Italian sample. The area under the receiver operating characteristic curve for the STARS risk score, derived from the multivariate logistic regression in the Italian sample, was 0.78 in the Italian sample and 0.72 in the French sample. In both samples, less than 10% of patients without AMD were classified as high risk, and less than 13% of late AMD cases were classified as low risk, with a more intermediate situation in early AMD cases. STARS is a new, simple self-assessed questionnaire showing good discrimination of risk for AMD in two large European samples. It might be used by ophthalmologists in routine clinical practice or as a self-assessment for risk of AMD in the general population.
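The area under the ROC curve reported above equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, which gives a direct way to compute it. A sketch with illustrative scores (not STARS data):

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a randomly chosen
    positive (case) outscores a randomly chosen negative (control);
    ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# illustrative questionnaire risk scores
cases    = [7, 9, 8, 6, 9]
controls = [3, 5, 6, 4, 2]
print(auc(cases, controls))  # → 0.98
```

An AUC of 0.78, as in the Italian sample, means a case outscores a control about 78% of the time.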

  10. Ensemble stump classifiers and gene expression signatures in lung cancer.

    PubMed

    Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn

    2007-01-01

    Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, ensembles of decision trees on three data sets, and simple stump classifiers on two data sets.
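A decision stump of the kind ensembled here is a one-test decision tree: a single feature compared against a threshold. A minimal training sketch with toy expression values (not the paper's probes):

```python
def train_stump(X, y):
    """Learn a one-feature threshold classifier (decision stump).
    X: list of feature tuples; y: list of 0/1 labels.
    Returns (errors, feature_index, threshold, polarity)."""
    best = None
    n_features = len(X[0])
    for f in range(n_features):
        values = sorted({row[f] for row in X})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for polarity in (0, 1):
                # predict `polarity` when feature > t, else 1 - polarity
                preds = [polarity if row[f] > t else 1 - polarity
                         for row in X]
                errors = sum(p != lab for p, lab in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best

def predict_stump(stump, row):
    _, f, t, polarity = stump
    return polarity if row[f] > t else 1 - polarity

# toy expression data: probe 1 separates the two subtypes, probe 0 does not
X = [(5.0, 1.0), (6.0, 1.2), (4.0, 3.0), (5.5, 3.4)]
y = [0, 0, 1, 1]
stump = train_stump(X, y)
print([predict_stump(stump, row) for row in X])  # → [0, 0, 1, 1]
```

An ensemble then votes over many such stumps, one per informative probe, which is what sidesteps cross-platform normalization.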

  11. Pollen Analysis of Natural Honeys from the Central Region of Shanxi, North China

    PubMed Central

    Song, Xiao-Yan; Yao, Yi-Feng; Yang, Wu-De

    2012-01-01

    Based on qualitative and quantitative melissopalynological analyses, 19 Chinese honeys were classified by botanical origin to determine their floral sources. The honey samples were collected during 2010–2011 from the central region of Shanxi Province, North China. A diverse spectrum of 61 pollen types from 37 families was identified. Fourteen samples were classified as unifloral, whereas the remaining samples were multifloral. Bee-favoured families (occurring in more than 50% of the samples) included Caprifoliaceae (found in 10 samples), Lamiaceae (10), Brassicaceae (12), Rosaceae (12), Moraceae (13), Rhamnaceae (15), Asteraceae (17), and Fabaceae (19). In the unifloral honeys, the predominant pollen types were Ziziphus jujuba (in 5 samples), Robinia pseudoacacia (3), Vitex negundo var. heterophylla (2), Sophora japonica (1), Ailanthus altissima (1), Asteraceae type (1), and Fabaceae type (1). The absolute pollen count (i.e., the number of pollen grains per 10 g honey sample) suggested that 13 samples belonged to Group I (<20,000 pollen grains), 4 to Group II (20,000–100,000), and 2 to Group III (100,000–500,000). The dominance of unifloral honeys without toxic pollen grains and the low value of the HDE/P ratio (i.e., honey dew elements/pollen grains from nectariferous plants) indicated that the honey samples are of good quality and suitable for human consumption. PMID:23185358
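The absolute pollen count grouping used above is a simple binning by grains per 10 g of honey; a sketch (the label for counts above 500,000 is an assumption, since no such sample occurs in this set):

```python
def pollen_group(grains_per_10g):
    """Absolute pollen count groups as used in the record above:
    Group I < 20,000; Group II 20,000-100,000; Group III 100,000-500,000.
    The 'IV+' label for larger counts is an illustrative extension."""
    if grains_per_10g < 20_000:
        return "I"
    if grains_per_10g <= 100_000:
        return "II"
    if grains_per_10g <= 500_000:
        return "III"
    return "IV+"

counts = [12_000, 45_000, 150_000]
print([pollen_group(c) for c in counts])  # → ['I', 'II', 'III']
```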

  12. Co-Labeling for Multi-View Weakly Labeled Learning.

    PubMed

    Xu, Xinxing; Li, Wen; Xu, Dong; Tsang, Ivor W

    2016-06-01

    It is often expensive and time-consuming to collect labeled training samples in many real-world applications. To reduce the human effort of annotating training samples, many machine learning techniques (e.g., semi-supervised learning (SSL), multi-instance learning (MIL), etc.) have been studied to exploit weakly labeled training samples. Meanwhile, when the training data is represented with multiple types of features, many multi-view learning methods have shown that classifiers trained on different views can help each other to better utilize the unlabeled training samples for the SSL task. In this paper, we study a new learning problem called multi-view weakly labeled learning, in which we aim to develop a unified approach to learn robust classifiers by effectively utilizing different types of weakly labeled multi-view data from a broad range of tasks including SSL, MIL and relative outlier detection (ROD). We propose an effective approach called co-labeling to solve the multi-view weakly labeled learning problem. Specifically, we model the learning problem on each view as a weakly labeled learning problem, which aims to learn an optimal classifier from a set of pseudo-label vectors generated by using the classifiers trained on other views. Unlike traditional co-training approaches using a single pseudo-label vector for training each classifier, our co-labeling approach explores different strategies to utilize the predictions from different views, biases and iterations for generating the pseudo-label vectors, making our approach more robust for real-world applications. Moreover, to further improve the weakly labeled learning on each view, we also exploit the inherent group structure in the pseudo-label vectors generated from different strategies, which leads to a new multi-layer multiple kernel learning problem. 
Promising results for text-based image retrieval on the NUS-WIDE dataset as well as news classification and text categorization on several real-world multi-view datasets clearly demonstrate that our proposed co-labeling approach achieves state-of-the-art performance for various multi-view weakly labeled learning problems including multi-view SSL, multi-view MIL and multi-view ROD.
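Co-labeling generalizes co-training, in which view-specific classifiers supply pseudo-labels for each other. A minimal co-training-style sketch with nearest-centroid classifiers on two toy views (illustrative only, and far simpler than the proposed method):

```python
def fit_centroids(X, y):
    """Per-class centroids for a minimal view-specific classifier."""
    cents = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        cents[c] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return cents

def predict(cents, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(cents, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(x, cents[c])))

def co_train_round(view1, view2, labels, unlab1, unlab2):
    """One round of co-training-style pseudo-labeling: each view classifies
    the unlabeled pool, and samples on which the two views agree are added
    to the labeled set with that pseudo-label."""
    c1 = fit_centroids(view1, labels)
    c2 = fit_centroids(view2, labels)
    pseudo = []
    for x1, x2 in zip(unlab1, unlab2):
        p1, p2 = predict(c1, x1), predict(c2, x2)
        if p1 == p2:
            view1.append(x1)
            view2.append(x2)
            labels.append(p1)
            pseudo.append(p1)
        else:
            pseudo.append(None)  # views disagree: leave unlabeled
    return pseudo

# two "views" of the same samples; the first four are labeled
v1 = [(0.0,), (0.2,), (1.0,), (1.2,)]
v2 = [(5.0,), (5.2,), (9.0,), (9.2,)]
y  = [0, 0, 1, 1]
pseudo = co_train_round(v1, v2, y, unlab1=[(0.1,), (1.1,)],
                        unlab2=[(5.1,), (9.1,)])
print(pseudo)  # → [0, 1]
```

The paper's co-labeling approach replaces this single pseudo-label exchange with whole sets of pseudo-label vectors drawn from different views, biases and iterations.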

  13. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features

    PubMed Central

    Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-01-01

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross validation (LOOCV) strategy. In addition, the influence of parameter selection on classification performance was investigated. We found that the support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with the synthetic minority over-sampling technique (SMOTE), the highest classification accuracy of 0.945 or 0.961 for LGG versus HGG or grade II, III and IV gliomas was achieved. Application of the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classification accuracies. Moreover, the performance of the LibSVM, SMO and IBk classifiers was influenced by key parameters such as kernel type, C, gamma and K. SVM is a promising tool in developing an automated preoperative glioma grading system, especially when combined with the RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282
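The leave-one-out cross-validation strategy used above trains on all samples but one and tests on the held-out sample, cycling through the whole set. A sketch with a nearest-centroid stand-in classifier on toy two-feature data (not the MRI attributes):

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Classify x by the nearest class centroid (squared Euclidean)."""
    best_label, best_dist = None, None
    for c in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == c]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = c, dist
    return best_label

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: each sample is predicted by a
    model trained on all the other samples."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

# toy "histogram feature" data for two grades
X = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
y = ["LGG", "LGG", "LGG", "HGG", "HGG", "HGG"]
print(loocv_accuracy(X, y))  # → 1.0
```

LOOCV uses every sample for testing exactly once, which matters when only 120 patients are available.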

  14. Optimizing a machine learning based glioma grading system using multi-parametric MRI histogram and texture features.

    PubMed

    Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin

    2017-07-18

    Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools, by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. A comprehensive comparison of varied machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs) as well as WHO grade II, III and IV gliomas based on multi-parametric MRI images was proposed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross validation (LOOCV) strategy. In addition, the influence of parameter selection on classification performance was investigated. We found that the support vector machine (SVM) exhibited superior performance to other classifiers. By combining all tumor attributes with the synthetic minority over-sampling technique (SMOTE), the highest classification accuracy of 0.945 or 0.961 for LGG versus HGG or grade II, III and IV gliomas was achieved. Application of the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classification accuracies. Moreover, the performance of the LibSVM, SMO and IBk classifiers was influenced by key parameters such as kernel type, C, gamma and K. SVM is a promising tool in developing an automated preoperative glioma grading system, especially when combined with the RFE strategy. Model parameters should be considered in glioma grading model optimization.

  15. Machine-z: Rapid Machine-Learned Redshift Indicator for Swift Gamma-Ray Bursts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.

  16. Machine-z: Rapid Machine-Learned Redshift Indicator for Swift Gamma-Ray Bursts

    NASA Technical Reports Server (NTRS)

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-01-01

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce 'machine-z', a redshift prediction algorithm and a 'high-z' classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With 40 per cent false positive rate the classifier can achieve approximately 100 per cent recall. The most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
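The recall and false-positive-rate trade-off quoted above comes from thresholding the classifier's score: lowering the threshold catches more true high-z bursts at the cost of more false alarms. A sketch with illustrative scores (not the Swift sample):

```python
def recall_and_fpr(scores, truths, threshold):
    """Recall (true-positive rate) and false-positive rate of a 'high-z'
    flag raised when score >= threshold; truths are booleans."""
    tp = sum(s >= threshold and t for s, t in zip(scores, truths))
    fn = sum(s < threshold and t for s, t in zip(scores, truths))
    fp = sum(s >= threshold and not t for s, t in zip(scores, truths))
    tn = sum(s < threshold and not t for s, t in zip(scores, truths))
    return tp / (tp + fn), fp / (fp + tn)

# illustrative classifier scores; True marks a genuinely high-z burst
scores = [0.9, 0.45, 0.8, 0.7, 0.6, 0.2, 0.1, 0.3]
truths = [True, True, True, True, False, False, False, False]
recall, fpr = recall_and_fpr(scores, truths, threshold=0.5)
print(recall, fpr)  # → 0.75 0.25
```

Sweeping the threshold traces out the operating points the abstract reports (80% recall at 20% FPR, ~100% recall at 40% FPR).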

  17. Investigation of Metabolomic Blood Biomarkers for Detection of Adenocarcinoma Lung Cancer

    PubMed Central

    Fahrmann, Johannes F.; Kim, Kyoungmi; DeFelice, Brian C.; Taylor, Sandra L.; Gandara, David R.; Yoneda, Ken Y.; Cooke, David T.; Fiehn, Oliver; Kelly, Karen; Miyamoto, Suzanne

    2015-01-01

    Background Untargeted metabolomics was utilized in case-control studies of adenocarcinoma (ADC) lung cancer in order to develop and test metabolite classifiers in serum and plasma as potential biomarkers for diagnosing lung cancer. Methods Serum and plasma were collected and used in two independent case-control studies (ADC1 and ADC2). Controls were frequency matched for gender, age and smoking history. There were 52 ADC cases and 31 controls in ADC1 and 43 ADC cases and 43 controls in ADC2. Metabolomics was conducted using gas chromatography time-of-flight mass spectrometry. Differential analysis was performed on ADC1, and the top candidates (FDR < 0.05) for serum and plasma were used to develop individual and multiplex classifiers that were then tested on an independent set of serum and plasma samples (ADC2). Results Aspartate provided the best accuracy (81.4%) for an individual metabolite classifier in serum, whereas pyrophosphate had the best accuracy (77.9%) in plasma when independently tested. Multiplex classifiers of either 2 or 4 serum metabolites had an accuracy of 72.7% when independently tested. For plasma, a multi-metabolite classifier consisting of 8 metabolites gave an accuracy of 77.3% when independently tested. Comparison of overall diagnostic performance between the two blood matrices yielded similar performances. However, serum is preferable given its higher sensitivity for low-abundance metabolites. Conclusion This study shows the potential of metabolite-based diagnostic tests for detection of lung adenocarcinoma. Further validation in a larger pool of samples is warranted. Impact These biomarkers could improve early detection and diagnosis of lung cancer. PMID:26282632

  18. Training set optimization and classifier performance in a top-down diabetic retinopathy screening system

    NASA Astrophysics Data System (ADS)

    Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.

    2013-03-01

    Diabetic retinopathy (DR) affects more than 4.4 million Americans aged 40 and over. Automatic screening for DR has been shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (kNN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and kNN had the highest average AUC. Lower standard deviation and a flattening of the AUC curve give evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.

  19. Real-data comparison of data mining methods in prediction of diabetes in iran.

    PubMed

    Tapak, Lily; Mahjub, Hossein; Hamidi, Omid; Poorolajal, Jalal

    2013-09-01

    Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) to classify persons with and without diabetes. The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance obtained through a cross-sectional survey. The obtained sample was based on cluster sampling of the Iranian population, conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors that are commonly associated with diabetes were selected to compare the performance of six classifiers in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve criteria. Support vector machines showed the highest total accuracy (0.986) as well as area under the ROC (0.979). Also, this method showed high specificity (1.000) and sensitivity (0.820). All other methods produced total accuracy of more than 85%, but for all methods, the sensitivity values were very low (less than 0.350). The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested in the prediction of diabetes. Therefore, this approach is a promising classifier for predicting diabetes, and it should be further investigated for the prediction of other diseases.
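The sensitivity, specificity and total accuracy criteria used above all derive from the confusion-matrix counts. A sketch with illustrative counts (not the study's data):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and total accuracy from confusion counts:
    tp/fn are diseased subjects flagged/missed, tn/fp are healthy
    subjects cleared/falsely flagged."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# illustrative counts for a screening classifier
sens, spec, acc = diagnostic_metrics(tp=82, fp=0, tn=500, fn=18)
print(sens, spec, acc)
```

Note how a classifier can post high total accuracy while its sensitivity stays low when cases are rare, which is exactly the pattern reported for most methods above.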

  20. A ROC-based feature selection method for computer-aided detection and diagnosis

    NASA Astrophysics Data System (ADS)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic aiming to assist physicians to detect lesions and distinguish benign from malignant ones. However, the datasets fed into a classifier usually suffer from a small number of samples, with significantly fewer samples available in one class (have a disease) than in the other, resulting in suboptimal classifier performance. Identifying the most characteristic features of the observed data for lesion detection is critical to improving the sensitivity and minimizing the false positives of a CAD system. In this study, we propose a novel feature selection method, mR-FAST, that combines the minimal-redundancy-maximal-relevance (mRMR) framework with the selection metric FAST (feature assessment by sliding thresholds), which is based on the area under a ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more discriminative for lesion detection, yielding higher AUC and enabling a compact subset of superior features to be found at low cost.
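The FAST-style metric assesses each candidate feature by the AUC it achieves on its own, so a simple filter can rank features before any classifier is built. A sketch of per-feature AUC ranking (toy data; mRMR's redundancy term is omitted for brevity):

```python
def feature_auc(values_pos, values_neg):
    """AUC of a single feature used directly as a score."""
    wins = 0.0
    for p in values_pos:
        for n in values_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(values_pos) * len(values_neg))

def rank_features_by_auc(X_pos, X_neg, top_k):
    """Rank features by how far their individual AUC departs from 0.5
    (a feature with AUC 0.1 is as informative as one with AUC 0.9)."""
    n_features = len(X_pos[0])
    scored = []
    for f in range(n_features):
        a = feature_auc([r[f] for r in X_pos], [r[f] for r in X_neg])
        scored.append((abs(a - 0.5), f))
    return [f for _, f in sorted(scored, reverse=True)][:top_k]

# feature 1 is discriminative; features 0 and 2 are noise-like
X_pos = [(1.0, 9.0, 2.0), (2.0, 8.0, 1.0), (1.5, 9.5, 3.0)]
X_neg = [(1.2, 2.0, 2.5), (1.8, 1.0, 1.5), (1.4, 3.0, 2.2)]
print(rank_features_by_auc(X_pos, X_neg, top_k=1))  # → [1]
```

The paper's mR-FAST additionally penalizes redundancy between the selected features, which this sketch does not attempt.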

  1. Competitive intelligence information management and innovation in small technology-based companies

    NASA Astrophysics Data System (ADS)

    Tanev, Stoyan

    2007-05-01

    In this article we examine how (i) company type and (ii) the competitive intelligence information used by small technology-based companies affect their innovation performance. The focus is on the specific information types used and not on the information sources. Information topics are classified in four groups - customers (10), company (9), competitor (11) and industry (12). The sample consists of 45 small new technology-based companies, specialized suppliers, and service companies from a variety of sectors - software, photonics, telecommunications, biomedical engineering and biotech, traditional manufacturing, etc. The results suggest that the total number of intelligence information topics companies use to make decisions about innovation is not associated with the number of their new products, processes, services and patents. Therefore the companies in our sample do not seem to have the resources, processes or value systems required to use diverse competitive intelligence information when making decisions on innovation, or may rely more on their own internal logic than on external information. Companies are classified using a Pavitt-like taxonomy. Service companies are considered as a separate company type. This allows for explicitly studying both the innovative role of new services in product-driven companies and the role of new product development in service companies.

  2. Spatial-temporal discriminant analysis for ERP-based brain-computer interface.

    PubMed

    Zhang, Yu; Zhou, Guoxu; Zhao, Qibin; Jin, Jing; Wang, Xingyu; Cichocki, Andrzej

    2013-03-01

    Linear discriminant analysis (LDA) has been widely adopted to classify event-related potentials (ERP) in brain-computer interfaces (BCI). Good classification performance of an ERP-based BCI usually requires sufficient data recordings for effective training of the LDA classifier, and hence a long system calibration time, which may reduce the system's practicability and cause user resistance to the BCI system. In this study, we introduce a spatial-temporal discriminant analysis (STDA) for ERP classification. As a multiway extension of LDA, the STDA method maximizes the discriminant information between target and nontarget classes by collaboratively finding two projection matrices in the spatial and temporal dimensions, which effectively reduces the feature dimensionality in the discriminant analysis and hence significantly decreases the number of required training samples. The proposed STDA method was validated on dataset II of the BCI Competition III and on a dataset recorded in our own experiments, and compared to state-of-the-art algorithms for ERP classification. Online experiments were additionally implemented for validation. The superior classification performance when using few training samples shows that STDA is effective in reducing the system calibration time and improving the classification accuracy, thereby enhancing the practicability of ERP-based BCI.
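The LDA baseline that STDA extends finds the projection w = Sw^-1 (m1 - m0) that maximizes between-class separation relative to within-class scatter. A pure-Python sketch for two-dimensional features (toy data; a real ERP classifier works in far higher dimensions, which is exactly why STDA's dimensionality reduction matters):

```python
def fisher_lda_direction(class0, class1):
    """Fisher discriminant direction w = Sw^-1 (m1 - m0) for 2-D features."""
    def mean(rows):
        return [sum(c) / len(rows) for c in zip(*rows)]
    m0, m1 = mean(class0), mean(class1)
    # pooled within-class scatter matrix (2 x 2)
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    # invert the 2x2 scatter matrix explicitly
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

# toy two-channel "ERP features": nontarget vs target epochs
nontarget = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2)]
target    = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)]
w = fisher_lda_direction(nontarget, target)
proj = lambda x: w[0] * x[0] + w[1] * x[1]
# targets should project strictly higher than nontargets along w
print(max(proj(x) for x in nontarget) < min(proj(x) for x in target))  # → True
```

A threshold on the projected value then separates the two classes.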

  3. A Host-Based RT-PCR Gene Expression Signature to Identify Acute Respiratory Viral Infection

    PubMed Central

    Zaas, Aimee K.; Burke, Thomas; Chen, Minhua; McClain, Micah; Nicholson, Bradly; Veldman, Timothy; Tsalik, Ephraim L.; Fowler, Vance; Rivers, Emanuel P.; Otero, Ronny; Kingsmore, Stephen F.; Voora, Deepak; Lucas, Joseph; Hero, Alfred O.; Carin, Lawrence; Woods, Christopher W.; Ginsburg, Geoffrey S.

    2014-01-01

    Improved ways to diagnose acute respiratory viral infections could decrease inappropriate antibacterial use and serve as a vital triage mechanism in the event of a potential viral pandemic. Measurement of the host response to infection is an alternative to pathogen-based diagnostic testing and may improve diagnostic accuracy. We have developed a host-based assay with a reverse transcription polymerase chain reaction (RT-PCR) TaqMan low-density array (TLDA) platform for classifying respiratory viral infection. We developed the assay using two cohorts experimentally infected with influenza A H3N2/Wisconsin or influenza A H1N1/Brisbane, and validated the assay in a sample of adults presenting to the emergency department with fever (n = 102) and in healthy volunteers (n = 41). Peripheral blood RNA samples were obtained from individuals who underwent experimental viral challenge or who presented to the emergency department and had microbiologically proven viral respiratory infection or systemic bacterial infection. The selected gene set on the RT-PCR TLDA assay classified participants with experimentally induced influenza H3N2 and H1N1 infection with 100 and 87% accuracy, respectively. We validated this host gene expression signature in a cohort of 102 individuals arriving at the emergency department. The sensitivity of the RT-PCR test was 89% [95% confidence interval (CI), 72 to 98%], and the specificity was 94% (95% CI, 86 to 99%). These results show that RT-PCR–based detection of a host gene expression signature can classify individuals with respiratory viral infection and sets the stage for prospective evaluation of this diagnostic approach in a clinical setting. PMID:24048524

  4. Simulation techniques for estimating error in the classification of normal patterns

    NASA Technical Reports Server (NTRS)

    Whitsitt, S. J.; Landgrebe, D. A.

    1974-01-01

    Methods of efficiently generating and classifying samples with specified multivariate normal distributions are discussed. Conservative confidence tables for sample sizes are given for selective sampling. Simulation results are compared with classified training data. Techniques for comparing error and separability measures for two normal patterns are investigated and used to display the relationship between the error and the Chernoff bound.

  5. Applying active learning to supervised word sense disambiguation in MEDLINE.

    PubMed

    Chen, Yukun; Cao, Hongxin; Mei, Qiaozhu; Zheng, Kai; Xu, Hua

    2013-01-01

    This study assessed whether active learning strategies can be integrated with supervised word sense disambiguation (WSD) methods to reduce the number of annotated samples while maintaining or improving the quality of disambiguation models. We developed support vector machine (SVM) classifiers to disambiguate 197 ambiguous terms and abbreviations in the MSH WSD collection. Three different uncertainty sampling-based active learning algorithms were implemented with the SVM classifiers and were compared with a passive learner (PL) based on random sampling. For each ambiguous term and each learning algorithm, we generated a learning curve plotting test-set accuracy as a function of the number of annotated samples used in the model. The area under the learning curve (ALC) was used as the primary metric for evaluation. Our experiments demonstrated that active learners (ALs) significantly outperformed the PL, showing better performance for 177 out of 197 (89.8%) WSD tasks. Further analysis showed that to achieve an average accuracy of 90%, the PL needed 38 annotated samples, while the ALs needed only 24, a 37% reduction in annotation effort. Moreover, we analyzed cases where active learning algorithms did not achieve superior performance and identified three causes: (1) poor models in the early learning stage; (2) easy WSD cases; and (3) difficult WSD cases, which provide useful insight for future improvements. This study demonstrated that integrating active learning strategies with supervised WSD methods could effectively reduce annotation cost and improve the disambiguation models.
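    The uncertainty-sampling loop described above is easy to sketch. The following is a hedged illustration, not the study's code: it substitutes a small numpy logistic-regression learner for the SVM classifiers and, at each round, queries the pool sample whose predicted probability is closest to 0.5.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain batch gradient-descent logistic regression (a stand-in for the
    SVM classifiers used in the study)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def uncertainty_sampling(X_pool, labels, n_init=4, n_queries=20, seed=0):
    """Active learning by uncertainty sampling: start from a small random
    labeled set, then repeatedly label the most uncertain pool sample."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), n_init, replace=False))
    for _ in range(n_queries):
        w, b = train_logreg(X_pool[labeled], labels[labeled])
        p = 1.0 / (1.0 + np.exp(-(X_pool @ w + b)))
        unc = np.abs(p - 0.5)          # distance from the decision boundary
        unc[labeled] = np.inf          # never re-query a labeled sample
        labeled.append(int(np.argmin(unc)))
    return train_logreg(X_pool[labeled], labels[labeled]), labeled
```

In the study's terms, a passive learner would replace `np.argmin(unc)` with a uniformly random unlabeled index; comparing the two learning curves gives the ALC metric.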

  6. Applying active learning to supervised word sense disambiguation in MEDLINE

    PubMed Central

    Chen, Yukun; Cao, Hongxin; Mei, Qiaozhu; Zheng, Kai; Xu, Hua

    2013-01-01

    Objectives This study assessed whether active learning strategies can be integrated with supervised word sense disambiguation (WSD) methods to reduce the number of annotated samples while maintaining or improving the quality of disambiguation models. Methods We developed support vector machine (SVM) classifiers to disambiguate 197 ambiguous terms and abbreviations in the MSH WSD collection. Three different uncertainty sampling-based active learning algorithms were implemented with the SVM classifiers and were compared with a passive learner (PL) based on random sampling. For each ambiguous term and each learning algorithm, we generated a learning curve plotting test-set accuracy as a function of the number of annotated samples used in the model. The area under the learning curve (ALC) was used as the primary metric for evaluation. Results Our experiments demonstrated that active learners (ALs) significantly outperformed the PL, showing better performance for 177 out of 197 (89.8%) WSD tasks. Further analysis showed that to achieve an average accuracy of 90%, the PL needed 38 annotated samples, while the ALs needed only 24, a 37% reduction in annotation effort. Moreover, we analyzed cases where active learning algorithms did not achieve superior performance and identified three causes: (1) poor models in the early learning stage; (2) easy WSD cases; and (3) difficult WSD cases, which provide useful insight for future improvements. Conclusions This study demonstrated that integrating active learning strategies with supervised WSD methods could effectively reduce annotation cost and improve the disambiguation models. PMID:23364851

  7. Predictive models of alcohol use based on attitudes and individual values.

    PubMed

    García del Castillo Rodríguez, José A; López-Sánchez, Carmen; Quiles Soler, M Carmen; García del Castillo-López, Alvaro; Gázquez Pertusa, Mónica; Marzo Campos, Juan Carlos; Inglés, Candido J

    2013-01-01

    Two predictive models are developed in this article: the first is designed to predict people's attitudes to alcoholic drinks, while the second sets out to predict alcohol use in relation to selected individual values. University students (N = 1,500) were recruited through stratified sampling based on sex and academic discipline. The questionnaire collected information on participants' alcohol use, attitudes and personal values. The results show that the attitudes model correctly classifies 76.3% of cases. Likewise, the model for level of alcohol use correctly classifies 82% of cases. According to our results, we can conclude that a series of individual values influence drinking and attitudes to alcohol use, which provides a potentially powerful instrument for developing preventive intervention programs.

  8. Integrated pillar scatterers for speeding up classification of cell holograms.

    PubMed

    Lugnan, Alessio; Dambre, Joni; Bienstman, Peter

    2017-11-27

    The computational power required to classify cell holograms is a major limit to the throughput of label-free cell sorting based on digital holographic microscopy. In this work, a simple integrated photonic stage comprising a collection of silica pillar scatterers is proposed as an effective nonlinear mixing interface between the light scattered by a cell and an image sensor. The light processing provided by the photonic stage allows for the use of a simple linear classifier implemented in the electric domain and applied on a limited number of pixels. A proof-of-concept of the presented machine learning technique, which is based on the extreme learning machine (ELM) paradigm, is provided by the classification results on samples generated by 2D FDTD simulations of cells in a microfluidic channel.
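    The extreme learning machine (ELM) paradigm mentioned above is simple to illustrate: a fixed random nonlinear "mixing" stage (played physically here by the pillar scatterers) followed by a trained linear readout. Below is a minimal numpy sketch of a generic ELM, unrelated to the authors' photonic implementation; all names are illustrative.

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, seed=0):
    """Extreme learning machine: a fixed random nonlinear expansion
    followed by a linear least-squares readout."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)          # random hidden layer (the "mixing" stage)
    beta = np.linalg.pinv(H) @ Y    # only the linear readout is trained
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Only `beta` is learned, which is why the electric-domain part of the system can stay a simple linear classifier over a limited number of pixels: the nonlinearity has already been applied by the fixed scattering stage.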

  9. Relative effectiveness of kinetic analysis vs single point readings for classifying environmental samples based on community-level physiological profiles (CLPP)

    NASA Technical Reports Server (NTRS)

    Garland, J. L.; Mills, A. L.; Young, J. S.

    2001-01-01

    The relative effectiveness of average-well-color-development-normalized single-point absorbance readings (AWCD) was compared with that of the kinetic parameters mu(m), lambda, A, and integral (AREA) of the modified Gompertz equation fitted to the color development curves produced when microbial respiration of 95 separate sole carbon sources in microplate wells reduces a redox-sensitive dye. The comparison used a dilution series of rhizosphere samples from hydroponically grown wheat and potato spanning inoculum densities of 1 x 10(4)-4 x 10(6) cells ml-1. Patterns generated with each parameter were analyzed using principal component analysis (PCA) and discriminant function analysis (DFA) to test relative resolving power. Samples of equivalent cell density (undiluted samples) were correctly classified by rhizosphere type for all parameters based on DFA analysis of the first five PC scores. Analysis of undiluted and 1:4 diluted samples resulted in misclassification of at least two of the wheat samples for all parameters except the AWCD-normalized (0.50 abs. units) data, and analysis of undiluted, 1:4, and 1:16 diluted samples resulted in misclassification for all parameter types. Ordination of samples along the first principal component (PC) was correlated with inoculum density in analyses performed on all of the kinetic parameters, but no such influence was seen for AWCD-derived results. The carbon sources responsible for classification differed among the variable types, with the exception of AREA and A, which were strongly correlated. These results indicate that the use of kinetic parameters for pattern analysis in CLPP may provide some additional information, but only if the influence of inoculum density is carefully considered.
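    As a rough illustration of the kinetic approach, a modified Gompertz model (here in the common Zwietering parameterization, which may differ in detail from the form used in the study) can be fitted to a well's color development curve and the pattern variables mu(m), lambda, A, and AREA derived from the fit. The readings below are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, mu_m, lam):
    """Modified Gompertz model (Zwietering form): A = asymptote,
    mu_m = maximum rate, lam = lag time."""
    return A * np.exp(-np.exp(mu_m * np.e / A * (lam - t) + 1.0))

# Hypothetical well readings: a known curve plus measurement noise.
t = np.linspace(0.0, 72.0, 37)                    # hours
rng = np.random.default_rng(0)
y = gompertz(t, A=1.8, mu_m=0.12, lam=18.0) + rng.normal(0.0, 0.02, t.size)

# Fit the three Gompertz parameters, then derive AREA from the fitted curve.
popt, _ = curve_fit(gompertz, t, y, p0=[1.0, 0.05, 10.0], maxfev=10000)
A_fit, mu_m_fit, lam_fit = popt
y_fit = gompertz(t, *popt)
area = float(np.sum((y_fit[1:] + y_fit[:-1]) / 2.0 * np.diff(t)))  # trapezoid rule
```

Repeating this per well yields a 95-element vector of each kinetic parameter per sample, which is what the PCA/DFA pattern analysis in the study operates on.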

  10. Classification of breast cancer cytological specimen using convolutional neural network

    NASA Astrophysics Data System (ADS)

    Żejmo, Michał; Kowal, Marek; Korbicz, Józef; Monczak, Roman

    2017-01-01

    The paper presents a deep learning approach for automatic classification of breast tumors based on fine needle cytology. The main aim of the system is to distinguish benign from malignant cases based on microscopic images. The experiment was carried out on cytological samples derived from 50 patients (25 benign cases + 25 malignant cases) diagnosed at the Regional Hospital in Zielona Góra. To classify the microscopic images, we used convolutional neural networks (CNN) of two types: GoogLeNet and AlexNet. Due to the very large size of the cytological specimen images (on average 200000 × 100000 pixels), they were divided into smaller patches of size 256 × 256 pixels. Breast cancer classification is usually based on morphometric features of nuclei, so training and validation patches were selected using a Support Vector Machine (SVM) to ensure that a suitable amount of cell material was depicted. Neural classifiers were tuned using a GPU-accelerated implementation of the gradient descent algorithm. Training error was defined as a cross-entropy classification loss. Classification accuracy was defined as the percentage of successfully classified validation patches out of the total number of validation patches. The best accuracy rate of 83% was obtained by the GoogLeNet model. We observed that misclassified patches more often belonged to malignant cases.
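    The patch-extraction step described above is straightforward to sketch in numpy. This hypothetical helper tiles an image into non-overlapping 256 × 256 crops, discarding the ragged border; the study's SVM-based filtering of patches for sufficient cell material is omitted here.

```python
import numpy as np

def extract_patches(image, patch=256):
    """Tile a large image into non-overlapping patch x patch crops,
    discarding the ragged right/bottom border."""
    h, w = image.shape[:2]
    patches = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            patches.append(image[i:i + patch, j:j + patch])
    return np.stack(patches)
```

For a 200000 × 100000 pixel specimen this yields on the order of 300,000 patches per slide, which is why a pre-filter on cell content is needed before CNN training.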

  11. Novel layered clustering-based approach for generating ensemble of classifiers.

    PubMed

    Rahman, Ashfaqur; Verma, Brijesh

    2011-05-01

    This paper introduces a novel concept for creating an ensemble of classifiers based on clustering data at multiple layers. The ensemble classifier model generates a set of alternative clusterings of a dataset at different layers by randomly initializing the clustering parameters, and trains a set of base classifiers on the patterns in different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering, together with the diversity achieved through layering, leads to better classification results, as evidenced by the experimental results.
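    A minimal sketch of the layered idea follows, assuming k-means clustering and a nearest-class-mean base classifier inside each cluster; the paper does not prescribe these specific choices, so treat them as illustrative stand-ins.

```python
import numpy as np

def kmeans(X, k, rng, iters=10):
    # Lloyd's algorithm from random initial centroids (a new draw per layer)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(0)
    return C

def fit_layers(X, y, n_layers=5, k=3, seed=0):
    """One alternative clustering per layer, plus a nearest-class-mean
    base classifier trained on the patterns inside every cluster."""
    rng = np.random.default_rng(seed)
    layers = []
    for _ in range(n_layers):
        C = kmeans(X, k, rng)
        lab = np.argmin(((X[:, None] - C) ** 2).sum(-1), axis=1)
        base = []
        for j in range(k):
            Xi, yi = X[lab == j], y[lab == j]
            if len(yi) == 0:                 # empty cluster: fall back to global data
                Xi, yi = X, y
            base.append({c: Xi[yi == c].mean(0) for c in np.unique(yi)})
        layers.append((C, base))
    return layers

def predict(x, layers):
    votes = []
    for C, base in layers:
        j = int(np.argmin(((x - C) ** 2).sum(-1)))   # appropriate cluster, this layer
        means = base[j]
        votes.append(min(means, key=lambda c: np.linalg.norm(x - means[c])))
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]                   # majority vote across layers
```

The random initialization is what makes the layers disagree, and that disagreement is the source of ensemble diversity the abstract refers to.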

  12. A novel modular ANN architecture for efficient monitoring of gases/odours in real-time

    NASA Astrophysics Data System (ADS)

    Mishra, A.; Rajput, N. S.

    2018-04-01

    Data pre-processing is widely used to enhance the classification of gases. However, it suppresses the concentration variances of different gas samples. The classical solution of using a single artificial neural network (ANN) architecture is also inefficient and yields degraded quantification. In this paper, a novel modular ANN design is proposed to provide an efficient and scalable solution in real time. Two separate ANN blocks, a classifier block and a quantifier block, provide efficient and scalable gas monitoring in real time. The classifier ANN consists of two stages. In the first stage, Net 1-NDSRT is trained to transform raw sensor responses into corresponding virtual multi-sensor responses using normalized difference sensor response transformation (NDSRT). These responses are fed to the second stage (Net 2-classifier), which is trained to assign gas samples to their respective classes. The quantifier block contains parallel ANN modules, multiplexed to quantify each gas. The classifier ANN thus decides the class, and the quantifier ANN estimates the exact quantity of the gas/odour present in a sample of that class.

  13. Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals.

    PubMed

    Hu, Jianfeng

    2017-01-01

    Purpose: Driving fatigue has become one of the important causes of road accidents, and many studies have analyzed driver fatigue. EEG is becoming increasingly useful in measuring the fatigue state. Manual interpretation of EEG signals is impractical, so an effective method for automatic detection in EEG signals is crucially needed. Method: In order to capture the complex, unstable, and non-linear characteristics of EEG signals, four single-entropy feature sets and their combination were computed from EEG signals: fuzzy entropy (FE), sample entropy (SE), approximate entropy (AE), spectral entropy (PE), and combined entropies (FE + SE + AE + PE). All these feature sets were used as input vectors to an AdaBoost classifier, a boosting method that is fast and highly accurate. To assess our method, several experiments including parameter setting and classifier comparison were conducted on 28 subjects. For comparison, Decision Tree (DT), Support Vector Machine (SVM) and Naive Bayes (NB) classifiers were used. Results: The proposed method (combination of FE and AdaBoost) yields performance superior to the other schemes. Using the FE feature extractor, AdaBoost achieves an area under the receiver operating curve (AUC) of 0.994, error rate (ERR) of 0.024, Precision of 0.969, Recall of 0.984, F1 score of 0.976, and Matthews correlation coefficient (MCC) of 0.952, compared to SVM (ERR of 0.035, Precision of 0.957, Recall of 0.974, F1 score of 0.966, and MCC of 0.930 with AUC of 0.990), DT (ERR of 0.142, Precision of 0.857, Recall of 0.859, F1 score of 0.858, and MCC of 0.716 with AUC of 0.916) and NB (ERR of 0.405, Precision of 0.646, Recall of 0.434, F1 score of 0.519, and MCC of 0.203 with AUC of 0.606). The FE feature set and the combined feature set outperform the other feature sets. AdaBoost also shows better robustness against changes in the ratio of test samples to all samples and in the number of subjects, which might therefore aid the real-time detection of driver fatigue through the classification of EEG signals. Conclusion: By using the combination of FE features and an AdaBoost classifier to detect EEG-based driver fatigue, this paper builds confidence in exploring the inherent physiological mechanisms and wearable applications.
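    Of the entropy features above, sample entropy (SE) is the most standard to reproduce; a compact numpy version is shown below. The resulting per-channel entropies would then form the input vectors for a boosted classifier such as AdaBoost. This is an illustrative implementation, not the paper's code.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r): the negative log of the ratio of
    (m+1)-point template matches to m-point template matches, with
    self-matches excluded. Lower values indicate a more regular signal."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                    # common tolerance choice
    def count(mm):
        # same number of templates for both lengths (standard SampEn)
        templ = np.array([x[i:i + mm] for i in range(len(x) - m)])
        # Chebyshev distance between all template pairs
        d = np.max(np.abs(templ[:, None] - templ[None, :]), axis=-1)
        return ((d <= r).sum() - len(templ)) / 2   # exclude self-matches
    B, A = count(m), count(m + 1)
    return -np.log(A / B)
```

A fatigued-vs-alert classifier would compute such entropies per EEG channel and window and feed the vectors to a boosted ensemble; the fuzzy-entropy variant that performed best in the study replaces the hard `d <= r` match with a smooth exponential membership function.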

  14. Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals

    PubMed Central

    Hu, Jianfeng

    2017-01-01

    Purpose: Driving fatigue has become one of the important causes of road accidents, and many studies have analyzed driver fatigue. EEG is becoming increasingly useful in measuring the fatigue state. Manual interpretation of EEG signals is impractical, so an effective method for automatic detection in EEG signals is crucially needed. Method: In order to capture the complex, unstable, and non-linear characteristics of EEG signals, four single-entropy feature sets and their combination were computed from EEG signals: fuzzy entropy (FE), sample entropy (SE), approximate entropy (AE), spectral entropy (PE), and combined entropies (FE + SE + AE + PE). All these feature sets were used as input vectors to an AdaBoost classifier, a boosting method that is fast and highly accurate. To assess our method, several experiments including parameter setting and classifier comparison were conducted on 28 subjects. For comparison, Decision Tree (DT), Support Vector Machine (SVM) and Naive Bayes (NB) classifiers were used. Results: The proposed method (combination of FE and AdaBoost) yields performance superior to the other schemes. Using the FE feature extractor, AdaBoost achieves an area under the receiver operating curve (AUC) of 0.994, error rate (ERR) of 0.024, Precision of 0.969, Recall of 0.984, F1 score of 0.976, and Matthews correlation coefficient (MCC) of 0.952, compared to SVM (ERR of 0.035, Precision of 0.957, Recall of 0.974, F1 score of 0.966, and MCC of 0.930 with AUC of 0.990), DT (ERR of 0.142, Precision of 0.857, Recall of 0.859, F1 score of 0.858, and MCC of 0.716 with AUC of 0.916) and NB (ERR of 0.405, Precision of 0.646, Recall of 0.434, F1 score of 0.519, and MCC of 0.203 with AUC of 0.606). The FE feature set and the combined feature set outperform the other feature sets. AdaBoost also shows better robustness against changes in the ratio of test samples to all samples and in the number of subjects, which might therefore aid the real-time detection of driver fatigue through the classification of EEG signals. Conclusion: By using the combination of FE features and an AdaBoost classifier to detect EEG-based driver fatigue, this paper builds confidence in exploring the inherent physiological mechanisms and wearable applications. PMID:28824409

  15. Identification and validation of biomarkers of IgV(H) mutation status in chronic lymphocytic leukemia using microfluidics quantitative real-time polymerase chain reaction technology.

    PubMed

    Abruzzo, Lynne V; Barron, Lynn L; Anderson, Keith; Newman, Rachel J; Wierda, William G; O'Brien, Susan; Ferrajoli, Alessandra; Luthra, Madan; Talwalkar, Sameer; Luthra, Rajyalakshmi; Jones, Dan; Keating, Michael J; Coombes, Kevin R

    2007-09-01

    To develop a model incorporating relevant prognostic biomarkers for untreated chronic lymphocytic leukemia patients, we re-analyzed the raw data from four published gene expression profiling studies. We selected 88 candidate biomarkers linked to immunoglobulin heavy-chain variable region gene (IgV(H)) mutation status and produced a reliable and reproducible microfluidics quantitative real-time polymerase chain reaction array. We applied this array to a training set of 29 purified samples from previously untreated patients. In an unsupervised analysis, the samples clustered into two groups. Using a cutoff point of 2% homology to the germline IgV(H) sequence, one group contained all 14 IgV(H)-unmutated samples; the other contained all 15 mutated samples. We confirmed the differential expression of 37 of the candidate biomarkers using two-sample t-tests. Next, we constructed 16 different models to predict IgV(H) mutation status and evaluated their performance on an independent test set of 20 new samples. Nine models correctly classified 11 of 11 IgV(H)-mutated cases and eight of nine IgV(H)-unmutated cases, with some models using three to seven genes. Thus, we can classify cases with 95% accuracy based on the expression of as few as three genes.
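    The gene-selection-plus-small-classifier pipeline can be sketched generically: rank genes by a two-sample t statistic between the mutation-status groups, keep the top few, and classify by the nearest class centroid. This is a hedged stand-in for the study's models (whose exact form is not detailed here), shown on synthetic expression data.

```python
import numpy as np

def welch_t(a, b):
    # Two-sample Welch t statistic per gene (columns = genes)
    va, vb = a.var(0, ddof=1) / len(a), b.var(0, ddof=1) / len(b)
    return (a.mean(0) - b.mean(0)) / np.sqrt(va + vb)

def top_genes(X, y, k=3):
    """Rank genes by |t| between the two groups and keep the k most
    discriminative ones."""
    t = welch_t(X[y == 0], X[y == 1])
    return np.argsort(-np.abs(t))[:k]

def centroid_classifier(X, y, genes):
    """Nearest-class-centroid rule restricted to the selected genes."""
    c0 = X[y == 0][:, genes].mean(0)
    c1 = X[y == 1][:, genes].mean(0)
    def predict(Xnew):
        d0 = np.linalg.norm(Xnew[:, genes] - c0, axis=1)
        d1 = np.linalg.norm(Xnew[:, genes] - c1, axis=1)
        return (d1 < d0).astype(int)
    return predict
```

The point the abstract makes, that a handful of strongly differential genes can classify held-out samples with high accuracy, corresponds to `k` being as small as three here.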

  16. Is Science Me? Exploring Middle School Students' STE-M Career Aspirations

    ERIC Educational Resources Information Center

    Aschbacher, Pamela R.; Ing, Marsha; Tsai, Sherry M.

    2014-01-01

    This study explores middle school students' aspirations in science, technology, engineering, and medical (STE-M) careers by analyzing survey data during their eighth and ninth grade years from an ethnically and economically diverse sample of Southern California urban and suburban public school students (n = 493). Students were classified based on…

  17. PERSONAL AND CIRCUMSTANTIAL FACTORS INFLUENCING THE ACT OF DISCOVERY.

    ERIC Educational Resources Information Center

    OSTRANDER, EDWARD R.

    How students say they learn was investigated. Interviews with a random sample of 74 women students posed questions about the nature, frequency, patterns, and circumstances under which acts of discovery take place in the academic setting. Students were assigned discovery ratings based on readings of typescripts. Each student was classified and…

  18. Psychopathic Traits of Dutch Adolescents in Residential Care: Identifying Subgroups

    ERIC Educational Resources Information Center

    Nijhof, Karin S.; Vermulst, Ad; Scholte, Ron H. J.; van Dam, Coleta; Veerman, Jan Willem; Engels, Rutger C. M. E.

    2011-01-01

    The present study examined whether a sample of 214 (52.8% male, M age = 15.76, SD = 1.29) institutionalized adolescents could be classified into subgroups based on psychopathic traits. Confirmatory Factor Analyses revealed a relationship between the subscales of the Youth Psychopathic traits Inventory (YPI) and the three latent constructs of the…

  19. Classification, genetic variation, and biological activity of nucleopolyhedrovirus samples from larvae of the heliothine pests heliothis virescens, helicoverpa zea, and helicoverpa armigera

    USDA-ARS?s Scientific Manuscript database

    A PCR-based method was used to classify 109 isolates of nucleopolyhedrovirus (NPV; Baculoviridae: Alphabaculovirus) collected worldwide from larvae of Heliothis virescens, Helicoverpa zea, and Helicoverpa armigera. Partial nucleotide sequencing and phylogenetic analysis of three highly conserved ge...

  20. Identification of beta-lactam antibiotics in tissue samples containing unknown microbial inhibitors.

    PubMed

    Moats, W A; Romanowski, R D; Medina, M B

    1998-01-01

    Antibiotic residues in animal tissues can be detected by various screening tests based on microbial inhibition. In the 7-plate assay used by the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS), penicillinase is incorporated into all but one plate to distinguish beta-lactam antibiotics from other types. However, beta-lactams such as cloxacillin and the cephalosporins are resistant to degradation by penicillinase. They may not be recognized as beta-lactams by this procedure and may instead be classified as unidentified microbial inhibitors (UMIs). These penicillinase-resistant compounds can, however, be degraded by other beta-lactamases. The present study describes an improved screening protocol to identify beta-lactam antibiotics classified as UMIs. A multiresidue liquid chromatographic procedure based on a method for determining beta-lactams in milk was also used to identify and quantitate residues. The two methods were tested on 24 FSIS tissue samples classified as containing UMIs. Of these, 3 contained penicillin G, including one at a violative level, and 5 contained a metabolite of ceftiofur. The others were negative for beta-lactam antibiotics.

  1. Classification and identification of Rhodobryum roseum Limpr. and its adulterants based on fourier-transform infrared spectroscopy (FTIR) and chemometrics.

    PubMed

    Cao, Zhen; Wang, Zhenjie; Shang, Zhonglin; Zhao, Jiancheng

    2017-01-01

    Fourier-transform infrared spectroscopy (FTIR) with the attenuated total reflectance technique was used to distinguish Rhodobryum roseum from its four adulterants. The FTIR spectra of six samples in the range from 4000 cm-1 to 600 cm-1 were obtained. Second-derivative transformation was used to resolve small and closely spaced absorption peaks. A cluster analysis was performed to classify the spectra in a dendrogram based on spectral similarity. Principal component analysis (PCA) was used to classify the species of the six moss samples. Cluster analysis with PCA was used to discriminate different genera. However, some species of the same genus exhibited highly similar chemical components and FTIR spectra. Fourier self-deconvolution and the discrete wavelet transform (DWT) were therefore used to enhance the differences among species with similar chemical components and FTIR spectra. Three scales were selected as the feature-extracting space in the DWT domain. The results show that FTIR spectroscopy with chemometrics is suitable for identifying Rhodobryum roseum and its adulterants.
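    The derivative-plus-PCA step used here is a standard chemometric recipe. A minimal numpy sketch follows: take second derivatives of the spectra (which sharpens small, overlapping absorption bands) and project the result onto the leading principal components. The two synthetic "species" below are illustrative, not FTIR data.

```python
import numpy as np

def second_derivative(spectra):
    """Numerical second derivative along the wavenumber axis; sharpens
    small, overlapping absorption peaks before pattern analysis."""
    return np.diff(spectra, n=2, axis=1)

def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix."""
    Xc = X - X.mean(0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Cluster analysis on these scores (e.g. hierarchical clustering into a dendrogram, as in the study) then groups the spectra by similarity.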

  2. Variations in the buccal-lingual alveolar bone thickness of impacted mandibular third molar: our classification and treatment perspectives

    PubMed Central

    Ge, Jing; Zheng, Jia-Wei; Yang, Chi; Qian, Wen-Tao

    2016-01-01

    Whether to select a buccal or a lingual approach for surgical extraction of the mandibular third molar has been intensely debated for years. The aim of this observational retrospective study was to classify the molar based on its proximity to the external cortical bone and to analyze the position of the inferior alveolar canal (IAC) for each type. Cone-beam CT (CBCT) data of 110 deeply impacted mandibular third molars from 91 consecutive patients were analyzed. A new classification based on the mean deduction value (MD) of buccal-lingual alveolar bone thickness was proposed: MD≥1 mm was classified as buccal position, 1 mm>MD>−1 mm as central position, and MD≤−1 mm as lingual position. The study samples were distributed as follows: buccal position in 2 subjects (1.8%), central position in 12 (10.9%) and lingual position in 96 (87.3%). Ninety-six molars (87.3%) contacted the IAC. The buccal and inferior IAC courses were the most common types in impacted third molars, especially in lingually positioned ones. Our study suggests that among deeply impacted mandibular third molars, the lingual position occupies the largest proportion, followed by the central and then the buccal type. PMID:26759181

  3. Wearable Sensor Data Classification for Human Activity Recognition Based on an Iterative Learning Framework.

    PubMed

    Davila, Juan Carlos; Cretu, Ana-Maria; Zaremba, Marek

    2017-06-07

    The design of multiple human activity recognition applications in areas such as healthcare, sports and safety relies on wearable sensor technologies. However, when making decisions based on the data acquired by such sensors in practical situations, several factors related to sensor data alignment, data losses, and noise, among other experimental constraints, deteriorate data quality and model accuracy. To tackle these issues, this paper presents a data-driven iterative learning framework to classify human locomotion activities such as walk, stand, lie, and sit, extracted from the Opportunity dataset. Data acquired by twelve 3-axial acceleration sensors and seven inertial measurement units are initially de-noised using a two-stage consecutive filtering approach combining a band-pass Finite Impulse Response (FIR) filter and a wavelet filter. A series of statistical parameters are extracted from the kinematical features, including the principal components and singular value decomposition of roll, pitch, yaw and the norm of the axial components. The iterative learning procedure is then applied in order to minimize the number of samples required to classify human locomotion activities. Only those samples that are most distant from the centroids of the data clusters, according to a measure presented in the paper, are selected as candidates for the training dataset. The newly built dataset is then used to train an SVM multi-class classifier, which yields the lowest prediction error. The proposed learning framework ensures a high level of robustness to variations in the quality of input data while using a much lower number of training samples, and therefore a much shorter training time, which is an important consideration given the large size of the dataset.
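    The sample-selection rule, keeping only the samples most distant from the cluster centroids, can be sketched as follows. This simplification uses per-class centroids and plain Euclidean distance, whereas the paper defines its own distance measure; names are illustrative.

```python
import numpy as np

def select_boundary_samples(X, labels, n_keep):
    """Keep the n_keep samples farthest from their own class centroid,
    i.e. the 'hard', boundary-like cases, as the reduced training set."""
    dist = np.empty(len(X))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = X[idx].mean(0)
        dist[idx] = np.linalg.norm(X[idx] - centroid, axis=1)
    return np.argsort(-dist)[:n_keep]   # indices of the farthest samples
```

Training the multi-class SVM only on the returned indices is what shrinks the training set and the training time while keeping the informative, near-boundary examples.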

  4. Microarray-based gene expression profiling in patients with cryopyrin-associated periodic syndromes defines a disease-related signature and IL-1-responsive transcripts.

    PubMed

    Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona

    2013-06-01

    To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values ≤ false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra.

  5. Microarray-based gene expression profiling in patients with cryopyrin-associated periodic syndromes defines a disease-related signature and IL-1-responsive transcripts

    PubMed Central

    Balow, James E; Ryan, John G; Chae, Jae Jin; Booty, Matthew G; Bulua, Ariel; Stone, Deborah; Sun, Hong-Wei; Greene, James; Barham, Beverly; Goldbach-Mansky, Raphaela; Kastner, Daniel L; Aksentijevich, Ivona

    2014-01-01

    Objective To analyse gene expression patterns and to define a specific gene expression signature in patients with the severe end of the spectrum of cryopyrin-associated periodic syndromes (CAPS). The molecular consequences of interleukin 1 inhibition were examined by comparing gene expression patterns in 16 CAPS patients before and after treatment with anakinra. Methods We collected peripheral blood mononuclear cells from 22 CAPS patients with active disease and from 14 healthy children. Transcripts that passed stringent filtering criteria (p values ≤ false discovery rate 1%) were considered as differentially expressed genes (DEG). A set of DEG was validated by quantitative reverse transcription PCR and functional studies with primary cells from CAPS patients and healthy controls. We used 17 CAPS and 66 non-CAPS patient samples to create a set of gene expression models that differentiates CAPS patients from controls and from patients with other autoinflammatory conditions. Results Many DEG include transcripts related to the regulation of innate and adaptive immune responses, oxidative stress, cell death, cell adhesion and motility. A set of gene expression-based models comprising the CAPS-specific gene expression signature correctly classified all 17 samples from an independent dataset. This classifier also correctly identified 15 of 16 post-anakinra CAPS samples despite the fact that these CAPS patients were in clinical remission. Conclusions We identified a gene expression signature that clearly distinguished CAPS patients from controls. A number of DEG were in common with other systemic inflammatory diseases such as systemic onset juvenile idiopathic arthritis. The CAPS-specific gene expression classifiers also suggest incomplete suppression of inflammation at low doses of anakinra. PMID:23223423
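    The DEG filter above keeps transcripts whose p values survive a 1% false-discovery-rate cutoff. The abstract does not name the FDR procedure, so, purely as an illustrative assumption, this sketch uses the common Benjamini-Hochberg step-up rule:

```python
def benjamini_hochberg(pvals, q=0.01):
    """Return the (sorted) indices of hypotheses declared significant
    while controlling the false discovery rate at level q.

    Benjamini-Hochberg: sort p-values ascending and find the largest
    rank k with p_(k) <= (k / m) * q; reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])
```

    For example, with q = 0.05 the p-value list [0.001, 0.008, 0.039, 0.041, 0.27, 0.8] yields significant indices [0, 1]: the third-smallest p-value (0.039) already exceeds its threshold 3/6 × 0.05 = 0.025.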

  6. Color distributions in E-S0 galaxies. I. Frequency and importance of dust patterns for various brands of E classified galaxies

    NASA Astrophysics Data System (ADS)

    Michard, R.

    1998-06-01

    From a sample of color distributions in 67 E-classified objects of the Local Supercluster, it is found that local dust features are much more frequent and important in disky E's than in boxy E's. The subclass of indeterminate objects, those which cannot be assigned to the diE or boE groups, is intermediate. Subsets of objects with common properties are considered from the point of view of the occurrence of local dust features: giant boxy E's; minor boxy E's with rotational support; compact dwarfs; SB0-like E's. It is noted that dust features are detected less than half as often in Virgo cluster ellipticals as in the full sample, but the significance of this result is not clear. Based on observations collected at the Canada-France-Hawaii Telescope and at the Observatoire du Pic du Midi

  7. Diagnosis of oral lichen planus from analysis of saliva samples using terahertz time-domain spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Kistenev, Yury V.; Borisov, Alexey V.; Titarenko, Maria A.; Baydik, Olga D.; Shapovalov, Alexander V.

    2018-04-01

    The ability to diagnose oral lichen planus (OLP) based on saliva analysis using THz time-domain spectroscopy and chemometrics is discussed. The study involved 30 patients (2 male and 28 female) with OLP. This group consisted of two subgroups with the erosive form of OLP (n = 15) and with the reticular and papular forms of OLP (n = 15). The control group consisted of six healthy volunteers (one male and five females) without inflammation in the mucous membrane in the oral cavity and without periodontitis. Principal component analysis was used to reveal informative features in the experimental data. The one-versus-one multiclass classifier using support vector machine binary classifiers was used. The two-stage classification approach using several absorption spectra scans for an individual saliva sample provided 100% accuracy of differential classification between OLP subgroups and control group.
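    The one-versus-one multiclass scheme above reduces the three-group problem (two OLP subgroups plus controls) to pairwise binary SVMs whose decisions are combined by majority vote. A minimal sketch of the voting step follows; the pairwise decision functions below are hypothetical threshold stand-ins, not the trained SVM classifiers:

```python
from collections import Counter

def one_vs_one_predict(binary_classifiers, x):
    """Combine pairwise binary classifiers by majority vote.

    `binary_classifiers` maps each class pair (a, b) to a decision
    function that, given feature vector x, returns either a or b."""
    votes = Counter(decide(x) for decide in binary_classifiers.values())
    return votes.most_common(1)[0][0]
```

    With three classes this requires 3 × 2 / 2 = 3 binary classifiers; the predicted class is the one collecting the most pairwise wins.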

  8. [Nursing diagnosis "impaired walking" in elderly patients: integrative literature review].

    PubMed

    Marques-Vieira, Cristina Maria Alves; de Sousa, Luís Manuel Mota; de Matos Machado Carias, João Filipe; Caldeira, Sílvia Maria Alves

    2015-03-01

    The impaired walking nursing diagnosis was included in the NANDA International classification taxonomy in 1998; this review aims to identify its defining characteristics and related factors in elderly patients in the recent literature. Integrative literature review based on the following guiding question: are there more defining characteristics and related factors for the nursing diagnosis impaired walking in elderly patients than those included in the NANDA International classification taxonomy? Search conducted in 2007-2013 on international and Portuguese databases. The sample comprised 15 papers. Of the 6 defining characteristics listed in NANDA International, 3 were identified in the search results, while 13 further characteristics found in the literature are not included in the classification. Of the 14 related factors listed, 9 were identified in the sample, and 12 additional factors found are not included in the NANDA International taxonomy. This review allowed the identification of new elements not included in the NANDA International taxonomy and may contribute to the development of the taxonomy and of nursing knowledge.

  9. Source identification of western Oregon Douglas-fir wood cores using mass spectrometry and random forest classification.

    PubMed

    Finch, Kristen; Espinoza, Edgard; Jones, F Andrew; Cronn, Richard

    2017-05-01

    We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood.
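    Random forest classification of spectra, as used above, aggregates many randomized trees trained on bootstrap samples. The following toy sketch uses one-level stumps, each splitting on a randomly chosen feature (think of one feature as one m/z bin); it is a stand-in for a full random forest, not the authors' model:

```python
import random

def train_forest(X, y, n_trees=25, seed=0):
    """Train a forest of one-level decision stumps. Each stump is fit
    on a bootstrap sample with one randomly chosen feature, splitting
    at the bootstrap mean of that feature."""
    rng = random.Random(seed)
    stumps = []
    n, d = len(X), len(X[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        f = rng.randrange(d)                         # random feature
        thr = sum(X[i][f] for i in idx) / n          # split threshold
        left = [y[i] for i in idx if X[i][f] <= thr]
        right = [y[i] for i in idx if X[i][f] > thr]
        majority = lambda side: max(set(side), key=side.count) if side else y[0]
        stumps.append((f, thr, majority(left), majority(right)))
    return stumps

def predict(stumps, x):
    """Majority vote over the stump predictions."""
    votes = [(l if x[f] <= thr else r) for f, thr, l, r in stumps]
    return max(set(votes), key=votes.count)
```

    A real random forest grows deep trees and samples a feature subset at every split; the aggregation-by-vote principle is the same.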

  10. Joint Sparse Recovery With Semisupervised MUSIC

    NASA Astrophysics Data System (ADS)

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-05-01

    Discrete multiple signal classification (MUSIC), with its low computational cost and mild condition requirements, has become a significant noniterative algorithm for joint sparse recovery (JSR). However, it fails in the rank-defective case caused by coherent sources or a limited number of multiple measurement vectors (MMVs). In this letter, we offer a novel perspective on this problem by interpreting JSR as a binary classification problem with respect to atoms. In this view, MUSIC essentially constructs a supervised classifier based on the labeled MMVs, so its performance depends heavily on the quality and quantity of these training samples. From this viewpoint, we develop a semisupervised MUSIC (SS-MUSIC) in the spirit of machine learning, based on the observation that insufficient supervised information in the training samples can be compensated for by unlabeled atoms. Instead of constructing a classifier in a fully supervised manner, we iteratively refine a semisupervised classifier by exploiting the labeled MMVs and some reliable unlabeled atoms simultaneously. In this way, the required conditions and iterations can be greatly relaxed and reduced. Numerical experimental results demonstrate that SS-MUSIC achieves much better recovery performance than other extended MUSIC algorithms, as well as some typical greedy algorithms for JSR, in terms of iterations and recovery probability.

  11. EEG classification for motor imagery and resting state in BCI applications using multi-class Adaboost extreme learning machine

    NASA Astrophysics Data System (ADS)

    Gao, Lin; Cheng, Wei; Zhang, Jinhua; Wang, Jue

    2016-08-01

    Brain-computer interface (BCI) systems provide an alternative communication and control approach for people with limited motor function. The feature extraction and classification approach should therefore differentiate the relatively unusual states of motion intention from a common resting state. In this paper, we sought a novel approach for multi-class classification in BCI applications. We collected electroencephalographic (EEG) signals registered by electrodes placed over the scalp during left hand motor imagery, right hand motor imagery, and resting state for ten healthy human subjects. We proposed using the Kolmogorov complexity (Kc) for feature extraction and a multi-class Adaboost classifier with an extreme learning machine as the base classifier, in order to classify the three-class EEG samples. An average classification accuracy of 79.5% was obtained for the ten subjects, which greatly outperformed commonly used approaches. It is thus concluded that the proposed method could improve classification performance for multi-class motor imagery tasks. It could be applied in further studies to generate the control commands that initiate the movement of a robotic exoskeleton or orthosis, ultimately facilitating the rehabilitation of disabled people.

  12. Texture analysis of pulmonary parenchyma in normal and emphysematous lung

    NASA Astrophysics Data System (ADS)

    Uppaluri, Renuka; Mitsa, Theophano; Hoffman, Eric A.; McLennan, Geoffrey; Sonka, Milan

    1996-04-01

    Tissue characterization using texture analysis is gaining increasing importance in medical imaging. We present a completely automated method for discriminating between normal and emphysematous regions in CT images. The method involves extracting seventeen features based on statistical, hybrid and fractal texture models. The best subset of features is derived from the training set using the divergence technique. A minimum distance classifier is used to classify the samples into one of the two classes--normal and emphysema. Sensitivity, specificity, and accuracy values achieved were 80% or greater in most cases, showing that texture analysis holds great promise in identifying emphysema.
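    The minimum distance classifier above assigns each sample to the class whose mean feature vector (centroid) is nearest. A minimal sketch, assuming Euclidean distance and illustrative class names:

```python
def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in groups.items()}

def classify(x, centroids):
    """Assign x to the class with the nearest centroid."""
    def dist2(a, b):  # squared Euclidean distance (monotone in distance)
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))
```

    Training is just averaging, which is why minimum distance classifiers were attractive for fully automated pipelines of this era.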

  13. Breast cancer risk assessment and diagnosis model using fuzzy support vector machine based expert system

    NASA Astrophysics Data System (ADS)

    Dheeba, J.; Jaya, T.; Singh, N. Albert

    2017-09-01

    Classification of cancerous masses is a challenging task in many computerised detection systems. Cancerous masses are difficult to detect because they are obscured and subtle in mammograms. This paper investigates an intelligent classifier, the fuzzy support vector machine (FSVM), applied to classify tissues containing masses on mammograms for breast cancer diagnosis. The algorithm utilises texture features extracted using Laws texture energy measures and an FSVM to classify the suspicious masses. The FSVM treats every training sample as belonging to both the normal and abnormal classes, but with different membership degrees; in this way it gains more generalisation ability for classifying masses in mammograms. The classifier analysed 219 clinical mammograms collected from a breast cancer screening laboratory. Tests on these real clinical mammograms show that the proposed detection system has better discriminating power than the conventional support vector machine. With the best combination of FSVM and Laws texture features, the area under the receiver operating characteristic curve reached 0.95, which corresponds to a sensitivity of 93.27% with a specificity of 87.17%. The results suggest that detecting masses using FSVM contributes to computer-aided detection of breast cancer and can serve as a decision support system for radiologists.
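    In an FSVM, each training sample carries membership degrees to both classes, which then weight its penalty terms in the SVM objective. One simple way to derive such memberships, shown purely as an assumption (the abstract does not give the paper's exact membership function), is from the sample's distances to the two class centroids:

```python
def fuzzy_memberships(x, centroid_abnormal, centroid_normal):
    """Membership of sample x in BOTH classes, derived from its
    distances to the two class centroids; the degrees sum to 1."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    d_ab = dist(x, centroid_abnormal)
    d_no = dist(x, centroid_normal)
    total = d_ab + d_no
    if total == 0.0:          # degenerate case: coincident centroids
        return 0.5, 0.5
    m_abnormal = d_no / total  # nearer the abnormal centroid -> larger
    return m_abnormal, 1.0 - m_abnormal
```

    An ambiguous sample midway between centroids receives 0.5/0.5 and thus contributes equally weak penalties to both classes, which is the mechanism behind the improved generalisation claimed above.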

  14. PROTAX-Sound: A probabilistic framework for automated animal sound identification

    PubMed Central

    Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is a stimulating new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (segments of the audio file that contain a vocalization to be classified), extracts acoustic features from them and compares them with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents a species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities. PMID:28863178
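    The multinomial-regression combination of several classifiers' outputs can be caricatured as a weighted sum of per-species scores passed through a softmax; this is only the flavor of the approach (the real PROTAX-Sound model also handles species absent from the reference database, which this sketch omits):

```python
import math

def combine_probabilities(classifier_scores, weights):
    """Combine per-classifier, per-species scores into one probability
    vector: weighted sum per species, then softmax normalization.

    classifier_scores[c][s] = score of species s from classifier c."""
    n_species = len(classifier_scores[0])
    logits = [sum(w * scores[s] for w, scores in zip(weights, classifier_scores))
              for s in range(n_species)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

    In a multinomial regression the weights would be fit to labeled vocalizations, which is what makes the combined probabilities calibrated rather than ad hoc.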

  15. PROTAX-Sound: A probabilistic framework for automated animal sound identification.

    PubMed

    de Camargo, Ulisses Moliterno; Somervuo, Panu; Ovaskainen, Otso

    2017-01-01

    Autonomous audio recording is a stimulating new field in bioacoustics, with great promise for conducting cost-effective species surveys. One major current challenge is the lack of reliable classifiers capable of multi-species identification. We present PROTAX-Sound, a statistical framework to perform probabilistic classification of animal sounds. PROTAX-Sound is based on a multinomial regression model, and it can utilize as predictors any kind of sound features or classifications produced by other existing algorithms. PROTAX-Sound combines audio and image processing techniques to scan environmental audio files. It identifies regions of interest (segments of the audio file that contain a vocalization to be classified), extracts acoustic features from them and compares them with samples in a reference database. The output of PROTAX-Sound is the probabilistic classification of each vocalization, including the possibility that it represents a species not present in the reference database. We demonstrate the performance of PROTAX-Sound by classifying audio from a species-rich case study of tropical birds. The best performing classifier achieved 68% classification accuracy for 200 bird species. PROTAX-Sound improves the classification power of current techniques by combining information from multiple classifiers in a manner that yields calibrated classification probabilities.

  16. Improving compound-protein interaction prediction by building up highly credible negative samples.

    PubMed

    Liu, Hui; Sun, Jianjiang; Guan, Jihong; Zheng, Jie; Zhou, Shuigeng

    2015-06-15

    Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. Although an increasing number of validated interactions are available, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples is therefore critical to improving the performance of in silico prediction methods. This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins, and achieve remarkable performance under this assumption, it is rational to identify potential negative samples based on the converse proposition: proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by that compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein-protein interaction networks and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples, for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, including the bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performance of these models is also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases. Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/. © The Author 2015. Published by Oxford University Press.
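    The core screening idea, that a protein dissimilar to every known target of a compound makes a credible negative pair, can be sketched as below. The similarity source and the threshold are illustrative assumptions; the actual framework integrates many more resources (structures, expression profiles, side effects, interaction networks):

```python
def screen_negative_pairs(known_targets, protein_similarity, threshold=0.3):
    """Propose (compound, protein) pairs as credible negatives when the
    protein is dissimilar to EVERY known target of that compound.

    known_targets: {compound: set of target protein ids}
    protein_similarity: {protein: {other_protein: similarity in [0, 1]}}"""
    negatives = []
    for compound, targets in known_targets.items():
        for protein in protein_similarity:
            if protein in targets:
                continue  # a known target can never be a negative
            if all(protein_similarity[protein].get(t, 0.0) < threshold
                   for t in targets):
                negatives.append((compound, protein))
    return negatives
```

    Pairs that survive this filter are far less likely to be unrecorded true interactions than randomly sampled non-interacting pairs, which is why classifiers trained on them performed better above.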

  17. Evaluation of the initial thematic output from a continuous change-detection algorithm for use in automated operational land-change mapping by the U.S. Geological Survey

    USGS Publications Warehouse

    Pengra, Bruce; Gallant, Alisa L.; Zhu, Zhe; Dahal, Devendra

    2016-01-01

    The U.S. Geological Survey (USGS) has begun the development of operational, 30-m resolution annual thematic land cover data to meet the needs of a variety of land cover data users. The Continuous Change Detection and Classification (CCDC) algorithm is being evaluated as the likely methodology following early trials. Data for training and testing of CCDC thematic maps have been provided by the USGS Land Cover Trends (LC Trends) project, which offers sample-based, manually classified thematic land cover data at 2755 probabilistically located sample blocks across the conterminous United States. These samples represent a high quality, well distributed source of data to train the Random Forest classifier invoked by CCDC. We evaluated the suitability of LC Trends data to train the classifier by assessing the agreement of annual land cover maps output from CCDC with output from the LC Trends project within 14 Landsat path/row locations across the conterminous United States. We used a small subset of circa 2000 data from the LC Trends project to train the classifier, reserving the remaining Trends data from 2000, and incorporating LC Trends data from 1992, to evaluate measures of agreement across time, space, and thematic classes, and to characterize disagreement. Overall agreement ranged from 75% to 98% across the path/rows, and results were largely consistent across time. Land cover types that were well represented in the training data tended to have higher rates of agreement between LC Trends and CCDC outputs. Characteristics of disagreement are being used to improve the use of LC Trends data as a continued source of training information for operational production of annual land cover maps.

  18. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Möller, A.; Ruhlmann-Kleider, V.; Leloup, C.

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning, with redshifts derived from the SN light curves. The method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshifts (0.2 < z < 1.1). Our method consists of two stages: feature extraction (obtaining the SN redshift from photometry and estimating light-curve shape parameters) and machine learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98. We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe and real SN survey data; in particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high-z SN survey with application to real SN data.
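    The AUC metric used above equals the probability that a randomly chosen type Ia receives a higher classifier score than a randomly chosen non-Ia, so it can be computed directly from the rank (Mann-Whitney) statistic without tracing the ROC curve:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    fraction of (positive, negative) pairs where the positive outscores
    the negative, counting ties as half a win."""
    pos = [s for is_ia, s in zip(labels, scores) if is_ia]
    neg = [s for is_ia, s in zip(labels, scores) if not is_ia]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Perfect separation gives 1, a classifier no better than chance gives 0.5, which is why an AUC of 0.98 indicates near-complete separation of the Ia and non-Ia score distributions.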

  19. Exploring geo-tagged photos for land cover validation with deep learning

    NASA Astrophysics Data System (ADS)

    Xing, Hanfa; Meng, Yuan; Wang, Zixuan; Fan, Kaixuan; Hou, Dongyang

    2018-07-01

    Land cover validation plays an important role in the process of generating and distributing land cover thematic maps, and is usually implemented at the high cost of sample interpretation with remotely sensed images or field survey. With the increasing availability of geo-tagged landscape photos, automatic photo recognition methodologies, e.g., deep learning, can be effectively utilised for land cover applications. However, they have hardly been utilised in validation processes, as challenges remain in sample selection and classification for highly heterogeneous photos. This study proposes an approach that employs geo-tagged photos for land cover validation using deep learning. The approach first identifies photos automatically based on the VGG-16 network. Samples for validation are then selected and further classified by considering photo distribution and classification probabilities. The implementation was conducted for the validation of the GlobeLand30 land cover product in a heterogeneous area, western California. Experimental results show promise for land cover validation, given that GlobeLand30 achieved an overall accuracy of 83.80% with classified samples, close to the validation result of 80.45% based on visual interpretation. Additionally, the performance of deep learning based on ResNet-50 and AlexNet was also quantified, revealing no substantial differences in the final validation results. The proposed approach ensures geo-tagged photo quality and supports the sample classification strategy by considering photo distribution, improving accuracy from 72.07% to 79.33% compared with considering only the single nearest photo. Consequently, the presented approach demonstrates the feasibility of deep learning for land cover information identification from geo-tagged photos, and has great potential to support and improve the efficiency of land cover validation.

  20. Biological classification with RNA-Seq data: Can alternatively spliced transcript expression enhance machine learning classifier?

    PubMed

    Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry

    2018-06-25

    The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression and is expected to yield better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at both the gene and alternative-splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data is more informative for biological classification tasks than gene-level expression data. Our large-scale assessment utilizes multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene-expression-based methods. The top-performing supervised learning techniques reached near-perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  1. An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data.

    PubMed

    Wang, Kung-Jeng; Makond, Bunjira; Wang, Kung-Min

    2013-11-09

    Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of patients in order to ease decision making regarding medical treatment and financial preparation. Breast cancer data sets are typically imbalanced (i.e., the number of surviving patients outnumbers the number of non-surviving patients), and standard classifiers are not well suited to imbalanced data sets; methods to improve the survivability prognosis of breast cancer therefore need study. Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining the synthetic minority over-sampling technique (SMOTE), the cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. A feature selection method is used to select relevant variables, while a pruning technique is applied to obtain low information-burden models. These methods are applied to data obtained from the Surveillance, Epidemiology, and End Results database. The improvements in survivability prognosis of breast cancer are investigated based on the experimental results. Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling consistently generate higher predictive performance than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use fewer informative features when a feature selection method and a pruning technique are applied. LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting in improving the prognostic performance of DT and LR.
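    SMOTE, one of the techniques combined above, rebalances the training set by synthesizing minority-class samples along line segments between a minority sample and one of its nearest minority neighbours. A minimal sketch, with illustrative parameter defaults:

```python
import random

def smote(minority, k=3, n_synthetic=10, seed=0):
    """Generate synthetic minority samples. Each new sample interpolates
    between a random minority sample and one of its k nearest minority
    neighbours, at a random point along the segment."""
    rng = random.Random(seed)
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    synthetic = []
    for _ in range(n_synthetic):
        a = rng.choice(minority)
        neighbours = sorted((m for m in minority if m != a),
                            key=lambda m: dist2(a, m))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()  # position along the segment a -> b
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic
```

    Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the minority region rather than duplicating existing points, which is what distinguishes SMOTE from plain over-sampling.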

  2. An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data

    PubMed Central

    2013-01-01

    Background Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of patients in order to ease decision making regarding medical treatment and financial preparation. Breast cancer data sets are typically imbalanced (i.e., the number of surviving patients outnumbers the number of non-surviving patients), and standard classifiers are not well suited to imbalanced data sets; methods to improve the survivability prognosis of breast cancer therefore need study. Methods Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining the synthetic minority over-sampling technique (SMOTE), the cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. A feature selection method is used to select relevant variables, while a pruning technique is applied to obtain low information-burden models. These methods are applied to data obtained from the Surveillance, Epidemiology, and End Results database. The improvements in survivability prognosis of breast cancer are investigated based on the experimental results. Results Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling consistently generate higher predictive performance than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use fewer informative features when a feature selection method and a pruning technique are applied. Conclusions LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting in improving the prognostic performance of DT and LR. PMID:24207108

  3. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  4. Determination of HIV Status in African Adults With Discordant HIV Rapid Tests.

    PubMed

    Fogel, Jessica M; Piwowar-Manning, Estelle; Donohue, Kelsey; Cummings, Vanessa; Marzinke, Mark A; Clarke, William; Breaud, Autumn; Fiamma, Agnès; Donnell, Deborah; Kulich, Michal; Mbwambo, Jessie K K; Richter, Linda; Gray, Glenda; Sweat, Michael; Coates, Thomas J; Eshleman, Susan H

    2015-08-01

    In resource-limited settings, HIV infection is often diagnosed using 2 rapid tests. If the results are discordant, a third tie-breaker test is often used to determine HIV status. This study characterized samples with discordant rapid tests and compared different testing strategies for determining HIV status in these cases. Samples were previously collected from 173 African adults in a population-based survey who had discordant rapid test results. Samples were classified as HIV positive or HIV negative using a rigorous testing algorithm that included two fourth-generation tests, a discriminatory test, and 2 HIV RNA tests. Tie-breaker tests were evaluated, including rapid tests (1 performed in-country), a third-generation enzyme immunoassay, and two fourth-generation tests. Selected samples were further characterized using additional assays. Twenty-nine samples (16.8%) were classified as HIV positive and 24 of those samples (82.8%) had undetectable HIV RNA. Antiretroviral drugs were detected in 1 sample. Sensitivity was 8.3%-43% for the rapid tests; 24.1% for the third-generation enzyme immunoassay; 95.8% and 96.6% for the fourth-generation tests. Specificity was lower for the fourth-generation tests than the other tests. Accuracy ranged from 79.5% to 91.3%. In this population-based survey, most HIV-infected adults with discordant rapid tests were virally suppressed without antiretroviral drugs. Use of individual assays as tie-breaker tests was not a reliable method for determining HIV status in these individuals. More extensive testing algorithms that use a fourth-generation screening test with a discriminatory test and HIV RNA test are preferable for determining HIV status in these cases.
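
    The sensitivity, specificity, and accuracy figures reported above come from standard 2x2 diagnostic-table arithmetic, which a small sketch makes explicit. The counts below are hypothetical, chosen only to be consistent with the reported 29 positives of 173 samples and the upper figures of ~96.6% sensitivity and 91.3% accuracy.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures."""
    sensitivity = tp / (tp + fn)                  # fraction of true positives detected
    specificity = tn / (tn + fp)                  # fraction of true negatives correctly cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn)    # overall agreement with the gold standard
    return sensitivity, specificity, accuracy

# Hypothetical counts for one tie-breaker assay (29 HIV positive, 144 HIV negative):
sens, spec, acc = diagnostic_metrics(tp=28, fp=14, fn=1, tn=130)
```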

  5. Optimizing of MALDI-ToF-based low-molecular-weight serum proteome pattern analysis in detection of breast cancer patients; the effect of albumin removal on classification performance.

    PubMed

    Pietrowska, M; Marczak, L; Polanska, J; Nowicka, E; Behrent, K; Tarnawski, R; Stobiecki, M; Polanski, A; Widlak, P

    2010-01-01

    Mass spectrometry-based analysis of the serum proteome allows the identification of multi-peptide patterns/signatures specific for the blood of cancer patients, and thus has high potential value for cancer diagnostics. However, because of problems with the optimization and standardization of experimental and computational design, none of the identified proteome patterns/signatures has yet been approved for diagnostics in clinical practice. Here we compared two methods of serum sample preparation for mass spectrometry-based proteome pattern analysis aimed at identifying biomarkers that could be used in early detection of breast cancer. Blood samples were collected in a group of 92 patients diagnosed at early (I and II) stages of the disease before the start of therapy, and in a group of age-matched healthy controls (104 women). Serum specimens were purified and analyzed using MALDI-ToF spectrometry, either directly or after membrane filtration (50 kDa cut-off) to remove albumin and other large serum proteins. Mass spectra of the low-molecular-weight fraction (2-10 kDa) of the serum proteome were resolved using Gaussian mixture decomposition, and the identified spectral components were used to build classifiers that differentiated samples from breast cancer patients and healthy persons. Mass spectra of complete serum and of membrane-filtered, albumin-depleted samples had markedly different structures, and peaks specific for each type of sample could be identified. The optimal classifier built for the complete serum specimens consisted of 8 spectral components and had 81% specificity and 72% sensitivity, while that built for the membrane-filtered samples consisted of 4 components and had 80% specificity and 81% sensitivity. We concluded that pre-processing of samples to remove albumin might be recommended before MALDI-ToF mass spectrometric analysis of the low-molecular-weight components of human serum. Keywords: albumin removal; breast cancer; clinical proteomics; mass spectrometry; pattern analysis; serum proteome.
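
    The Gaussian mixture decomposition step can be illustrated with a minimal 1-D expectation-maximization fit in numpy. The data here are synthetic peak positions, and the two-component fit is a simplification of the spectral modelling used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "spectral" positions drawn from two Gaussian components.
data = np.concatenate([rng.normal(3.0, 0.3, 500), rng.normal(7.0, 0.5, 500)])

def em_gmm_1d(x, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    mu = np.quantile(x, np.linspace(0.2, 0.8, k))   # spread initial means over the data
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and standard deviations.
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

mu, sigma, pi = em_gmm_1d(data)
```

In the study's pipeline, the fitted components (positions, widths, weights) become the candidate features from which the classifier is built.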

  6. Identification of Trypanosoma cruzi Discrete Typing Units (DTUs) in Latin-American migrants in Barcelona (Spain).

    PubMed

    Abras, Alba; Gállego, Montserrat; Muñoz, Carmen; Juiz, Natalia A; Ramírez, Juan Carlos; Cura, Carolina I; Tebar, Silvia; Fernández-Arévalo, Anna; Pinazo, María-Jesús; de la Torre, Leonardo; Posada, Elizabeth; Navarro, Ferran; Espinal, Paula; Ballart, Cristina; Portús, Montserrat; Gascón, Joaquim; Schijman, Alejandro G

    2017-04-01

    Trypanosoma cruzi, the causative agent of Chagas disease, is divided into six Discrete Typing Units (DTUs): TcI-TcVI. We aimed to identify T. cruzi DTUs in Latin-American migrants in the Barcelona area (Spain) and to assess different molecular typing approaches for the characterization of T. cruzi genotypes. Seventy-five peripheral blood samples were analyzed by two real-time PCR (qPCR) methods based on satellite DNA (SatDNA) and kinetoplastid DNA (kDNA). The 20 samples testing positive by both methods, all belonging to Bolivian individuals, were submitted to DTU characterization using two PCR-based flowcharts: multiplex qPCR using TaqMan probes (MTq-PCR), and conventional PCR. These samples were also studied by SatDNA sequencing and classified as type I (TcI/III), type II (TcII/IV) or type I/II hybrid (TcV/VI). Ten of the 20 samples gave positive results in the flowcharts: TcV (5 samples), TcII/V/VI (3), and mixed infections of TcV plus TcII (1) and TcV plus TcII/VI (1). By SatDNA sequencing, we classified all 20 samples: 19 as type I/II and one as type I. The most frequent DTU identified by both flowcharts, and suggested by SatDNA sequencing in the remaining samples with low parasitic loads, was TcV, which is common in Bolivia and predominant in peripheral blood. The mixed infection of TcV plus TcII was detected in Bolivian migrants for the first time. PCR-based flowcharts are very useful for characterizing DTUs during acute infection. SatDNA sequence analysis cannot discriminate T. cruzi populations at the level of a single DTU, but it enabled us to increase the number of characterized cases in chronically infected patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. Application of a neural network for reflectance spectrum classification

    NASA Astrophysics Data System (ADS)

    Yang, Gefei; Gartley, Michael

    2017-05-01

    Traditional reflectance spectrum classification algorithms are based on comparing spectra across the electromagnetic spectrum, anywhere from the ultraviolet to the thermal infrared regions. These methods analyze reflectance on a pixel-by-pixel basis. Inspired by the high performance that convolutional neural networks (CNNs) have demonstrated in image classification, we applied a neural network to analyze directional reflectance pattern images. By using bidirectional reflectance distribution function (BRDF) data, we can reformulate the 4-dimensional data into a 2-dimensional image, namely incident direction × reflected direction, with channels. Meanwhile, RIT's micro-DIRSIG model is utilized to simulate additional training samples to improve the robustness of the neural network training. Unlike traditional classification, which uses hand-designed feature extraction with a trainable classifier, neural networks create several layers that learn a feature hierarchy from pixels to classifier, and all layers are trained jointly. Hence, our approach of utilizing angular features differs from traditional methods that utilize spatial features. Although training typically has a large computational cost, simple classifiers work well when subsequently using neural-network-generated features. Currently, the most popular neural networks, such as VGG, GoogLeNet and AlexNet, are trained on RGB spatial image data. Our approach aims to build a neural network based on directional reflectance spectra to help understand classification from another perspective. At the end of this paper, we compare the differences among several classifiers and analyze the trade-offs among neural network parameters.
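
    The reshaping of 4-dimensional BRDF data into a 2-D directional reflectance image is a plain array operation. A numpy sketch with hypothetical, coarse grid sizes (a spectral channel axis would be appended analogously):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical BRDF grid sampled on (theta_i, phi_i, theta_r, phi_r);
# the grid sizes are illustrative, not those used in the paper.
n_ti, n_pi, n_tr, n_pr = 4, 6, 4, 6
brdf = rng.random((n_ti, n_pi, n_tr, n_pr))

# Flatten the incident and reflected hemispheres separately, giving a 2-D
# "directional reflectance image": rows index incident direction, columns
# index reflected direction. C-order reshape preserves every value.
pattern = brdf.reshape(n_ti * n_pi, n_tr * n_pr)
```

Images of this form can then be fed to a CNN exactly like ordinary spatial images.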

  8. Neural network classification of sweet potato embryos

    NASA Astrophysics Data System (ADS)

    Molto, Enrique; Harrell, Roy C.

    1993-05-01

    Somatic embryogenesis is a process that allows for the in vitro propagation of thousands of plants in sub-liter size vessels and has been successfully applied to many significant species. The heterogeneity of maturity and quality of embryos produced with this technique requires sorting to obtain a uniform product. An automated harvester is being developed at the University of Florida to sort embryos in vitro at different stages of maturation in a suspension culture. The system utilizes machine vision to characterize embryo morphology and a fluidic-based separation device to isolate embryos associated with a pre-defined, targeted morphology. Two different backpropagation neural networks (BNN) were used to classify embryos based on information extracted from the vision system. One network utilized geometric features such as embryo area, length, and symmetry as inputs. The alternative network utilized polar coordinates of an embryo's perimeter with respect to its centroid as inputs. The performances of both techniques were compared with each other and with an embryo classification method based on linear discriminant analysis (LDA). Similar results were obtained with all three techniques. Classification efficiency was improved by reducing the dimension of the feature vector through a forward stepwise analysis by LDA. In order to enhance the purity of the sample selected as harvestable, a reject-to-classify option was introduced in the model and analyzed. The best classifier performances (76% overall correct classifications, 75% harvestable objects properly classified, homogeneity improvement ratio 1.5) were obtained using 8 features in a BNN.
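
    The reject-to-classify option can be sketched generically: withhold a decision when the classifier's posterior probability is not confident, which raises the purity of the accepted (harvestable) set at the cost of coverage. The probabilities below are synthetic, not the embryo data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic posterior probabilities P(harvestable | features), with labels
# drawn consistently with them (illustrative only).
p = rng.uniform(0.0, 1.0, 1000)
y = (rng.uniform(0.0, 1.0, 1000) < p).astype(int)

def classify_with_reject(prob, confidence=0.8):
    """Predict 1 if prob >= 0.5, but only commit when the classifier is confident."""
    decided = (prob >= confidence) | (prob <= 1.0 - confidence)
    pred = (prob >= 0.5).astype(int)
    return pred, decided

pred, decided = classify_with_reject(p)
acc_all = np.mean(pred == y)                          # accuracy with no reject option
acc_decided = np.mean(pred[decided] == y[decided])    # accuracy on the accepted subset
```

Rejected objects would simply be returned to the suspension for a later sorting pass.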

  9. Mapping Land Cover Types in Amazon Basin Using 1km JERS-1 Mosaic

    NASA Technical Reports Server (NTRS)

    Saatchi, Sassan S.; Nelson, Bruce; Podest, Erika; Holt, John

    2000-01-01

    In this paper, the 100 meter JERS-1 Amazon mosaic image was used in a new classifier to generate a 1 km resolution land cover map. The inputs to the classifier were 1 km resolution mean backscatter and seven first order texture measures derived from the 100 m data by using a 10 x 10 independent sampling window. The classification approach included two interdependent stages: 1) a supervised maximum a posteriori Bayesian approach to classify the mean backscatter image into 5 general land cover categories of forest, savannah, inundated, white sand, and anthropogenic vegetation classes, and 2) a texture measure decision rule approach to further discriminate subcategory classes based on taxonomic information and biomass levels. Fourteen classes were successfully separated at 1 km scale. The accuracy of the approach was verified by comparison with the IBGE and the AVHRR 1 km resolution land cover maps.
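
    The 10 x 10 independent sampling window that turns 100 m pixels into 1 km cells is a block-aggregation step. A numpy sketch with a hypothetical backscatter tile, using the standard deviation to stand in for one of the seven first-order texture measures:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 100 m backscatter tile (dB), 1000 x 1000 pixels.
tile = rng.normal(-7.0, 1.5, (1000, 1000))

# Aggregate with a 10 x 10 independent (non-overlapping) sampling window,
# yielding 1 km cells: mean backscatter plus one first-order texture
# measure (standard deviation) per cell.
w = 10
blocks = tile.reshape(tile.shape[0] // w, w, tile.shape[1] // w, w)
mean_1km = blocks.mean(axis=(1, 3))       # stage-1 input: mean backscatter
texture_1km = blocks.std(axis=(1, 3))     # stage-2 input: texture measure
```

The Bayesian stage would classify `mean_1km` into the 5 general categories; the texture decision rules would then split each category into subclasses.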

  10. Optimizing area under the ROC curve using semi-supervised learning

    PubMed Central

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M.

    2014-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results. PMID:25395692
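
    The AUC being optimized has a simple ranking interpretation (the Mann-Whitney statistic): the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting half. SSLROC's ranking constraints on unlabeled samples build on pairwise comparisons of exactly this kind. A minimal sketch on toy scores:

```python
import numpy as np

def auc_pairwise(pos_scores, neg_scores):
    """AUC as the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    # Broadcasting forms every (positive, negative) pair.
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()

auc = auc_pairwise([0.9, 0.8, 0.4], [0.3, 0.5])
```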

  11. Optimizing area under the ROC curve using semi-supervised learning.

    PubMed

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M

    2015-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

  12. Bayes Error Rate Estimation Using Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2003-01-01

    The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error generally yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information-theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that looks only at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two real-world benchmark problems. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
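
    The first, averaging-based estimator can be illustrated with a simple plug-in sketch: average the ensemble members' posterior estimates and take the expected mass of the non-maximum class. The posteriors below are synthetic, and the paper's estimators refine this basic rule with correction terms.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-class posteriors P(class 1 | x) from 5 ensemble members
# at 1000 test points: noisy copies of a common underlying posterior.
true_post = np.clip(rng.normal(0.7, 0.05, 1000), 0.0, 1.0)
members = np.clip(true_post + rng.normal(0.0, 0.1, (5, 1000)), 0.0, 1.0)

# Averaging the members reduces each one's estimation noise ...
avg_post = members.mean(axis=0)

# ... and the plug-in Bayes-error estimate is E[1 - max(p, 1 - p)],
# the expected probability mass of the non-maximum class.
bayes_err_est = np.mean(1.0 - np.maximum(avg_post, 1.0 - avg_post))
```

The disagreement-based method in the article needs only the predicted labels, not the posteriors, which is why it applies to a wider range of ensemble members.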

  13. Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

    PubMed

    Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun

    2016-04-01

    This work provides a performance comparison of four different machine learning classifiers as applied to terahertz (THz) transient time-domain sequences associated with pixelated images of different powder samples: multinomial logistic regression with ridge estimators (MLR), k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB). Although the six substances considered have similar optical properties, their complex insertion loss at the THz part of the spectrum differs significantly because of differences in their frequency-dependent THz extinction coefficients as well as in their refractive indices and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification based solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, measurements related to samples with thicknesses of 2 mm were classified, then samples with thicknesses of 4 mm, and after that 3 mm, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms, which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies. The work establishes a general methodology for assessing the performance of other hyperspectral dataset classifiers on the basis of 2-D cross-correlations, in far-infrared spectroscopy or other parts of the electromagnetic spectrum. It also advances the wider proliferation of automated THz imaging systems across new application areas, e.g., biomedical imaging, industrial processing and quality control, where the interpretation of hyperspectral images is still under development. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
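
    The 2-D cross-correlation feature extraction can be sketched in numpy via the FFT, using the identity that cross-correlation equals convolution with a flipped kernel. A delta-function toy array stands in for an interferogram so the result is easy to check; first-order statistics of the correlation surface are features of the kind fed to the classifiers.

```python
import numpy as np

def xcorr2d(a, b):
    """Full 2-D cross-correlation of a with b, computed via the FFT
    (cross-correlation is convolution with a flipped kernel)."""
    s0 = a.shape[0] + b.shape[0] - 1
    s1 = a.shape[1] + b.shape[1] - 1
    fa = np.fft.rfft2(a, (s0, s1))                 # zero-padded transforms
    fb = np.fft.rfft2(np.flip(b), (s0, s1))        # flip both axes of the kernel
    return np.fft.irfft2(fa * fb, (s0, s1))

# Toy "interferogram": a single impulse, so the autocorrelation peak must
# land at the zero-lag position (centre of the full output).
bg = np.zeros((4, 4))
bg[1, 2] = 1.0
cc = xcorr2d(bg, bg)

# First-order statistical features of the correlation surface.
features = (cc.max(), cc.mean(), cc.std())
```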

  14. Diagnosing intramammary infections: evaluation of definitions based on a single milk sample.

    PubMed

    Dohoo, I R; Smith, J; Andersen, S; Kelton, D F; Godden, S

    2011-01-01

    Criteria for diagnosing intramammary infections (IMI) have been debated for many years. Factors that may be considered in making a diagnosis include the organism of interest being found on culture, the number of colonies isolated, whether or not the organism was recovered in pure or mixed culture, and whether or not concurrent evidence of inflammation existed (often measured by somatic cell count). However, research using these criteria has been hampered by the lack of a "gold standard" test (i.e., a perfect test against which the criteria can be evaluated) and the need for very large data sets of culture results to have sufficient numbers of quarters with infections with a variety of organisms. This manuscript used 2 large data sets of culture results to evaluate several definitions (sets of criteria) for classifying a quarter as having, or not having, an IMI by comparing the results from a single culture to a gold standard diagnosis based on a set of 3 milk samples. The first consisted of 38,376 milk samples from which 25,886 triplicate sets of milk samples taken 1 wk apart were extracted. The second consisted of 784 quarters that were classified as infected or not based on a set of 3 milk samples collected at 2-d intervals. From these quarters, a total of 3,136 additional samples were evaluated. A total of 12 definitions (named A to L) based on combinations of the number of colonies isolated, whether or not the organism was recovered in pure or mixed culture, and the somatic cell count were evaluated for each organism (or group of organisms) with sufficient data. The sensitivity (ability of a definition to detect IMI) and the specificity (Sp; ability of a definition to correctly classify noninfected quarters) were both computed. For all species except Staphylococcus aureus, the sensitivity of all definitions was <90% (and in many cases <50%). Consequently, if identifying as many existing infections as possible is important, then the criterion for considering a quarter positive should be a single colony (from a 0.01-mL milk sample) isolated (definition A). With the exception of "any organism" and coagulase-negative staphylococci, all Sp estimates were over 94% in the daily data and over 97% in the weekly data, suggesting that for most species, definition A may be acceptable. For coagulase-negative staphylococci, definition B (2 colonies from a 0.01-mL milk sample) raised the Sp to 92 and 95% in the daily and weekly data, respectively. For "any organism," using definition B raised the Sp to 88 and 93% in the 2 data sets, respectively. The final choice of definition will depend on the objectives of the study or control program for which the sample was collected. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Classifying Academically At-Risk First Graders into Engagement Types: Association with Long-Term Achievement Trajectories

    ERIC Educational Resources Information Center

    Luo, Wen; Hughes, Jan N.; Liew, Jeffrey; Kwok, Oiman

    2009-01-01

    Based on a sample of 480 academically at-risk first graders, we used a cluster analysis involving multimethod assessment (i.e., teacher-report, peer-evaluation, and self-report) of behavioral and psychological engagement to identify subtypes of academic engagement. Four theoretically and practically meaningful clusters were identified and labeled…

  16. Classifying Autism Spectrum Disorders by ADI-R: Subtypes or Severity Gradient?

    ERIC Educational Resources Information Center

    Cholemkery, Hannah; Medda, Juliane; Lempp, Thomas; Freitag, Christine M.

    2016-01-01

    To reduce the phenotypic heterogeneity of Autism spectrum disorders (ASD) and add to the current diagnostic discussion, this study aimed at identifying clinically meaningful ASD subgroups. Cluster analyses were used to describe empirically derived groups based on the Autism Diagnostic Interview-Revised (ADI-R) in a large sample of n = 463 individuals…

  17. An Electronic Nose Based on Coated Piezoelectric Quartz Crystals to Certify Ewes’ Cheese and to Discriminate between Cheese Varieties

    PubMed Central

    Pais, Vânia F.; Oliveira, João A. B. P.; Gomes, Maria Teresa S. R.

    2012-01-01

    An electronic nose based on coated piezoelectric quartz crystals was used to distinguish cheese made from ewes' milk, and to distinguish cheese varieties. Two sensors coated with Nafion and Carbowax could certify half of the ewes' cheese samples, exclude 32 cheeses made from cow's milk, and classify half of the ewes' cheese samples as possibly authentic. Two other sensors, coated with polyvinylpyrrolidone and triethanolamine, clearly distinguished between Flamengo, Brie, Gruyère and Mozzarella cheeses. Brie cheeses were further separated according to their origin, and grated Mozzarella cheese also appeared clearly separated from non-grated Mozzarella. PMID:22438717

  18. Ultrasonic Sensor Signals and Optimum Path Forest Classifier for the Microstructural Characterization of Thermally-Aged Inconel 625 Alloy

    PubMed Central

    de Albuquerque, Victor Hugo C.; Barbosa, Cleisson V.; Silva, Cleiton C.; Moura, Elineudo P.; Rebouças Filho, Pedro P.; Papa, João P.; Tavares, João Manuel R. S.

    2015-01-01

    Secondary phases, such as Laves phases and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited during the gas tungsten arc welding cold wire process. However, when aged at high temperatures, other phases can precipitate in the microstructure, like the γ″ and δ phases. This work presents an evaluation of the powerful optimum path forest (OPF) classifier configured with six distance functions to classify background echo and backscattered ultrasonic signals from samples of the Inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The results confirmed the potential of ultrasonic sensor signals combined with the OPF to characterize the microstructures of Inconel 625, both thermally aged and in the as-welded condition. The experimental results revealed that the OPF classifier is sufficiently fast (classification total time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the proposed application. PMID:26024416

  19. Ultrasonic sensor signals and optimum path forest classifier for the microstructural characterization of thermally-aged inconel 625 alloy.

    PubMed

    de Albuquerque, Victor Hugo C; Barbosa, Cleisson V; Silva, Cleiton C; Moura, Elineudo P; Filho, Pedro P Rebouças; Papa, João P; Tavares, João Manuel R S

    2015-05-27

    Secondary phases, such as Laves phases and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited during the gas tungsten arc welding cold wire process. However, when aged at high temperatures, other phases can precipitate in the microstructure, like the γ″ and δ phases. This work presents an evaluation of the powerful optimum path forest (OPF) classifier configured with six distance functions to classify background echo and backscattered ultrasonic signals from samples of the Inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The results confirmed the potential of ultrasonic sensor signals combined with the OPF to characterize the microstructures of Inconel 625, both thermally aged and in the as-welded condition. The experimental results revealed that the OPF classifier is sufficiently fast (classification total time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the proposed application.

  20. Machine-z: Rapid machine-learned redshift indicator for Swift gamma-ray bursts

    DOE PAGES

    Ukwatta, T. N.; Wozniak, P. R.; Gehrels, N.

    2016-03-08

    Studies of high-redshift gamma-ray bursts (GRBs) provide important information about the early Universe such as the rates of stellar collapsars and mergers, the metallicity content, constraints on the re-ionization period, and probes of the Hubble expansion. Rapid selection of high-z candidates from GRB samples reported in real time by dedicated space missions such as Swift is the key to identifying the most distant bursts before the optical afterglow becomes too dim to warrant a good spectrum. Here, we introduce ‘machine-z’, a redshift prediction algorithm and a ‘high-z’ classifier for Swift GRBs based on machine learning. Our method relies exclusively on canonical data commonly available within the first few hours after the GRB trigger. Using a sample of 284 bursts with measured redshifts, we trained a randomized ensemble of decision trees (random forest) to perform both regression and classification. Cross-validated performance studies show that the correlation coefficient between machine-z predictions and the true redshift is nearly 0.6. At the same time, our high-z classifier can achieve 80 per cent recall of true high-redshift bursts, while incurring a false positive rate of 20 per cent. With a 40 per cent false positive rate the classifier can achieve ~100 per cent recall. As a result, the most reliable selection of high-redshift GRBs is obtained by combining predictions from both the high-z classifier and the machine-z regressor.
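
    The recall versus false-positive-rate trade-off of the high-z classifier comes from sweeping a decision threshold over a continuous score. A sketch with synthetic scores (not the Swift features or the actual random forest):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic redshifts and a noisy "high-z score" standing in for the
# classifier output (illustrative only).
z = rng.uniform(0.0, 8.0, 284)
score = z / 8.0 + rng.normal(0.0, 0.2, 284)
is_high_z = z > 4.0

def recall_fpr(scores, truth, threshold):
    """Recall and false-positive rate of the rule: flag when score > threshold."""
    flagged = scores > threshold
    recall = (flagged & truth).sum() / truth.sum()
    fpr = (flagged & ~truth).sum() / (~truth).sum()
    return recall, fpr

# Lowering the threshold buys recall at the price of false positives,
# which is the trade-off behind operating points such as 80% recall at
# 20% false positive rate.
r_loose, f_loose = recall_fpr(score, is_high_z, 0.4)
r_strict, f_strict = recall_fpr(score, is_high_z, 0.7)
```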

  1. Monthly Fluctuations of Insomnia Symptoms in a Population-Based Sample

    PubMed Central

    Morin, Charles M.; LeBlanc, M.; Ivers, H.; Bélanger, L.; Mérette, Chantal; Savard, Josée; Jarrin, Denise C.

    2014-01-01

    Study Objectives: To document the monthly changes in sleep/insomnia status over a 12-month period; to determine the optimal time intervals to reliably capture new incident cases and recurrent episodes of insomnia and the likelihood of its persistence over time. Design: Participants were 100 adults (mean age = 49.9 years; 66% women) randomly selected from a larger population-based sample enrolled in a longitudinal study of the natural history of insomnia. They completed 12 monthly telephone interviews assessing insomnia, use of sleep aids, stressful life events, and physical and mental health problems in the previous month. A total of 1,125 interviews of a potential 1,200 were completed. Based on data collected at each assessment, participants were classified into one of three subgroups: good sleepers, insomnia symptoms, and insomnia syndrome. Results: At baseline, 42 participants were classified as good sleepers, 34 met criteria for insomnia symptoms, and 24 for an insomnia syndrome. There were significant fluctuations of insomnia over time, with 66% of the participants changing sleep status at least once over the 12 monthly assessments (51.5% for good sleepers, 59.5% for insomnia syndrome, and 93.4% for insomnia symptoms). Changes of status were more frequent among individuals with insomnia symptoms at baseline (mean = 3.46, SD = 2.36) than among those initially classified as good sleepers (mean = 2.12, SD = 2.70). Among the subgroup with insomnia symptoms at baseline, 88.3% reported improved sleep (i.e., became good sleepers) at least once over the 12 monthly assessments compared to 27.7% whose sleep worsened (i.e., met criteria for an insomnia syndrome) during the same period. Among individuals classified as good sleepers at baseline, risks of developing insomnia symptoms and syndrome over the subsequent months were, respectively, 48.6% and 14.5%. 
Monthly assessment over an interval of 6 months was found most reliable for estimating incidence rates, while an interval of 3 months proved most reliable for defining chronic insomnia. Conclusions: Monthly assessment of insomnia and sleep patterns revealed significant variability over the course of a 12-month period. These findings highlight the importance, for future epidemiological studies, of conducting repeated assessments at intervals shorter than the typical yearly one in order to reliably capture the natural course of insomnia over time. Citation: Morin CM; LeBlanc M; Ivers H; Bélanger L; Mérette C; Savard J; Jarrin DC. Monthly fluctuations of insomnia symptoms in a population-based sample. SLEEP 2014;37(2):319-326. PMID:24497660

  2. Planning schistosomiasis control: investigation of alternative sampling strategies for Schistosoma mansoni to target mass drug administration of praziquantel in East Africa.

    PubMed

    Sturrock, Hugh J W; Gething, Pete W; Ashton, Ruth A; Kolaczinski, Jan H; Kabatereine, Narcis B; Brooker, Simon

    2011-09-01

    In schistosomiasis control, there is a need to geographically target treatment to populations at high risk of morbidity. This paper evaluates alternative sampling strategies for surveys of Schistosoma mansoni to target mass drug administration in Kenya and Ethiopia. Two main designs are considered: lot quality assurance sampling (LQAS) of children from all schools; and a geostatistical design that samples a subset of schools and uses semi-variogram analysis and spatial interpolation to predict prevalence in the remaining unsurveyed schools. Computerized simulations are used to investigate the performance of sampling strategies in correctly classifying schools according to treatment needs and their cost-effectiveness in identifying high prevalence schools. LQAS performs better than geostatistical sampling in correctly classifying schools, but at a higher cost per high prevalence school correctly classified. It is suggested that the optimal surveying strategy for S. mansoni needs to take into account the goals of the control programme and the financial and drug resources available.
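The LQAS decision rule above can be sketched as a simple threshold test. The sampling plan below (15 children per school, decision threshold of 3 positives) and the survey counts are purely illustrative, not the plan or data used in the study:

```python
def lqas_classify(n_positive, decision_threshold):
    """LQAS decision rule: classify a school as high prevalence (mass
    treatment indicated) when the number of infected children found in
    the fixed-size sample exceeds the decision threshold."""
    return "high" if n_positive > decision_threshold else "low"

# Hypothetical plan: examine 15 children per school, threshold of 3 positives.
SAMPLE_SIZE, THRESHOLD = 15, 3

# Positives found in each school's sample (illustrative survey results).
survey = {"school_A": 7, "school_B": 1}
decisions = {school: lqas_classify(k, THRESHOLD) for school, k in survey.items()}
```

Unlike the geostatistical design, which surveys only a subset of schools and interpolates, an LQAS plan applies this rule to a small sample from every school.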

  3. Mining big data sets of plankton images: a zero-shot learning approach to retrieve labels without training data

    NASA Astrophysics Data System (ADS)

    Orenstein, E. C.; Morgado, P. M.; Peacock, E.; Sosik, H. M.; Jaffe, J. S.

    2016-02-01

    Technological advances in instrumentation and computing have allowed oceanographers to develop imaging systems capable of collecting extremely large data sets. With the advent of in situ plankton imaging systems, scientists must now commonly deal with "big data" sets containing tens of millions of samples spanning hundreds of classes, making manual classification untenable. Automated annotation methods are now considered to be the bottleneck between collection and interpretation. Typically, such classifiers learn to approximate a function that predicts a predefined set of classes for which a considerable amount of labeled training data is available. The requirement that the training data span all the classes of concern is problematic for plankton imaging systems since they sample such diverse, rapidly changing populations. These data sets may contain relatively rare, sparsely distributed taxa that will not have associated training data; a classifier trained on a limited set of classes will miss these samples. The computer vision community, leveraging advances in Convolutional Neural Networks (CNNs), has recently attempted to tackle such problems using "zero-shot" object categorization methods. Under a zero-shot framework, a classifier is trained to map samples onto a set of attributes rather than a class label. These attributes can include visual and non-visual information such as what an organism is made out of, where it is distributed globally, or how it reproduces. A second stage classifier is then used to extrapolate a class. In this work, we demonstrate a zero-shot classifier, implemented with a CNN, to retrieve out-of-training-set labels from images. This method is applied to data from two continuously imaging, moored instruments: the Scripps Plankton Camera System (SPCS) and the Imaging FlowCytobot (IFCB). Results from simulated deployment scenarios indicate zero-shot classifiers could be successful at recovering samples of rare taxa in image sets. 
This capability will allow ecologists to identify trends in the distribution of difficult-to-sample organisms in their data.
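A minimal sketch of the two-stage zero-shot idea described above, with hypothetical attribute signatures (the class names and attributes are illustrative, not taken from the SPCS or IFCB deployments):

```python
# Hypothetical attribute signatures for plankton classes; attributes are
# [has_silica_shell, forms_chains, is_flagellated]. Names are illustrative.
SIGNATURES = {
    "diatom":         [1, 1, 0],
    "dinoflagellate": [0, 0, 1],
    "ciliate":        [0, 0, 0],
}

def hamming(a, b):
    """Number of attribute positions where two binary vectors disagree."""
    return sum(x != y for x, y in zip(a, b))

def zero_shot_classify(predicted_attributes):
    """Second-stage classifier: pick the class whose attribute signature is
    closest to the attribute vector produced by the first-stage CNN."""
    return min(SIGNATURES, key=lambda c: hamming(SIGNATURES[c], predicted_attributes))
```

The key property is that a class absent from the stage-one training images can still be retrieved, because only its attribute signature, not labeled image data, is needed at this stage.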

  4. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    PubMed Central

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for even the most skilful clinician to come up with an accurate prognosis using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods were proposed and experimented on the oral cancer prognosis dataset. In the second stage, the models with the features selected by each feature selection method were tested on the proposed classifiers. Four types of classifiers were chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation was implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate their potential as a significant prognostic signature in oral cancer studies. PMID:23725313
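The k-fold cross-validation used above to cope with the small sample size can be sketched as index partitioning; the fold count and sample count below are illustrative only:

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k disjoint folds; each fold serves once as
    the test set while the remaining folds form the training set. Useful
    when the sample is too small to hold out a fixed test set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((sorted(train), test))
    return splits

# Six samples, three folds: every sample is tested exactly once.
splits = k_fold_indices(6, 3)
```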

  5. Fusion and Gaussian mixture based classifiers for SONAR data

    NASA Astrophysics Data System (ADS)

    Kotari, Vikas; Chang, KC

    2011-06-01

    Underwater mines are inexpensive and highly effective weapons. They are difficult to detect and classify. Hence detection and classification of underwater mines is essential for the safety of naval vessels. This necessitates the formulation of highly efficient classifiers and detection techniques. Current techniques primarily focus on signals from one source. Data fusion is known to increase the accuracy of detection and classification. In this paper, we formulated a fusion-based classifier and a Gaussian mixture model (GMM) based classifier for classification of underwater mines. The emphasis has been on sound navigation and ranging (SONAR) signals due to their extensive use in current naval operations. The classifiers have been tested on real SONAR data obtained from the University of California Irvine (UCI) repository. The performance of both the GMM-based and the fusion-based classifiers clearly demonstrates their superior classification accuracy over conventional single-source cases and validates our approach.
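A minimal sketch of likelihood-based classification in the spirit of the GMM classifier above, reduced to a single Gaussian per class on a one-dimensional toy feature (the feature values and class names are invented for illustration; the paper's classifier uses full mixtures on SONAR features):

```python
import math

class GaussianClassifier:
    """Per-class single-Gaussian likelihood classifier: fit a mean and
    variance per class, then predict the class with the highest
    log-likelihood for a new sample."""

    def fit(self, samples, labels):
        self.params = {}
        for cls in set(labels):
            xs = [x for x, y in zip(samples, labels) if y == cls]
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            self.params[cls] = (mean, max(var, 1e-9))  # guard zero variance
        return self

    def _loglik(self, x, cls):
        mean, var = self.params[cls]
        return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

    def predict(self, x):
        return max(self.params, key=lambda c: self._loglik(x, c))

# Toy 1-D "echo intensity" feature: mines cluster near 0.8, rocks near 0.2.
train_x = [0.75, 0.82, 0.79, 0.85, 0.18, 0.22, 0.25, 0.15]
train_y = ["mine", "mine", "mine", "mine", "rock", "rock", "rock", "rock"]
clf = GaussianClassifier().fit(train_x, train_y)
```

A full GMM would model each class with several weighted Gaussian components instead of one; the decision rule (maximum likelihood over classes) stays the same.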

  6. Discovering Fine-grained Sentiment in Suicide Notes

    PubMed Central

    Wang, Wenbo; Chen, Lu; Tan, Ming; Wang, Shaojun; Sheth, Amit P.

    2012-01-01

    This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams. PMID:22879770
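The micro-averaged F-measure used to score systems above pools true positives, false positives and false negatives across all sentiment classes before computing precision and recall; a sketch with hypothetical per-class counts:

```python
def micro_f1(confusions):
    """Micro-averaged F-measure from per-class (TP, FP, FN) counts:
    pool the counts over all classes, then take the harmonic mean of
    the pooled precision and recall."""
    tp = sum(c[0] for c in confusions)
    fp = sum(c[1] for c in confusions)
    fn = sum(c[2] for c in confusions)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (TP, FP, FN) counts for two sentiment classes.
counts = [(10, 5, 5), (20, 10, 10)]
score = micro_f1(counts)
```

Micro-averaging weights each prediction equally, so frequent classes dominate the score; macro-averaging would instead average per-class F-measures.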

  7. A paper-based cantilever array sensor: Monitoring volatile organic compounds with naked eye.

    PubMed

    Fraiwan, Arwa; Lee, Hankeun; Choi, Seokheun

    2016-09-01

    Volatile organic compound (VOC) detection is critical for controlling industrial and commercial emissions, environmental monitoring, and public health. Simple, portable, rapid and low-cost VOC sensing platforms offer the benefits of on-site and real-time monitoring anytime and anywhere. The best and most practically useful approaches to monitoring would include equipment-free and power-free detection by the naked eye. In this work, we created a novel, paper-based cantilever sensor array that allows simple and rapid naked-eye VOC detection without the need for power, electronics or readout interface/equipment. This simple VOC detection method was achieved using (i) low-cost paper materials as a substrate and (ii) swellable thin polymers adhered to the paper. Upon exposure to VOCs, the polymer adhered to the paper-based cantilever swelled, inducing mechanical deflection that generated a distinctive composite pattern of deflection angles for a specific VOC. The angle is directly measured by the naked eye on a 3-D protractor printed on paper facing the cantilevers. The generated angle patterns are subjected to a statistical algorithm (linear discriminant analysis (LDA)) to classify each VOC sample and selectively detect a VOC. We classified four VOC samples with 100% accuracy using LDA. Copyright © 2016 Elsevier B.V. All rights reserved.
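The LDA step above can be sketched as two-class Fisher discriminant analysis on hypothetical two-cantilever deflection-angle patterns (the VOC names and angle values are invented; the paper classifies four VOCs from a full array):

```python
def mean_vec(xs):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(xs)
    return [sum(x[i] for x in xs) / n for i in range(len(xs[0]))]

def fisher_lda(class_a, class_b):
    """Two-class Fisher LDA on 2-D features: w = Sw^-1 (mA - mB), with a
    decision threshold at the projected midpoint of the class means."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    # Within-class scatter matrix (2x2), summed over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for xs, m in ((class_a, ma), (class_b, mb)):
        for x in xs:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    dm = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
         inv[1][0] * dm[0] + inv[1][1] * dm[1]]
    threshold = sum(w[i] * (ma[i] + mb[i]) / 2 for i in range(2))
    return w, threshold

# Hypothetical deflection-angle patterns (degrees) for two VOCs,
# two cantilevers per array; values are illustrative only.
ethanol = [[10.0, 35.0], [12.0, 33.0], [11.0, 36.0]]
acetone = [[30.0, 12.0], [28.0, 14.0], [31.0, 11.0]]
w, thr = fisher_lda(ethanol, acetone)

def classify(pattern):
    """Project an angle pattern onto w and threshold at the midpoint."""
    score = w[0] * pattern[0] + w[1] * pattern[1]
    return "ethanol" if score > thr else "acetone"
```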

  8. Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.

    PubMed

    Hor, Soheil; Moradi, Mehdi

    2016-12-01

    Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data. Copyright © 2016. Published by Elsevier B.V.

  9. The decision tree classifier - Design and potential. [for Landsat-1 data

    NASA Technical Reports Server (NTRS)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.

  10. A mathematical theory of shape and neuro-fuzzy methodology-based diagnostic analysis: a comparative study on early detection and treatment planning of brain cancer.

    PubMed

    Kar, Subrata; Majumder, D Dutta

    2017-08-01

    Investigation of brain cancer can detect the abnormal growth of tissue in the brain using computed tomography (CT) scans and magnetic resonance (MR) images of patients. The proposed method classifies brain cancer on shape-based feature extraction as either benign or malignant. The authors used input variables such as shape distance (SD) and shape similarity measure (SSM) in fuzzy tools, and used fuzzy rules to evaluate the risk status as an output variable. We presented a classifier neural network system (NNS), namely Levenberg-Marquardt (LM), which is a feed-forward back-propagation learning algorithm used to train the NN for the status of brain cancer, if any, and which achieved satisfactory performance with 100% accuracy. The proposed methodology is divided into three phases. First, we find the region of interest (ROI) in the brain to detect the tumors using CT and MR images. Second, we extract the shape-based features, like SD and SSM, and grade the brain tumors as benign or malignant with the concept of SD function and SSM as shape-based parameters. Third, we classify the brain cancers using neuro-fuzzy tools. In this experiment, we used a 16-sample database with SSM (μ) values and classified the benignancy or malignancy of the brain tumor lesions using the neuro-fuzzy system (NFS). We have developed a fuzzy expert system (FES) and NFS for early detection of brain cancer from CT and MR images. In this experiment, shape-based features, such as SD and SSM, were extracted from the ROI of brain tumor lesions. These shape-based features were considered as input variables and, using fuzzy rules, we were able to evaluate brain cancer risk values for each case. We used an NNS with LM, a feed-forward back-propagation learning algorithm, as a classifier for the diagnosis of brain cancer and achieved satisfactory performance with 100% accuracy. The proposed network was trained with MR image datasets of 16 cases. 
The 16 cases were fed to the ANN with 2 input neurons, one hidden layer of 10 neurons and 2 output neurons. Of the 16-sample database, 10 datasets were used for training, 3 for validation, and 3 for testing in the ANN classification system. From the SSM (μ) confusion matrix, the numbers of true positive, false positive, true negative and false negative outputs were 6, 0, 10, and 0, respectively. The sensitivity, specificity and accuracy were each equal to 100%. The method of diagnosing brain cancer presented in this study is a successful model to assist doctors in the screening and treatment of brain cancer patients. The presented FES successfully identified the presence of brain cancer in CT and MR images using the extracted shape-based features, and the NFS enabled identification of brain cancer in its early stages. From the analysis and diagnosis of the disease, doctors can decide the stage of cancer and take the necessary steps for more accurate treatment. Here, we have presented an investigation and comparison of the shape-based feature extraction method with the use of NFS for classifying brain tumors as showing normal or abnormal patterns. The results proved that the shape-based features with the use of NFS can achieve satisfactory performance with 100% accuracy. We intend to extend this methodology to the early detection of cancer in other regions such as the prostate and the human cervix.
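The sensitivity, specificity and accuracy figures quoted above follow directly from the reported confusion-matrix counts (TP = 6, FP = 0, TN = 10, FN = 0):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Counts reported for the SSM confusion matrix above.
sens, spec, acc = confusion_metrics(6, 0, 10, 0)
```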

  11. Classifier transfer with data selection strategies for online support vector machine classification with class imbalance

    NASA Astrophysics Data System (ADS)

    Krell, Mario Michael; Wilshusen, Nils; Seeland, Anett; Kim, Su Kyoung

    2017-04-01

    Objective. Classifier transfers usually come with dataset shifts. To overcome dataset shifts in practical applications, we consider in this paper the limitations in computational resources for the adaptation of batch learning algorithms, like the support vector machine (SVM). Approach. We focus on data selection strategies which limit the size of the stored training data by different inclusion, exclusion, and further dataset manipulation criteria, such as handling class imbalance with two new approaches. We provide a comparison of the strategies with linear SVMs on several synthetic datasets with different data shifts as well as on different transfer settings with electroencephalographic (EEG) data. Main results. For the synthetic data, adding only misclassified samples performed astoundingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. For stronger drifts, adding all new samples and removing the oldest results in the best performance, whereas for smaller drifts it can be sufficient to only add samples near the decision boundary of the SVM, which reduces processing resources. Significance. For brain-computer interfaces based on EEG data, models trained on data from a calibration session, a previous recording session, or even from a recording session with another subject are used. We show that, by using the right combination of data selection criteria, it is possible to adapt the SVM classifier to overcome the performance drop from the transfer.
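The "add only misclassified samples" selection strategy above can be sketched with a bounded training buffer; the stream, labels and threshold classifier below are invented stand-ins for the SVM and the EEG data:

```python
from collections import deque

class BoundedTrainingSet:
    """Data selection for classifier transfer: store at most `capacity`
    samples. The 'add-misclassified' strategy keeps a new sample only when
    the current model gets it wrong; a full buffer evicts its oldest entry."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest sample drops out first

    def consider(self, sample, label, predict):
        if predict(sample) != label:
            self.buffer.append((sample, label))

# Hypothetical stand-in classifier: threshold a 1-D feature at zero.
predict = lambda x: "pos" if x > 0 else "neg"

ts = BoundedTrainingSet(capacity=3)
stream = [(2.0, "pos"), (-1.0, "pos"), (0.5, "neg"), (3.0, "pos"), (-2.0, "pos")]
for x, y in stream:
    ts.consider(x, y, predict)
```

Correctly classified samples such as `(2.0, "pos")` are never stored, which is what keeps the retraining set small under limited computational resources.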

  12. Combination of mass spectrometry-based targeted lipidomics and supervised machine learning algorithms in detecting adulterated admixtures of white rice.

    PubMed

    Lim, Dong Kyu; Long, Nguyen Phuoc; Mo, Changyeun; Dong, Ziyuan; Cui, Lingmei; Kim, Giyoung; Kwon, Sung Won

    2017-10-01

    The mixing of extraneous ingredients with original products is a common adulteration practice in food and herbal medicines. In particular, the authenticity of white rice and its corresponding blended products has become a key issue in the food industry. Accordingly, our current study aimed to develop and evaluate a novel discrimination method by combining targeted lipidomics with powerful supervised learning methods, and eventually introduce a platform to verify the authenticity of white rice. A total of 30 cultivars were collected, and 330 representative samples of white rice from Korea and China as well as seven mixing ratios were examined. Random forests (RF), support vector machines (SVM) with a radial basis function kernel, C5.0, model averaged neural network, and k-nearest neighbour classifiers were used for the classification. We achieved the desired results: the classifiers effectively differentiated white rice from Korea from blended samples, with high prediction accuracy at contamination ratios as low as five percent. In addition, the RF and SVM classifiers were generally superior to and more robust than the other techniques. Our approach demonstrated that the relative differences in lysoGPLs can be successfully utilized to detect the adulterated mixing of white rice originating from different countries. In conclusion, the present study introduces a novel and high-throughput platform that can be applied to authenticate adulterated admixtures from original white rice samples. Copyright © 2017 Elsevier Ltd. All rights reserved.
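Among the classifiers compared above, k-nearest neighbours is the simplest to sketch; the one-dimensional "lysoGPL relative abundance" feature and all values below are hypothetical, invented for illustration:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """k-nearest-neighbour majority vote on a 1-D feature: find the k
    training samples closest to the query and return the most common label."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical lysoGPL relative-abundance feature per sample.
train = [(0.10, "pure"), (0.12, "pure"), (0.11, "pure"),
         (0.30, "blended"), (0.28, "blended"), (0.33, "blended")]
```

The study's classifiers operate on full lipidomic feature vectors rather than a single ratio; the voting logic is unchanged in higher dimensions, with a vector distance in place of the absolute difference.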

  13. Construction of a multiple myeloma diagnostic model by magnetic bead-based MALDI-TOF mass spectrometry of serum and pattern recognition software.

    PubMed

    Wang, Qing-Tao; Li, Yong-Zhe; Liang, Yu-Fang; Hu, Chao-Jun; Zhai, Yu-Hua; Zhao, Guan-Fei; Zhang, Jian; Li, Ning; Ni, An-Ping; Chen, Wen-Ming; Xu, Yang

    2009-04-01

    A diagnosis of multiple myeloma (MM) is difficult to make on the basis of any single laboratory test result. Accurate diagnosis of MM generally results from a number of costly and invasive laboratory tests and medical procedures. The aim of this work is to find a new, highly specific and sensitive method for MM diagnosis. Serum samples were tested in groups representing MM (n = 54) and non-MM (n = 108). These included a subgroup of 17 plasma cell dyscrasias, a subgroup of 17 reactive plasmacytosis, 5 B cell lymphomas, and 7 other tumors with osseous metastasis, as well as 62 healthy donors as controls. Bioinformatic calculations associated with MM were performed. The decision algorithm, with a panel of three biomarkers, correctly identified 24 of 24 (100%) MM samples and 46 of 49 (93.88%) non-MM samples in the training set. During the masked test of the discriminatory model, 26 of 30 MM patients (sensitivity, 86.67%) were precisely recognized, and all 34 normal donors were successfully classified; patients with reactive plasmacytosis were also correctly classified into the non-MM group, and 11 of the other patients were incorrectly classified as MM. The results suggested that proteomic fingerprint technology combining magnetic beads with MALDI-TOF-MS has the potential for identifying individuals with MM. The biomarker classification model was suitable for preliminary assessment of MM and could potentially serve as a useful tool for MM diagnosis and differential diagnosis.

  14. The Performance of Short-Term Heart Rate Variability in the Detection of Congestive Heart Failure

    PubMed Central

    Barros, Allan Kardec; Ohnishi, Noboru

    2016-01-01

    Congestive heart failure (CHF) is a cardiac disease associated with decreasing capacity of the cardiac output. It has been shown that CHF is the main cause of cardiac death around the world. Some works have proposed discriminating CHF subjects from healthy subjects using either the electrocardiogram (ECG) or heart rate variability (HRV) from long-term recordings. In this work, we propose an alternative framework to discriminate CHF from healthy subjects by using HRV short-term intervals based on 256 continuous RR samples. Our framework uses a matching pursuit algorithm based on Gabor functions. From the selected Gabor functions, we derived a set of features that are input into a hybrid framework which uses a genetic algorithm and a k-nearest neighbour classifier to select the subset of features with the best classification performance. The performance of the framework is analyzed using both the Fantasia and the CHF databases from the Physionet archives, which are composed of 40 healthy volunteers and 29 subjects, respectively. From a set of 16 nonstandard features, the proposed framework reaches an overall accuracy of 100% with five features. Our results suggest that hybrid frameworks whose classifier algorithms are based on genetic algorithms can outperform well-known classifier methods. PMID:27891509
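The matching pursuit step above greedily decomposes each 256-sample RR segment over a Gabor dictionary. The sketch below keeps the greedy selection logic but substitutes a unit-norm standard-basis dictionary and a length-4 signal so it stays self-contained:

```python
def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit: at each step pick the unit-norm atom with
    the largest (absolute) inner product with the residual, subtract its
    contribution, and record (atom index, coefficient) as a feature."""
    residual = list(signal)
    features = []
    for _ in range(n_atoms):
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        residual = [r - coef * a for r, a in zip(residual, dictionary[best])]
        features.append((best, coef))
    return features, residual

# Unit-norm standard-basis dictionary (stand-in for the Gabor dictionary).
BASIS = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
features, residual = matching_pursuit([0.0, 3.0, 0.0, 1.0], BASIS, 2)
```

In the paper's pipeline, the selected (atom, coefficient) pairs become the feature set that the genetic algorithm and k-NN classifier then operate on.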

  15. Validation of a new classifier for the automated analysis of the human epidermal growth factor receptor 2 (HER2) gene amplification in breast cancer specimens

    PubMed Central

    2013-01-01

    Amplification of the human epidermal growth factor receptor 2 (HER2) gene is a prognostic marker for poor clinical outcome and a predictive marker for therapeutic response to targeted therapies in breast cancer patients. With the introduction of anti-HER2 therapies, accurate assessment of HER2 status has become essential. Fluorescence in situ hybridization (FISH) is a widely used technique for the determination of HER2 status in breast cancer. However, manual signal enumeration is time-consuming. Therefore, several companies, like MetaSystems, have developed automated image analysis software. Some of these signal enumeration programs employ the so-called "tile-sampling classifier", a programming algorithm through which the software quantifies fluorescent signals in images on the basis of square tiles of fixed dimensions. Considering that the size of a tile does not always correspond to the size of a single tumor cell nucleus, some users argue that this analysis method might not completely reflect the biology of cells. For that reason, MetaSystems has developed a new classifier which is able to recognize nuclei within tissue sections in order to determine the HER2 amplification status on a per-nucleus basis. We call this new programming algorithm the "nuclei-sampling classifier". In this study, we evaluated the accuracy of the nuclei-sampling classifier in determining HER2 gene amplification by FISH in nuclei of breast cancer cells. To this aim, we randomly selected from our cohort 64 breast cancer specimens (32 nonamplified and 32 amplified) and compared results obtained through manual scoring and through the new classifier. The new classifier automatically recognized individual nuclei. The automated analysis was followed by an optional human correction, during which the user interacted with the software to refine the automatically selected set of cell nuclei. 
Overall concordance between manual scoring and automated nuclei-sampling analysis was 98.4% (100% for nonamplified cases and 96.9% for amplified cases). However, after human correction, concordance between the two methods was 100%. We conclude that the nuclei-based classifier is a newly available tool for automated quantitative analysis of HER2 FISH signals in nuclei in breast cancer specimens, and that it can be used for clinical purposes. PMID:23379971

  16. Assessing the similarity of surface linguistic features related to epilepsy across pediatric hospitals.

    PubMed

    Connolly, Brian; Matykiewicz, Pawel; Bretonnel Cohen, K; Standridge, Shannon M; Glauser, Tracy A; Dlugos, Dennis J; Koh, Susan; Tham, Eric; Pestian, John

    2014-01-01

    The constant progress in computational linguistic methods provides amazing opportunities for discovering information in clinical text and enables the clinical scientist to explore novel approaches to care. However, these new approaches need evaluation. We describe an automated system to compare descriptions of epilepsy patients at three different organizations: Cincinnati Children's Hospital, the Children's Hospital Colorado, and the Children's Hospital of Philadelphia. To our knowledge, there have been no similar previous studies. In this work, a support vector machine (SVM)-based natural language processing (NLP) algorithm is trained to classify epilepsy progress notes as belonging to a patient with a specific type of epilepsy from a particular hospital. The same SVM is then used to classify notes from another hospital. Our null hypothesis is that an NLP algorithm cannot be trained using epilepsy-specific notes from one hospital and subsequently used to classify notes from another hospital better than a random baseline classifier. The hypothesis is tested using epilepsy progress notes from the three hospitals. We are able to reject the null hypothesis at the 95% level. It is also found that classification was improved by including notes from a second hospital in the SVM training sample. With a reasonably uniform epilepsy vocabulary and an NLP-based algorithm able to use this uniformity to classify epilepsy progress notes across different hospitals, we can pursue automated comparisons of patient conditions, treatments, and diagnoses across different healthcare settings. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  17. Characterization and classification of seven citrus herbs by liquid chromatography-quadrupole time-of-flight mass spectrometry and genetic algorithm optimized support vector machines.

    PubMed

    Duan, Li; Guo, Long; Liu, Ke; Liu, E-Hu; Li, Ping

    2014-04-25

    Citrus herbs have been widely used in traditional medicine and cuisine in China and other countries since ancient times. However, the authentication and quality control of Citrus herbs have always been challenging tasks due to their similar morphological characteristics and the diversity of components present in their complicated matrices. In the present investigation, we developed a novel strategy to characterize and classify seven Citrus herbs based on chromatographic analysis and chemometric methods. Firstly, the chemical constituents in the seven Citrus herbs were globally characterized by liquid chromatography combined with quadrupole time-of-flight mass spectrometry (LC-QTOF-MS). Based on their retention times, UV spectra and MS fragmentation behavior, a total of 75 compounds were identified or tentatively characterized in these herbal medicines. Secondly, a segmental monitoring method based on LC with variable-wavelength detection was developed for simultaneous quantification of ten marker compounds in these Citrus herbs. Thirdly, based on the contents of the ten analytes, genetic algorithm optimized support vector machines (GA-SVM) were employed to differentiate and classify the 64 samples covering these seven herbs. The obtained classifier showed good prediction performance, and the overall prediction accuracy reached 96.88%. The proposed strategy is expected to provide new insight for the authentication and quality control of traditional herbs. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Lamb wave based damage detection using Matching Pursuit and Support Vector Machine classifier

    NASA Astrophysics Data System (ADS)

    Agarwal, Sushant; Mitra, Mira

    2014-03-01

    In this paper, the suitability of using Matching Pursuit (MP) and a Support Vector Machine (SVM) for damage detection using the Lamb wave response of a thin aluminium plate is explored. The Lamb wave response of the plate, with or without damage, is simulated using the finite element method. Simulations are carried out at different frequencies for various kinds of damage. The procedure is divided into two parts: signal processing and machine learning. Firstly, MP is used for denoising and to maintain the sparsity of the dataset. In this study, MP is extended by using a combination of time-frequency functions as the dictionary and is deployed in two stages. Selection of a particular type of atom leads to extraction of important features while maintaining the sparsity of the waveform. The resultant waveform is then passed as input data to the SVM classifier. The SVM is used to detect the location of the potential damage from the reduced data. The study demonstrates that the SVM is a robust classifier in the presence of noise and more efficient than an Artificial Neural Network (ANN). Out-of-sample data is used for the validation of the trained and tested classifier. The trained classifiers are found successful in detection of the damage with more than a 95% detection rate.

  19. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya

    PubMed Central

    Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L.; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E. Jane

    2016-01-01

    Objective To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. Design The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. Results This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Conclusion Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints. PMID:27167381

  20. Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

    PubMed

    Jezmir, Julia; Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E Jane

    2016-01-01

    To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear-positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high-resistance regions at greater than 5% resistance and low-resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly-resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.

  1. AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K., E-mail: s.farrell@physics.usyd.edu.au

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.
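
    The train-then-score workflow the authors describe maps directly onto a standard Random Forest implementation. A minimal sketch with synthetic two-class data standing in for the 2XMMi-DR2 feature table (assuming scikit-learn is available; the features and classes are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the manually classified training sources:
# two classes, three features (e.g. variability and hardness measures).
rng = np.random.default_rng(42)
n = 400
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n // 2, 3)),   # class 0
    rng.normal(2.0, 1.0, size=(n // 2, 3)),   # class 1
])
y = np.repeat([0, 1], n // 2)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[::2], y[::2])            # train on half the sources
acc = clf.score(X[1::2], y[1::2])  # accuracy on the held-out half
```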

  2. A Biomimetic Sensor for the Classification of Honeys of Different Floral Origin and the Detection of Adulteration

    PubMed Central

    Zakaria, Ammar; Shakaff, Ali Yeon Md; Masnan, Maz Jamilah; Ahmad, Mohd Noor; Adom, Abdul Hamid; Jaafar, Mahmad Nor; Ghani, Supri A.; Abdullah, Abu Hassan; Aziz, Abdul Hallis Abdul; Kamarudin, Latifah Munirah; Subari, Norazian; Fikri, Nazifah Ahmad

    2011-01-01

    The major compounds in honey are carbohydrates such as monosaccharides and disaccharides. The same compounds are found in cane-sugar concentrates. Unfortunately, when sugar concentrate is added to honey, laboratory assessments prove ineffective in detecting this adulteration. Unlike tracing heavy metals in honey, detecting sugar-adulterated honey is much trickier, and it has traditionally been very challenging to devise a suitable method to prove the presence of adulterants in honey products. This paper proposes a combination of array sensing and multi-modality sensor fusion that can effectively discriminate samples not only on the basis of the compounds present but also by mimicking the way humans perceive flavours and aromas. Analytical instruments, by contrast, are based on chemical separations, which may alter the properties of the volatiles or flavours of a particular honey. The present work is focused on classifying 18 samples of different honeys, sugar syrups and adulterated samples using data fusion of electronic nose (e-nose) and electronic tongue (e-tongue) measurements. Each group of samples was evaluated separately by the e-nose and e-tongue. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were able to separately discriminate monofloral honey from sugar syrup, and polyfloral honey from sugar and adulterated samples, using the e-nose and e-tongue. The e-nose was observed to give better separation than the e-tongue, particularly when LDA was applied. However, when all samples were combined in one classification analysis, neither PCA nor LDA was able to discriminate between honeys of different floral origins, sugar syrup and adulterated samples. Applying a sensor fusion technique improved the classification of the 18 different samples. Significant improvement was observed using PCA, while LDA not only improved the discrimination but also gave better classification. An improvement in performance was also observed with a Probabilistic Neural Network classifier when the e-nose and e-tongue data were fused. PMID:22164046
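
    The low-level fusion step, concatenating the two sensor arrays before PCA, can be sketched with synthetic data (the sensor counts below are illustrative, not those of the actual e-nose and e-tongue):

```python
import numpy as np

rng = np.random.default_rng(1)
nose = rng.normal(size=(18, 6))      # 18 samples x 6 e-nose sensors (toy)
tongue = rng.normal(size=(18, 7))    # 18 samples x 7 e-tongue sensors (toy)

def zscore(a):
    """Standardise each sensor so neither modality dominates by scale."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

fused = np.hstack([zscore(nose), zscore(tongue)])   # feature-level fusion

# PCA via SVD of the mean-centred fused matrix.
centred = fused - fused.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ Vt[:2].T           # coordinates on PC1 and PC2
explained = s[:2]**2 / (s**2).sum()   # variance fraction of the first two PCs
```

    The `scores` matrix is what a PCA scatter plot of the 18 fused samples would display; LDA would replace the SVD with a class-aware projection.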

  3. A biomimetic sensor for the classification of honeys of different floral origin and the detection of adulteration.

    PubMed

    Zakaria, Ammar; Shakaff, Ali Yeon Md; Masnan, Maz Jamilah; Ahmad, Mohd Noor; Adom, Abdul Hamid; Jaafar, Mahmad Nor; Ghani, Supri A; Abdullah, Abu Hassan; Aziz, Abdul Hallis Abdul; Kamarudin, Latifah Munirah; Subari, Norazian; Fikri, Nazifah Ahmad

    2011-01-01

    The major compounds in honey are carbohydrates such as monosaccharides and disaccharides. The same compounds are found in cane-sugar concentrates. Unfortunately, when sugar concentrate is added to honey, laboratory assessments prove ineffective in detecting this adulteration. Unlike tracing heavy metals in honey, detecting sugar-adulterated honey is much trickier, and it has traditionally been very challenging to devise a suitable method to prove the presence of adulterants in honey products. This paper proposes a combination of array sensing and multi-modality sensor fusion that can effectively discriminate samples not only on the basis of the compounds present but also by mimicking the way humans perceive flavours and aromas. Analytical instruments, by contrast, are based on chemical separations, which may alter the properties of the volatiles or flavours of a particular honey. The present work is focused on classifying 18 samples of different honeys, sugar syrups and adulterated samples using data fusion of electronic nose (e-nose) and electronic tongue (e-tongue) measurements. Each group of samples was evaluated separately by the e-nose and e-tongue. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were able to separately discriminate monofloral honey from sugar syrup, and polyfloral honey from sugar and adulterated samples, using the e-nose and e-tongue. The e-nose was observed to give better separation than the e-tongue, particularly when LDA was applied. However, when all samples were combined in one classification analysis, neither PCA nor LDA was able to discriminate between honeys of different floral origins, sugar syrup and adulterated samples. Applying a sensor fusion technique improved the classification of the 18 different samples. Significant improvement was observed using PCA, while LDA not only improved the discrimination but also gave better classification. An improvement in performance was also observed with a Probabilistic Neural Network classifier when the e-nose and e-tongue data were fused.

  4. [Establishment and Management of Multicentral Collection Bio-sample Banks of Malignant Tumors from Digestive System].

    PubMed

    Shen, Si; Shen, Junwei; Zhu, Liang; Wu, Chaoqun; Li, Dongliang; Yu, Hongyu; Qiu, Yuanyuan; Zhou, Yi

    2015-11-01

    To establish and manage multicenter biobanks of malignant tumor samples from the digestive system, a multicenter management system was designed, standard operating procedures (SOPs) were established, and ten hospitals nationwide were led in collecting tumor samples. In the half year since its establishment, the biobank has collected 695 samples from patients with malignant tumors of the digestive system. The clinical data are complete, uniformly labeled, and managed by category. Clinical and molecular biology research based on the biobank has already produced results. The biobank provides a research platform for malignant tumors of the digestive system across different regions and tumor types.

  5. Height in healthy children in low- and middle-income countries: an assessment.

    PubMed

    Karra, Mahesh; Subramanian, S V; Fink, Günther

    2017-01-01

    Despite rapid economic development and reductions in child mortality worldwide, continued high rates of early childhood stunting have put the global applicability of international child-height standards into question. We used population-based survey data to identify children growing up in healthy environments in low- and middle-income countries and compared the height distribution of these children to the height distribution of the reference sample established by the WHO. Height data were extracted from 169 Demographic and Health Surveys (DHSs) that were collected across 63 countries between 1990 and 2014. Children were classified as having grown up in ideal environments if they 1) had access to safe water and sanitation; 2) lived in households with finished floors, a television, and a car; 3) were born to highly educated mothers; 4) were single births; and 5) were delivered in hospitals. We compared the heights of children in ideal environments with those in the WHO reference sample. A total of 878,249 height records were extracted, and 1006 children (0.1%) were classified as having been raised in an ideal home environment. The mean height-for-age z score (HAZ) in this sample was not statistically different from zero (95% CI: -0.039, 0.125). The HAZ SD for the sample was estimated to be 1.3, and 5.3% of children in the sample were classified as being stunted (HAZ <-2). Similar means, SDs, and stunting rates were found when less restrictive definitions of ideal environments were used. The large current gaps in children's heights relative to those of the reference sample likely are not due to innate or genetic differences between children but, rather, reflect children's continued exposure to poverty, a lack of maternal education, and a lack of access to safe water and sanitation across populations. © 2017 American Society for Nutrition.
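
    The HAZ metric and stunting cutoff used throughout the study reduce to a simple standardisation; the reference median and SD below are placeholders, not actual WHO table values:

```python
def haz(height_cm, ref_median_cm, ref_sd_cm):
    """Height-for-age z-score: height relative to the reference
    median, in units of the reference SD."""
    return (height_cm - ref_median_cm) / ref_sd_cm

def is_stunted(z):
    """The study's stunting definition: HAZ < -2."""
    return z < -2

z = haz(82.0, 87.1, 3.2)   # hypothetical child vs. placeholder reference
```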

  6. Guided filter and convolutional network based tracking for infrared dim moving target

    NASA Astrophysics Data System (ADS)

    Qian, Kun; Zhou, Huixin; Qin, Hanlin; Rong, Shenghui; Zhao, Dong; Du, Juan

    2017-09-01

    A dim moving target is usually submerged in strong noise, and its motion observability is degraded by numerous false alarms at low signal-to-noise ratio. A tracking algorithm that integrates the Guided Image Filter (GIF) and a Convolutional Neural Network (CNN) into the particle filter framework is presented to cope with the uncertainty of dim targets. First, the initial target template is treated as a guidance to filter incoming templates according to the similarity between the guidance and candidate templates. The GIF algorithm exploits the structure in the guidance and acts as an edge-preserving smoothing operator. The guidance therefore helps to preserve the detail of valuable templates and blurs inaccurate ones, effectively alleviating tracking drift. In addition, a two-layer CNN is adopted to obtain a powerful appearance representation, and a Bayesian classifier is trained on these discriminative features. An adaptive learning factor is introduced to prevent the update of the classifier's parameters when the target undergoes severe background clutter. Finally, the classifier responses of the particles are used to generate particle importance weights, and a resampling procedure retains samples according to these weights. In the prediction stage, a second-order transition model incorporates the target velocity to estimate the current position. Experimental results demonstrate that the presented algorithm outperforms several related algorithms in accuracy.
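
    The weight-proportional resampling step of the particle filter framework can be sketched as systematic resampling; the particle states and classifier-response weights below are toy values:

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Draw a new particle set in proportion to importance weights,
    using one stratified uniform position per particle."""
    n = len(particles)
    positions = (rng.random() + np.arange(n)) / n
    cumsum = np.cumsum(weights / weights.sum())
    idx = np.searchsorted(cumsum, positions)
    idx = np.minimum(idx, n - 1)   # guard the floating-point edge at 1.0
    return particles[idx]

rng = np.random.default_rng(7)
particles = np.array([0.0, 1.0, 2.0, 3.0])   # candidate target states (toy)
weights = np.array([0.05, 0.05, 0.8, 0.1])   # classifier responses as weights
new = systematic_resample(particles, weights, rng)
```

    High-weight particles are duplicated and low-weight ones dropped, concentrating the sample set on likely target states.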

  7. Bayes classification of interferometric TOPSAR data

    NASA Technical Reports Server (NTRS)

    Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.

    1995-01-01

    We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied on multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites, and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the chosen set of features. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.
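
    A Gaussian maximum likelihood classifier of the kind applied here fits one mean and covariance per terrain class and assigns each observation to the class with the highest log-likelihood. A two-class, two-feature sketch with synthetic data (the feature names are illustrative):

```python
import numpy as np

def fit_gaussian_ml(class_samples):
    """Per-class mean vector and covariance matrix."""
    return [(X.mean(axis=0), np.cov(X, rowvar=False)) for X in class_samples]

def classify(x, params):
    """Pick the class with maximum Gaussian log-likelihood (equal priors)."""
    scores = []
    for mu, cov in params:
        diff = x - mu
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (logdet + diff @ np.linalg.inv(cov) @ diff))
    return int(np.argmax(scores))

# Toy training sets standing in for e.g. (SAR intensity, coherence) features.
rng = np.random.default_rng(3)
forest = rng.normal([0.0, 0.0], 0.5, size=(200, 2))
urban = rng.normal([3.0, 3.0], 0.5, size=(200, 2))
params = fit_gaussian_ml([forest, urban])
```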

  8. Source identification of western Oregon Douglas-fir wood cores using mass spectrometry and random forest classification1

    PubMed Central

    Finch, Kristen; Espinoza, Edgard; Jones, F. Andrew; Cronn, Richard

    2017-01-01

    Premise of the study: We investigated whether wood metabolite profiles from direct analysis in real time (time-of-flight) mass spectrometry (DART-TOFMS) could be used to determine the geographic origin of Douglas-fir wood cores originating from two regions in western Oregon, USA. Methods: Three annual ring mass spectra were obtained from 188 adult Douglas-fir trees, and these were analyzed using random forest models to determine whether samples could be classified to geographic origin, growth year, or growth year and geographic origin. Specific wood molecules that contributed to geographic discrimination were identified. Results: Douglas-fir mass spectra could be differentiated into two geographic classes with an accuracy between 70% and 76%. Classification models could not accurately classify sample mass spectra based on growth year. Thirty-two molecules were identified as key for classifying western Oregon Douglas-fir wood cores to geographic origin. Discussion: DART-TOFMS is capable of detecting minute but regionally informative differences in wood molecules over a small geographic scale, and these differences made it possible to predict the geographic origin of Douglas-fir wood with moderate accuracy. Studies involving DART-TOFMS, alone and in combination with other technologies, will be relevant for identifying the geographic origin of illegally harvested wood. PMID:28529831

  9. Spectral imaging perspective on cytomics.

    PubMed

    Levenson, Richard M

    2006-07-01

    Cytomics involves the analysis of cellular morphology and molecular phenotypes, with reference to tissue architecture and to additional metadata. To this end, a variety of imaging and nonimaging technologies need to be integrated. Spectral imaging is proposed as a tool that can simplify and enrich the extraction of morphological and molecular information. Simple-to-use instrumentation is available that mounts on standard microscopes and can generate spectral image datasets with excellent spatial and spectral resolution; these can be exploited by sophisticated analysis tools. This report focuses on brightfield microscopy-based approaches. Cytological and histological samples were stained using nonspecific standard stains (Giemsa; hematoxylin and eosin (H&E)) or immunohistochemical (IHC) techniques employing three chromogens plus a hematoxylin counterstain. The samples were imaged using the Nuance system, a commercially available, liquid-crystal tunable-filter-based multispectral imaging platform. The resulting data sets were analyzed using spectral unmixing algorithms and/or learn-by-example classification tools. Spectral unmixing of Giemsa-stained guinea-pig blood films readily classified the major blood elements. Machine-learning classifiers were also successful at the same task, as well as in distinguishing normal from malignant regions in a colon-cancer example, and in delineating regions of inflammation in an H&E-stained kidney sample. In an example of a multiplexed IHC sample, brown, red, and blue chromogens were isolated into separate images without crosstalk or interference from the (also blue) hematoxylin counterstain. Cytomics requires both accurate architectural segmentation and multiplexed molecular imaging to associate molecular phenotypes with relevant cellular and tissue compartments. Multispectral imaging can assist in both of these tasks, and conveys new utility to brightfield-based microscopy approaches.
Copyright 2006 International Society for Analytical Cytology.
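
    Spectral unmixing of this kind models each pixel's spectrum as a linear mixture of known stain spectra and solves for per-stain abundances; the two endmember spectra below are invented for illustration:

```python
import numpy as np

bands = 8
# Columns are endmember spectra: a "brown" chromogen and "blue" hematoxylin
# (made-up shapes, one falling and one rising across the bands).
E = np.column_stack([
    np.linspace(1.0, 0.2, bands),
    np.linspace(0.2, 1.0, bands),
])

true_abund = np.array([0.7, 0.3])
pixel = E @ true_abund                       # noise-free mixed pixel

# Least-squares unmixing recovers the per-stain abundances.
est, *_ = np.linalg.lstsq(E, pixel, rcond=None)
```

    Run per pixel, this separates each chromogen into its own abundance image, which is how the counterstain is removed without crosstalk.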

  10. Comparison of Three Different Hepatitis C Virus Genotyping Methods: 5'NCR PCR-RFLP, Core Type-Specific PCR, and NS5b Sequencing in a Tertiary Care Hospital in South India.

    PubMed

    Daniel, Hubert D-J; David, Joel; Raghuraman, Sukanya; Gnanamony, Manu; Chandy, George M; Sridharan, Gopalan; Abraham, Priya

    2017-05-01

    Based on genetic heterogeneity, hepatitis C virus (HCV) is classified into seven major genotypes and 64 subtypes. In spite of the sequence heterogeneity, all genotypes share an identical complement of colinear genes within the large open reading frame. The genetic interrelationships between these genes are consistent among genotypes. Due to this property, complete sequencing of the HCV genome is not required. HCV genotypes along with subtypes are critical for planning antiviral therapy. Certain genotypes are also associated with higher progression to liver cirrhosis. In this study, 100 blood samples were collected from individuals who came for routine HCV genotype identification. These samples were used for the comparison of two different genotyping methods (5'NCR PCR-RFLP and HCV core type-specific PCR) with NS5b sequencing. Of the 100 samples genotyped using 5'NCR PCR-RFLP and HCV core type-specific PCR, 90% (κ = 0.913, P < 0.00) and 96% (κ = 0.794, P < 0.00) correlated with NS5b sequencing, respectively. Sixty percent and 75% of discordant samples by 5'NCR PCR-RFLP and HCV core type-specific PCR, respectively, belonged to genotype 6. All the HCV genotype 1 subtypes were classified accurately by both the methods. This study shows that the 5'NCR-based PCR-RFLP and the HCV core type-specific PCR-based assays correctly identified HCV genotypes except genotype 6 from this region. Direct sequencing of the HCV core region was able to identify all the genotype 6 from this region and serves as an alternative to NS5b sequencing. © 2016 Wiley Periodicals, Inc.
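
    The κ statistics reported here measure chance-corrected agreement between each assay and NS5b sequencing; Cohen's kappa is straightforward to compute. The genotype calls below are illustrative, not the study's data:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for agreement between two call lists."""
    n = len(a)
    labels = sorted(set(a) | set(b))
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

ref    = ["1a", "1a", "3a", "3a", "6", "6", "1b", "1b", "3a", "1a"]
method = ["1a", "1a", "3a", "3a", "1b", "6", "1b", "1b", "3a", "1a"]
kappa = cohens_kappa(method, ref)   # one disagreement, on a genotype 6 sample
```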

  11. Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods. 10; Chapter

    NASA Technical Reports Server (NTRS)

    Myint, Soe W.; Mesev, Victor; Quattrochi, Dale; Wentz, Elizabeth A.

    2013-01-01

    Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They are used for everything from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. 
    The primary difference, though, is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification steps. Within this chapter, each of the four approaches is described in terms of scale and accuracy in classifying urban land use and urban land cover, and for its range of urban applications. Figure 1 gives an overview of the four main classification groups, while Table 1 details the approaches with respect to classification requirements and procedures (e.g., reflectance conversion, steps before training sample selection, training samples, spatial approaches commonly used, classifiers, primary inputs for classification, output structures, number of output layers, and accuracy assessment). The chapter concludes with a brief summary of the methods reviewed and the challenges that remain in developing new classification methods for improving the efficiency and accuracy of mapping urban areas.
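
    The per-pixel environmental index the chapter cites, NDVI, is a one-line band ratio computed independently at every pixel; the reflectance values below are arbitrary:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

nir = np.array([[0.60, 0.10], [0.50, 0.05]])   # near-infrared reflectance (toy)
red = np.array([[0.20, 0.30], [0.10, 0.15]])   # red reflectance (toy)
v = ndvi(nir, red)                             # high values indicate vegetation
```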

  12. New Tree-Classification System Used by the Southern Forest Inventory and Analysis Unit

    Treesearch

    Dennis M. May; John S. Vissage; D. Vince Few

    1990-01-01

    Trees at USDA Forest Service, Southern Forest Inventory and Analysis, sample locations are classified as growing stock or cull based on their ability to produce sawlogs. The old and new classification systems are compared, and the impacts of the new system on the reporting of tree volumes are illustrated with inventory data from north Alabama.

  13. Molecular typing of uropathogenic E. coli strains by the ERIC-PCR method.

    PubMed

    Ardakani, Maryam Afkhami; Ranjbar, Reza

    2016-04-01

    Escherichia coli (E. coli) is the most common cause of urinary infections in hospitals. The aim of this study was to evaluate the ERIC-PCR method for molecular typing of uropathogenic E. coli strains isolated from hospitalized patients. In a cross-sectional study, 98 E. coli isolates were collected from urine samples taken from patients admitted to Baqiyatallah Hospital from June 2014 to January 2015. The disk agar diffusion method was used to determine antibiotic sensitivity. Enterobacterial repetitive intergenic consensus (ERIC) PCR amplification was used to classify the E. coli strains. The amplification products were electrophoresed on 1.5% agarose gel, and their dendrograms were drawn. The data were analyzed by online Insillico software. The method used in this research amplified numerous bands (4-17 per strain), ranging from 100 to 3000 base pairs. The detected strains were classified into six clusters (E1-E6) with 70% similarity between them. In this study, uropathogenic E. coli strains belonged to different genotypic clusters. ERIC-PCR was found to have good differentiation power for molecular typing of uropathogenic E. coli strains isolated from the patients in the study.

  14. Discrimination of chicken seasonings and beef seasonings using electronic nose and sensory evaluation.

    PubMed

    Tian, Huaixiang; Li, Fenghua; Qin, Lan; Yu, Haiyan; Ma, Xia

    2014-11-01

    This study examines the feasibility of electronic nose as a method to discriminate chicken and beef seasonings and to predict sensory attributes. Sensory evaluation showed that 8 chicken seasonings and 4 beef seasonings could be well discriminated and classified based on 8 sensory attributes. The sensory attributes including chicken/beef, gamey, garlic, spicy, onion, soy sauce, retention, and overall aroma intensity were generated by a trained evaluation panel. Principal component analysis (PCA), discriminant factor analysis (DFA), and cluster analysis (CA) combined with electronic nose were used to discriminate seasoning samples based on the difference of the sensor response signals of chicken and beef seasonings. The correlation between sensory attributes and electronic nose sensors signal was established using partial least squares regression (PLSR) method. The results showed that the seasoning samples were all correctly classified by the electronic nose combined with PCA, DFA, and CA. The electronic nose gave good prediction results for all the sensory attributes with correlation coefficient (r) higher than 0.8. The work indicated that electronic nose is an effective method for discriminating different seasonings and predicting sensory attributes. © 2014 Institute of Food Technologists®

  15. A tool for developing an automatic insect identification system based on wing outlines

    PubMed Central

    Yang, He-Ping; Ma, Chun-Sen; Wen, Hui; Zhan, Qing-Bin; Wang, Xin-Li

    2015-01-01

    For some insect groups, wing outline is an important character for species identification. We have constructed a program as the integral part of an automated system to identify insects based on wing outlines (DAIIS). This program includes two main functions: (1) outline digitization and Elliptic Fourier transformation and (2) classifier model training by pattern recognition of support vector machines and model validation. To demonstrate the utility of this program, a sample of 120 owlflies (Neuroptera: Ascalaphidae) was split into training and validation sets. After training, the sample was sorted into seven species using this tool. In five repeated experiments, the mean accuracy for identification of each species ranged from 90% to 98%. The accuracy increased to 99% when the samples were first divided into two groups based on features of their compound eyes. DAIIS can therefore be a useful tool for developing a system of automated insect identification. PMID:26251292

  16. An improved SRC method based on virtual samples for face recognition

    NASA Astrophysics Data System (ADS)

    Fu, Lijun; Chen, Deyun; Lin, Kezheng; Li, Ao

    2018-07-01

    The sparse representation classifier (SRC) performs classification by evaluating which class leads to the minimum representation error. However, in the real world the number of available training samples is limited and, owing to noise interference, the training samples cannot accurately represent the test sample linearly. Therefore, in this paper we first produce virtual samples from the original training samples with the aim of increasing the number of training samples. Then, we take the intra-class difference as a representation of partial noise, and use the intra-class differences and training samples simultaneously to represent the test sample linearly according to the theory of the SRC algorithm. Using weighted score-level fusion, the representation scores of the virtual samples and the original training samples are fused to obtain the final classification result. Experimental results on multiple face databases show that the proposed method achieves very satisfactory classification performance.
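
    The core SRC decision rule, represent the test sample with each class's training samples and pick the class with the smallest representation error, can be sketched with a least-squares coding step standing in for the sparse (l1) coding the method actually uses:

```python
import numpy as np

def class_residuals(test, class_dicts):
    """Representation error of `test` under each class's training matrix."""
    residuals = []
    for A in class_dicts:                      # A: (dim, n_samples) per class
        coef, *_ = np.linalg.lstsq(A, test, rcond=None)
        residuals.append(np.linalg.norm(test - A @ coef))
    return residuals

rng = np.random.default_rng(9)
dim = 20
A0 = rng.normal(size=(dim, 3)) + 2.0     # class-0 training vectors (toy "faces")
A1 = rng.normal(size=(dim, 3)) - 2.0     # class-1 training vectors
test = A0 @ np.array([0.5, 0.3, 0.2])    # lies in class 0's span
r = class_residuals(test, [A0, A1])
label = int(np.argmin(r))                # class with minimum residual wins
```

    The paper's contribution is to enlarge each class matrix with virtual samples and intra-class differences before this residual comparison.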

  17. Identification of usual interstitial pneumonia pattern using RNA-Seq and machine learning: challenges and solutions.

    PubMed

    Choi, Yoonha; Liu, Tiffany Ting; Pankratz, Daniel G; Colby, Thomas V; Barth, Neil M; Lynch, David A; Walsh, P Sean; Raghu, Ganesh; Kennedy, Giulia C; Huang, Jing

    2018-05-09

    We developed a classifier using RNA sequencing data that identifies the usual interstitial pneumonia (UIP) pattern for the diagnosis of idiopathic pulmonary fibrosis. We addressed significant challenges, including limited sample size, biological and technical sample heterogeneity, and reagent and assay batch effects. We identified inter- and intra-patient heterogeneity, particularly within the non-UIP group. The models classified UIP on transbronchial biopsy samples with a receiver-operating characteristic area under the curve of ~ 0.9 in cross-validation. Using in silico mixed samples in training, we prospectively defined a decision boundary to optimize specificity at ≥85%. The penalized logistic regression model showed greater reproducibility across technical replicates and was chosen as the final model. The final model showed sensitivity of 70% and specificity of 88% in the test set. We demonstrated that the suggested methodologies appropriately addressed challenges of the sample size, disease heterogeneity and technical batch effects and developed a highly accurate and robust classifier leveraging RNA sequencing for the classification of UIP.
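
    The ROC area the models were evaluated on can be computed directly from the rank (Mann-Whitney) formulation; the labels and scores below are toy values, not the study's data:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC: fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([1, 1, 1, 0, 0, 0])              # 1 = UIP, 0 = non-UIP (toy)
s = np.array([0.9, 0.8, 0.4, 0.5, 0.3, 0.1])  # classifier scores
auc = roc_auc(y, s)
```

    Choosing the decision boundary to hold specificity at or above 85%, as the authors did, amounts to picking the score threshold along this same ROC curve.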

  18. Molecular identification and first report of mitochondrial COI gene haplotypes in the hawksbill turtle Eretmochelys imbricata (Testudines: Cheloniidae) in the Colombian Caribbean nesting colonies.

    PubMed

    Daza-Criado, L; Hernández-Fernández, J

    2014-02-21

    Hawksbill sea turtles (Eretmochelys imbricata) are found extensively around the world, including the Atlantic, Pacific, and Indian Oceans; the Persian Gulf; and the Red and Mediterranean Seas. Populations of this species are affected by international trafficking of their shields, meat, and eggs, making this a critically endangered species. We determined the haplotypes of 17 hawksbill foraging turtles of Islas del Rosario (Bolivar) and of the nesting beach Don Diego (Magdalena) in the Colombian Caribbean based on amplification and sequencing of the mitochondrial gene cytochrome oxidase c subunit I (COI). We identified 5 haplotypes, including EI-A1, previously reported in Puerto Rico, which matched 10 of the study samples. To our knowledge, the remaining 4 haplotypes have not been described previously. Samples EICOI11 and EICOI3 showed 0.2% divergence from EI-A1, by a single nucleotide change, and were classified as the EI-A2 haplotype. Samples EICOI6, EICOI14, and EICOI12 showed 0.2% divergence from EI-A1 and 0.3% divergence from EI-A2 and were classified as the EI-A3 haplotype. Samples EICOI16 and EICOI15 presented 5 nucleotide changes each and were classified as 2 different haplotypes, EI-A4 and EI-A5, respectively. These last 2 haplotypes had higher nucleotide diversity (K2P=1.7%) than the first 3 haplotypes. EI-A1 and EI-A2 occurred in nesting individuals, and EI-A2, EI-A3, EI-A4, and EI-A5 occurred in foraging individuals. The haplotypes described may be associated with reproductive migrations or foraging and could support the natal homing hypothesis. Furthermore, they can be used in phylogeographic studies.
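
    The percent divergences quoted here come from pairwise sequence comparison; an uncorrected p-distance (the paper reports Kimura two-parameter distances, which additionally correct for transitions versus transversions) is the simplest version. The 500-bp toy sequences below show how one nucleotide change yields 0.2% divergence:

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences."""
    assert len(seq1) == len(seq2)
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

hap_a = "A" * 500              # toy 500-bp aligned haplotype
hap_b = "A" * 499 + "T"        # same sequence with one substitution
d = p_distance(hap_a, hap_b)   # 1/500 = 0.2% divergence
```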

  19. Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample.

    PubMed

    Freedson, Patty S; Lyden, Kate; Kozey-Keadle, Sarah; Staudenmayer, John

    2011-12-01

    Previous work from our laboratory provided a "proof of concept" for use of artificial neural networks (nnets) to estimate metabolic equivalents (METs) and identify activity type from accelerometer data (Staudenmayer J, Pober D, Crouter S, Bassett D, Freedson P, J Appl Physiol 107: 1300-1307, 2009). The purpose of this study was to develop new nnets based on a larger, more diverse training data set and apply these nnet prediction models to an independent sample to evaluate the robustness and flexibility of this machine-learning modeling technique. The nnet training data set (University of Massachusetts) included 277 participants who each completed 11 activities. The independent validation sample (n = 65) (University of Tennessee) completed one of three activity routines. Criterion measures were 1) measured METs assessed using open-circuit indirect calorimetry; and 2) observed activity to identify activity type. The nnet input variables included five accelerometer count distribution features and the lag-1 autocorrelation. The bias and root mean square errors for the nnet MET model trained on University of Massachusetts data and applied to University of Tennessee data were +0.32 and 1.90 METs, respectively. Seventy-seven percent of the activities were correctly classified as sedentary/light, moderate, or vigorous intensity. For activity type, household and locomotion activities were correctly classified by the nnet activity type model 98.1% and 89.5% of the time, respectively, and sport was correctly classified 23.7% of the time. This machine-learning technique operates reasonably well when applied to an independent sample. We propose the creation of an open-access activity dictionary, including accelerometer data from a broad array of activities, leading to further improvements in prediction accuracy for METs, activity intensity, and activity type.
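    The nnet inputs named above (count distribution features plus the lag-1 autocorrelation) can be sketched in stdlib Python. The specific percentiles chosen here are an assumption for illustration, not the paper's exact feature set.

```python
import math

def lag1_autocorr(x):
    """Lag-1 autocorrelation of an accelerometer count series."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant series: no variation to correlate
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

def percentile(x, p):
    """Nearest-rank percentile (0 < p <= 100)."""
    xs = sorted(x)
    return xs[max(0, math.ceil(p / 100 * len(xs)) - 1)]

def features(counts):
    """Distribution features plus lag-1 autocorrelation, in the spirit of the
    nnet inputs; the percentile choices are illustrative assumptions."""
    return [percentile(counts, p) for p in (10, 25, 50, 75, 90)] + [lag1_autocorr(counts)]
```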

  20. Jaccard distance based weighted sparse representation for coarse-to-fine plant species recognition.

    PubMed

    Zhang, Shanwen; Wu, Xiaowei; You, Zhuhong

    2017-01-01

    Leaf-based plant species recognition plays an important role in ecological protection; however, its application to large and modern leaf databases has been a long-standing obstacle due to computational cost and feasibility. Recognizing these limitations, we propose a Jaccard distance based sparse representation (JDSR) method, which adopts a two-stage, coarse-to-fine strategy for plant species recognition. In the first stage, we use the Jaccard distance between the test sample and each training sample to coarsely determine the candidate classes of the test sample. The second stage includes a Jaccard distance based weighted sparse representation based classification (WSRC), which aims to approximately represent the test sample in the training space and classify it by the approximation residuals. Since the training model of our JDSR method involves far fewer but more informative representatives, this method is expected to overcome the limitation of high computational and memory costs in traditional sparse representation based classification. Comparative experimental results on a public leaf image database demonstrate that the proposed method outperforms other existing feature extraction and SRC based plant recognition methods in terms of both accuracy and computational speed.
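    The coarse stage can be sketched as follows, assuming binary feature vectors and hypothetical data: the Jaccard distance to each training sample selects a small set of candidate classes for the fine WSRC stage.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary feature vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else 1.0 - inter / union

def candidate_classes(test, train, k=2):
    """Coarse stage: keep the k classes whose nearest training sample is
    closest to the test sample in Jaccard distance."""
    best = {}
    for vec, label in train:
        d = jaccard_distance(test, vec)
        if label not in best or d < best[label]:
            best[label] = d
    return sorted(best, key=best.get)[:k]
```

The fine stage would then solve the weighted sparse representation only over the training samples of these candidate classes, which is the source of the method's speed-up.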

  1. Predicting survival in critical patients by use of body temperature regularity measurement based on approximate entropy.

    PubMed

    Cuesta, D; Varela, M; Miró, P; Galdós, P; Abásolo, D; Hornero, R; Aboy, M

    2007-07-01

    Body temperature is a classical diagnostic tool for a number of diseases. However, it is usually employed as a plain binary classification function (febrile or not febrile), and therefore its diagnostic power has not been fully developed. In this paper, we describe how body temperature regularity can be used for diagnosis. Our proposed methodology is based on obtaining accurate long-term temperature recordings at high sampling frequencies and analyzing the temperature signal using a regularity metric (approximate entropy). In this study, we assessed our methodology using temperature registers acquired from patients with multiple organ failure admitted to an intensive care unit. Our results indicate there is a correlation between the patient's condition and the regularity of the body temperature. This finding enabled us to design a classifier for two outcomes (survival or death) and test it on a dataset including 36 subjects. The classifier achieved an accuracy of 72%.
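    Approximate entropy as used above can be sketched in a few lines of stdlib Python. In practice the tolerance r is usually scaled to the signal's standard deviation (e.g. 0.2·SD); the fixed default here is an illustrative assumption.

```python
import math

def approx_entropy(u, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a series u; lower = more regular."""
    n = len(u)

    def phi(dim):
        # For each template of length dim, count templates within tolerance r
        # (Chebyshev distance), self-match included, then average log frequency.
        templates = [u[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for t1 in templates:
            c = sum(all(abs(a - b) <= r for a, b in zip(t1, t2)) for t2 in templates)
            total += math.log(c / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```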

  2. Multiple-instance ensemble learning for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Ergul, Ugur; Bilgin, Gokhan

    2017-10-01

    An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use in hyperspectral images (HSIs), inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of the training samples, and MI bags are formed by a local windowing process with variable window sizes on the selected instances. In addition to bootstrap aggregation, random subspace is used as a second method to diversify base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated against state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equivalent non-MIL algorithms.
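    The bagging-plus-random-subspace construction can be sketched as follows. A plain 1-NN base learner stands in for the paper's MIL algorithms, and all data are hypothetical.

```python
import random

def train_ensemble(X, y, n_base=11, sample_frac=0.3, subspace_frac=0.5, seed=0):
    """Bagging with random subspaces: each base model gets a bootstrap draw of
    a fraction of the training samples and a random subset of the features.
    (A plain 1-NN stands in here for the paper's MIL base classifiers.)"""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models = []
    for _ in range(n_base):
        idx = [rng.randrange(n) for _ in range(max(1, int(sample_frac * n)))]
        feats = rng.sample(range(d), max(1, int(subspace_frac * d)))
        models.append(([([X[i][f] for f in feats], y[i]) for i in idx], feats))
    return models

def predict(models, x):
    """Majority vote of the base models' 1-NN decisions."""
    votes = []
    for train, feats in models:
        xf = [x[f] for f in feats]
        label = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], xf)))[1]
        votes.append(label)
    return max(set(votes), key=votes.count)
```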

  3. New low-resolution spectrometer spectra for IRAS sources

    NASA Astrophysics Data System (ADS)

    Volk, Kevin; Kwok, Sun; Stencel, R. E.; Brugel, E.

    1991-12-01

    Low-resolution spectra of 486 IRAS point sources with Fnu(12 microns) in the range 20-40 Jy are presented. This is part of an effort to extract and classify spectra that were not included in the Atlas of Low-Resolution Spectra and represents an extension of the earlier work by Volk and Cohen which covers sources with Fnu(12 microns) greater than 40 Jy. The spectra have been examined by eye and classified into nine groups based on the spectral morphology. This new classification scheme is compared with the mechanical classification of the Atlas, and the differences are noted. Oxygen-rich stars of the asymptotic giant branch make up 33 percent of the sample. Solid state features dominate the spectra of most sources. It is found that the nature of the sources as implied by the present spectral classification is consistent with the classifications based on broad-band colors of the sources.

  4. Dementia diagnoses from clinical and neuropsychological data compared: the Cache County study.

    PubMed

    Tschanz, J T; Welsh-Bohmer, K A; Skoog, I; West, N; Norton, M C; Wyse, B W; Nickles, R; Breitner, J C

    2000-03-28

    To validate a neuropsychological algorithm for dementia diagnosis. We developed a neuropsychological algorithm in a sample of 1,023 elderly residents of Cache County, UT. We compared algorithmic and clinical dementia diagnoses both based on DSM-III-R criteria. The algorithm diagnosed dementia when there was impairment in memory and at least one other cognitive domain. We also tested a variant of the algorithm that incorporated functional measures that were based on structured informant reports. Of 1,023 participants, 87% could be classified by the basic algorithm, 94% when functional measures were considered. There was good concordance between basic psychometric and clinical diagnoses (79% agreement, kappa = 0.57). This improved after incorporating functional measures (90% agreement, kappa = 0.76). Neuropsychological algorithms may reasonably classify individuals on dementia status across a range of severity levels and ages and may provide a useful adjunct to clinical diagnoses in population studies.
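    The kappa values quoted above measure chance-corrected agreement between the algorithmic and clinical diagnoses. A minimal Cohen's kappa sketch (labels hypothetical):

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical labels on the same subjects."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    cats = set(r1) | set(r2)
    # expected chance agreement from the raters' marginal frequencies
    pe = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)
```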

  5. Metrics and textural features of MRI diffusion to improve classification of pediatric posterior fossa tumors.

    PubMed

    Rodriguez Gutierrez, D; Awwad, A; Meijer, L; Manita, M; Jaspan, T; Dineen, R A; Grundy, R G; Auer, D P

    2014-05-01

    Qualitative radiologic MR imaging review affords limited differentiation among types of pediatric posterior fossa brain tumors and cannot detect histologic or molecular subtypes, which could help to stratify treatment. This study aimed to improve current posterior fossa discrimination of histologic tumor type by using support vector machine classifiers on quantitative MR imaging features. This retrospective study included preoperative MRI in 40 children with posterior fossa tumors (17 medulloblastomas, 16 pilocytic astrocytomas, and 7 ependymomas). Shape, histogram, and textural features were computed from contrast-enhanced T2WI and T1WI and diffusivity (ADC) maps. Combinations of features were used to train tumor-type-specific classifiers for medulloblastoma, pilocytic astrocytoma, and ependymoma types in separation and as a joint posterior fossa classifier. A tumor-subtype classifier was also produced for classic medulloblastoma. The performance of different classifiers was assessed and compared by using randomly selected subsets of training and test data. ADC histogram features (25th and 75th percentiles and skewness) yielded the best classification of tumor type (on average >95.8% of medulloblastomas, >96.9% of pilocytic astrocytomas, and >94.3% of ependymomas by using 8 training samples). The resulting joint posterior fossa classifier correctly assigned >91.4% of the posterior fossa tumors. For subtype classification, 89.4% of classic medulloblastomas were correctly classified on the basis of ADC texture features extracted from the Gray-Level Co-Occurrence Matrix. Support vector machine-based classifiers using ADC histogram features yielded very good discrimination among pediatric posterior fossa tumor types, and ADC textural features show promise for further subtype discrimination. These findings suggest an added diagnostic value of quantitative feature analysis of diffusion MR imaging in pediatric neuro-oncology. © 2014 by American Journal of Neuroradiology.
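    The three ADC histogram features the classifiers train on (25th and 75th percentiles and skewness) can be sketched as follows; the nearest-rank percentile convention and population skewness formula are assumptions for illustration.

```python
import math

def adc_features(adc_values):
    """25th percentile, 75th percentile, and skewness of an ADC histogram."""
    xs = sorted(adc_values)
    n = len(xs)

    def pct(p):  # nearest-rank percentile
        return xs[max(0, math.ceil(p / 100 * n) - 1)]

    mean = sum(xs) / n
    sd = (sum((v - mean) ** 2 for v in xs) / n) ** 0.5
    skew = sum((v - mean) ** 3 for v in xs) / (n * sd ** 3) if sd else 0.0
    return pct(25), pct(75), skew
```

These three numbers per tumor would then be the input vector to the support vector machine classifiers described above.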

  6. A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification.

    PubMed

    Kang, Qi; Chen, XiaoShuang; Li, SiSi; Zhou, MengChu

    2017-12-01

    Under-sampling is a popular data preprocessing method in dealing with class imbalance problems, with the purposes of balancing datasets to achieve a high classification rate and avoiding the bias toward majority class examples. It always uses full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed by incorporating a noise filter before executing resampling. In order to verify the efficiency, this scheme is implemented based on four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble, through benchmarks and significance analysis. Furthermore, this paper also summarizes the relationship between algorithm performance and imbalanced ratio. Experimental results indicate that the proposed scheme can improve the original undersampling-based methods with significance in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
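    A minimal sketch of the idea, with a simple 1-NN edited filter standing in for the paper's noise filter and plain random under-sampling as the resampling step (data hypothetical):

```python
import random

def noise_filter_undersample(X, y, minority=1, seed=0):
    """Drop minority examples whose nearest neighbour disagrees with them
    (a 1-NN edited filter standing in for the paper's noise filter), then
    randomly under-sample the majority class to the filtered minority size."""
    def nn_label(i):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(X[i], X[k])))
        return y[j]

    keep_min = [i for i in range(len(X)) if y[i] == minority and nn_label(i) == minority]
    maj = [i for i in range(len(X)) if y[i] != minority]
    rng = random.Random(seed)
    keep_maj = rng.sample(maj, min(len(maj), len(keep_min)))
    idx = sorted(keep_min + keep_maj)
    return [X[i] for i in idx], [y[i] for i in idx]
```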

  7. Study on a pattern classification method of soil quality based on simplified learning sample dataset

    USGS Publications Warehouse

    Zhang, Jiahua; Liu, S.; Hu, Y.; Tian, Y.

    2011-01-01

    Given the massive amount of soil information involved in current soil quality grade evaluation, this paper constructs an intelligent classification approach to soil quality grading based on classical sampling techniques and a disordered (nominal) multi-class logistic regression model. As a case study, the learning sample capacity was determined under a given confidence level and estimation accuracy, and a c-means algorithm was used to automatically extract a simplified learning sample dataset from the cultivated soil quality grade evaluation database of the study area, Longchuan County in Guangdong Province. A disordered logistic classifier model was then built, and the calculation and analysis steps of intelligent soil quality grade classification are given. The results indicate that soil quality grades can be effectively learned and predicted from the extracted simplified dataset, changing the traditional method of soil quality grade evaluation. © 2011 IEEE.
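    The sample-capacity step can be illustrated with the standard formula for estimating a proportion at a given confidence level and error bound; the paper's exact sampling design is not specified here, so this is only the classical textbook version.

```python
import math

def sample_size(p=0.5, e=0.05, z=1.96):
    """Classical sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2, where z encodes the confidence level
    (1.96 for 95%), p the assumed proportion, and e the error bound."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)
```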

  8. Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

    NASA Astrophysics Data System (ADS)

    Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

    2016-03-01

    Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

  9. Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing.

    PubMed

    Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David

    2018-05-09

    Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free text notes as well as structured fields concerning suicidality, and this allows access to much larger cohorts than previously possible. This paper presents two novel NLP approaches - a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests that they can be used to accurately detect mentions of suicide ideation and attempts within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or to meet alternate classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets, given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional extraction of known risk factors to predict suicidal behaviour.

  10. Conflicts in wound classification of neonatal operations.

    PubMed

    Vu, Lan T; Nobuhara, Kerilyn K; Lee, Hanmin; Farmer, Diana L

    2009-06-01

    This study sought to determine the reliability of wound classification guidelines when applied to neonatal operations. This study is a cross-sectional web-based survey of pediatric surgeons. From a random sample of 22 neonatal operations, participants classified each operation as "clean," "clean-contaminated," "contaminated," or "dirty or infected," and specified duration of perioperative antibiotics as "none," "single preoperative," "24 hours," or ">24 hours." Unweighted kappa score was calculated to estimate interrater reliability. Overall interrater reliability for wound classification was poor (kappa = 0.30). The following operations were classified as clean: pyloromyotomy, resection of sequestration, resection of sacrococcygeal teratoma, oophorectomy, and immediate repair of omphalocele; as clean-contaminated: Ladd procedure, bowel resection for midgut volvulus and meconium peritonitis, fistula ligation of tracheoesophageal fistula, primary esophageal anastomosis of esophageal atresia, thoracic lobectomy, staged closure of gastroschisis, delayed repair and primary closure of omphalocele, perineal anoplasty and diverting colostomy for imperforate anus, anal pull-through for Hirschsprung disease, and colostomy closure; and as dirty: perforated necrotizing enterocolitis. There is poor consensus on how neonatal operations are classified based on contamination. An improved classification system will provide more accurate risk assessment for development of surgical site infections and identify neonates who would benefit from antibiotic prophylaxis.

  11. Sequential sampling: a novel method in farm animal welfare assessment.

    PubMed

    Heath, C A E; Main, D C J; Mullan, S; Haskell, M J; Browne, W J

    2016-02-01

    Lameness in dairy cows is an important welfare issue. As part of a welfare assessment, herd level lameness prevalence can be estimated from scoring a sample of animals, where higher levels of accuracy are associated with larger sample sizes. As the financial cost is related to the number of cows sampled, smaller samples are preferred. Sequential sampling schemes have been used for informing decision making in clinical trials. Sequential sampling involves taking samples in stages, where sampling can stop early depending on the estimated lameness prevalence. When welfare assessment is used for a pass/fail decision, a similar approach could be applied to reduce the overall sample size. The sampling schemes proposed here apply the principles of sequential sampling within a diagnostic testing framework. This study develops three sequential sampling schemes of increasing complexity to classify 80 fully assessed UK dairy farms, each with known lameness prevalence. Using the Welfare Quality herd-size-based sampling scheme, the first 'basic' scheme involves two sampling events. At the first sampling event half the Welfare Quality sample size is drawn, and then depending on the outcome, sampling either stops or is continued and the same number of animals is sampled again. In the second 'cautious' scheme, an adaptation is made to ensure that correctly classifying a farm as 'bad' is done with greater certainty. The third scheme is the only scheme to go beyond lameness as a binary measure and investigates the potential for increasing accuracy by incorporating the number of severely lame cows into the decision. The three schemes are evaluated with respect to accuracy and average sample size by running 100 000 simulations for each scheme, and a comparison is made with the fixed size Welfare Quality herd-size-based sampling scheme. All three schemes performed almost as well as the fixed size scheme but with much smaller average sample sizes. For the third scheme, an overall association between lameness prevalence and the proportion of lame cows that were severely lame on a farm was found. However, as this association was found to not be consistent across all farms, the sampling scheme did not prove to be as useful as expected. The preferred scheme was therefore the 'cautious' scheme for which a sampling protocol has also been developed.
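    The 'basic' two-stage scheme can be sketched as follows. The early-stopping margin used here is an illustrative assumption, not the decision rule of the study.

```python
import random

def basic_sequential_scheme(herd, full_n, threshold, seed=0):
    """'Basic' two-stage scheme: score half the fixed sample; if the observed
    lameness proportion is clearly below/above the pass/fail threshold, stop
    early, otherwise score the second half and decide on the pooled estimate.

    herd: list of 0/1 lameness indicators, one per cow.
    Returns (decision, number_of_cows_scored)."""
    rng = random.Random(seed)
    cows = rng.sample(herd, full_n)   # draw the full fixed-size sample up front
    half = full_n // 2
    p1 = sum(cows[:half]) / half      # stage 1: score only the first half
    margin = 0.1                      # assumed early-stopping margin
    if abs(p1 - threshold) > margin:
        return ("fail" if p1 > threshold else "pass", half)
    p = sum(cows) / full_n            # stage 2: score the rest and pool
    return ("fail" if p > threshold else "pass", full_n)
```

Running such a function over many simulated herds is how the accuracy/average-sample-size trade-off reported above would be estimated.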

  12. Domain-Adapted Convolutional Networks for Satellite Image Classification: A Large-Scale Interactive Learning Workflow

    DOE PAGES

    Lunga, Dalton D.; Yang, Hsiuhan Lexie; Reith, Andrew E.; ...

    2018-02-06

    Satellite imagery often exhibits large spatial extent areas that encompass object classes with considerable variability. This often limits large-scale model generalization with machine learning algorithms. Notably, acquisition conditions, including dates, sensor position, lighting condition, and sensor types, often translate into class distribution shifts introducing complex nonlinear factors and hamper the potential impact of machine learning classifiers. Here, this article investigates the challenge of exploiting satellite images using convolutional neural networks (CNN) for settlement classification where the class distribution shifts are significant. We present a large-scale human settlement mapping workflow based on multiple modules to adapt a pretrained CNN to address the negative impact of distribution shift on classification performance. To extend a locally trained classifier onto large spatial extent areas we introduce several submodules: first, a human-in-the-loop element for relabeling of misclassified target domain samples to generate representative examples for model adaptation; second, an efficient hashing module to minimize redundancy and noisy samples from the mass-selected examples; and third, a novel relevance ranking module to minimize the dominance of source examples on the target domain. The workflow presents a novel and practical approach to achieve large-scale domain adaptation with binary classifiers that are based on CNN features. Experimental evaluations are conducted on areas of interest that encompass various image characteristics, including multisensor, multitemporal, and multiangular conditions. Domain adaptation is assessed on source–target pairs through the transfer loss and transfer ratio metrics to illustrate the utility of the workflow.

  14. Quantitative assessment of specific defects in roasted ground coffee via infrared-photoacoustic spectroscopy.

    PubMed

    Dias, Rafael Carlos Eloy; Valderrama, Patrícia; Março, Paulo Henrique; Dos Santos Scholz, Maria Brigida; Edelmann, Michael; Yeretzian, Chahan

    2018-07-30

    Chemical analyses and sensory evaluation are the most applied methods for quality control of roasted and ground (RG) coffee. However, faster alternatives would be highly valuable. Here, we applied Fourier transform infrared photoacoustic spectroscopy (FTIR-PAS) to RG coffee powder. Mixtures of specific defective beans were blended with healthy (defect-free) Coffea arabica and Coffea canephora bases in specific ratios, forming different classes of blends. Principal component analysis allowed prediction of the amount/fraction and nature of the defects in blends, while partial least squares discriminant analysis revealed similarities between blends (=samples). A successful predictive model was obtained using six classes of blends. The model could classify 100% of the samples into four classes. The specificities were higher than 0.9. Application of FTIR-PAS to RG coffee to characterize and classify blends has been shown to be an accurate, easy, quick and "green" alternative to current methods. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.

  15. Effect of finite sample size on feature selection and classification: a simulation study.

    PubMed

    Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping

    2010-02-01

    The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small samples sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.

  16. Novel particle tracking algorithm based on the Random Sample Consensus Model for the Active Target Time Projection Chamber (AT-TPC)

    NASA Astrophysics Data System (ADS)

    Ayyad, Yassid; Mittig, Wolfgang; Bazin, Daniel; Beceiro-Novo, Saul; Cortesi, Marco

    2018-02-01

    The three-dimensional reconstruction of particle tracks in a time projection chamber is a challenging task that requires advanced classification and fitting algorithms. In this work, we have developed and implemented a novel algorithm based on the Random Sample Consensus Model (RANSAC). The RANSAC is used to classify tracks including pile-up, to remove uncorrelated noise hits, as well as to reconstruct the vertex of the reaction. The algorithm, developed within the Active Target Time Projection Chamber (AT-TPC) framework, was tested and validated by analyzing the 4He+4He reaction. Results, performance and quality of the proposed algorithm are presented and discussed in detail.
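    The core RANSAC loop can be sketched for the simplest case of a single straight track in 2-D (hypothetical data; the real AT-TPC implementation works on 3-D tracks, handles pile-up, and reconstructs the reaction vertex):

```python
import math
import random

def ransac_line(points, iters=200, tol=1.0, seed=0):
    """Fit a 2-D line with RANSAC: repeatedly pick two points, build the line
    through them, and keep the model with the most inliers (points within tol
    of the line). Points outside the consensus set are treated as noise."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2            # normal of the line a*x + b*y + c = 0
        c = -(a * x1 + b * y1)
        norm = math.hypot(a, b)
        if norm == 0:
            continue  # degenerate pair (duplicate point)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```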

  17. Case base classification on digital mammograms: improving the performance of case base classifier

    NASA Astrophysics Data System (ADS)

    Raman, Valliappan; Then, H. H.; Sumari, Putra; Venkatesa Mohan, N.

    2011-10-01

    Breast cancer continues to be a significant public health problem in the world. Early detection is the key to improving breast cancer prognosis. The aim of the research presented here is twofold. The first stage involves machine learning techniques that segment and extract features from masses in digital mammograms. The second stage is a problem-solving approach that classifies the mass with a performance-based case base classifier. In this paper we build a case-based classifier to diagnose mammographic images and explain the different methods and behaviors that have been added to the classifier to improve its performance. An initial performance-based classifier with bagging is proposed and implemented, and it shows an improvement in specificity and sensitivity.

  18. Minimum distance classification in remote sensing

    NASA Technical Reports Server (NTRS)

    Wacker, A. G.; Landgrebe, D. A.

    1972-01-01

    The utilization of minimum distance classification methods in remote sensing problems, such as crop species identification, is considered. Literature concerning both minimum distance classification problems and distance measures is reviewed. Experimental results are presented for several examples. The objective of these examples is to: (a) compare the sample classification accuracy of a minimum distance classifier, with the vector classification accuracy of a maximum likelihood classifier, and (b) compare the accuracy of a parametric minimum distance classifier with that of a nonparametric one. Results show the minimum distance classifier performance is 5% to 10% better than that of the maximum likelihood classifier. The nonparametric classifier is only slightly better than the parametric version.
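    A minimum distance (nearest class mean) classifier of the kind compared above is simple to sketch (hypothetical data):

```python
def train_min_distance(X, y):
    """Minimum distance classifier: store each class's mean feature vector."""
    means = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means

def classify(means, x):
    """Assign x to the class with the nearest mean (squared Euclidean distance)."""
    return min(means, key=lambda l: sum((a - b) ** 2 for a, b in zip(means[l], x)))
```

Unlike a maximum likelihood classifier, this parametric form ignores class covariances, which is the trade-off between speed and accuracy the abstract discusses.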

  19. Domain Regeneration for Cross-Database Micro-Expression Recognition

    NASA Astrophysics Data System (ADS)

    Zong, Yuan; Zheng, Wenming; Huang, Xiaohua; Shi, Jingang; Cui, Zhen; Zhao, Guoying

    2018-05-01

    In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. Under this setting, the training and testing samples would have different feature distributions, and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called the Target Sample Re-Generator (TSRG). By using TSRG, we are able to re-generate the samples from the target micro-expression database such that the re-generated target samples share the same or similar feature distributions as the original source samples. For this reason, we can then use a classifier learned on the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments based on the SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.

  20. HIV-1 genetic diversity and primary drug resistance mutations before large-scale access to antiretroviral therapy, Republic of Congo.

    PubMed

    Niama, Fabien Roch; Vidal, Nicole; Diop-Ndiaye, Halimatou; Nguimbi, Etienne; Ahombo, Gabriel; Diakabana, Philippe; Bayonne Kombo, Édith Sophie; Mayengue, Pembe Issamou; Kobawila, Simon-Charles; Parra, Henri Joseph; Toure-Kane, Coumba

    2017-07-05

    In this work, we investigated the genetic diversity of HIV-1 and the presence of mutations conferring antiretroviral drug resistance in 50 drug-naïve infected persons in the Republic of Congo (RoC). Samples were obtained before large-scale access to HAART, in 2002 and 2004. To assess HIV-1 genetic recombination, the region of the pol gene encoding the protease and part of the reverse transcriptase was sequenced and analyzed against updated references, including newly characterized CRFs. The assessment of drug resistance was conducted according to the WHO protocol. Among the 50 samples analyzed for the pol gene, 50% were classified as intersubtype recombinants carrying complex structures within the pol fragment. Five samples could not be classified (noted U). The most prevalent subtypes were G, with 10 isolates, and D, with 11 isolates. One isolate each of subtypes A, J and H and of CRF05, CRF18 and CRF37 was also found. Two samples (4%) harbouring the mutations M230L and Y181C, associated with the TAMs M41L and T215Y respectively, were found. This first study in the RoC, based on the WHO classification, shows that the threshold of transmitted drug resistance before large-scale access to antiretroviral therapy is 4%.

  1. HPLC fingerprint analysis combined with chemometrics for pattern recognition of ginger.

    PubMed

    Feng, Xu; Kong, Weijun; Wei, Jianhe; Ou-Yang, Zhen; Yang, Meihua

    2014-03-01

    Ginger, the fresh rhizome of Zingiber officinale Rosc. (Zingiberaceae), has been used worldwide; however, for a long time there has been no internationally approved standard for its quality control. The aim of this work was to establish an effective combined method and pattern recognition technique for the quality control of ginger. A simple, accurate and reliable method based on high-performance liquid chromatography with photodiode array (HPLC-PDA) detection was developed for establishing the chemical fingerprints of 10 batches of ginger from different markets in China. The method was validated in terms of precision, reproducibility and stability, and the relative standard deviations were all less than 1.57%. On the basis of this method, the fingerprints of the 10 batches of ginger samples were obtained, which showed 16 common peaks. Coupled with similarity evaluation software, the similarities between each sample fingerprint and the simulative mean chromatogram were in the range of 0.998-1.000. Chemometric techniques, including similarity analysis, hierarchical clustering analysis and principal component analysis, were then applied to classify the ginger samples. Consistent results showed that the ginger samples could be successfully classified into two groups. This study revealed that the HPLC-PDA method is simple, sensitive and reliable for fingerprint analysis and, moreover, for pattern recognition and quality control of ginger.

  2. Analysis of biofluids in aqueous environment based on mid-infrared spectroscopy.

    PubMed

    Fabian, Heinz; Lasch, Peter; Naumann, Dieter

    2005-01-01

    In this study we describe a semiautomatic Fourier transform infrared spectroscopic methodology for the analysis of liquid serum samples, which combines simple sample introduction with high sample throughput. The applicability of this new infrared technology to the analysis of liquid serum samples from a cohort of cattle naturally infected with bovine spongiform encephalopathy and from controls was explored in comparison to the conventional approach based on transmission infrared spectroscopy of dried serum films. Artificial neural network analysis of the infrared data was performed to differentiate between bovine spongiform encephalopathy-negative controls and animals in the late stage of the disease. After training of artificial neural network classifiers, infrared spectra of sera from an independent external validation data set were analyzed. In this way, sensitivities between 90% and 96% and specificities between 84% and 92% were achieved, depending upon the strategy of data collection and data analysis. Based on these results, the advantages and limitations of the liquid sample technique and the dried film approach for routine analysis of biofluids are discussed. 2005 Society of Photo-Optical Instrumentation Engineers.

  3. Application of machine learning on brain cancer multiclass classification

    NASA Astrophysics Data System (ADS)

    Panca, V.; Rustam, Z.

    2017-07-01

    Classification of brain cancer is a problem of multiclass classification. One approach to solve this problem is to first transform it into several binary problems. The microarray gene expression dataset has the two main characteristics of medical data: extremely many features (genes) and only a small number of samples. The application of machine learning to microarray gene expression data mainly consists of two steps: feature selection and classification. In this paper, the features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, improved to handle multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features on a single classifier, this method combines the results of multiple classifiers. The features are divided into subsets and SVM-RFE is applied to each subset. Then, the features selected on each subset are fed to separate classifiers. This method enhances the feature selection ability of each single SVM-RFE. The twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. While an ordinary SVM finds a single optimum hyperplane, the main objective of the Twin SVM is to find two non-parallel optimum hyperplanes. The experiment on the brain cancer microarray gene expression dataset shows that this method could classify 71.4% of the overall test data correctly, using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that this method could classify data of the normal and MD classes with 100% accuracy.
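
    The SVM-RFE principle the method builds on can be sketched minimally as follows; for brevity an ordinary least-squares linear model stands in for the linear SVM, and the data are synthetic (not the microarray dataset used in the paper):

    ```python
    import numpy as np

    # RFE idea: repeatedly fit a linear model and drop the feature whose
    # weight has the smallest magnitude, until the desired subset remains.
    rng = np.random.default_rng(0)
    n, p = 60, 30
    X = rng.normal(size=(n, p))
    y = np.where(X[:, 3] + X[:, 7] > 0, 1.0, -1.0)  # only features 3 and 7 matter

    active = list(range(p))
    while len(active) > 2:
        Xa = X[:, active]
        w, *_ = np.linalg.lstsq(Xa, y, rcond=None)  # linear weights on active set
        drop = int(np.argmin(np.abs(w)))            # index of the weakest feature
        active.pop(drop)

    print(sorted(active))  # the informative features should survive
    ```

    The paper's variant additionally partitions the features into subsets, runs the elimination on each subset, and combines the resulting per-subset classifiers.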

  4. Active machine learning for rapid landslide inventory mapping with VHR satellite images (Invited)

    NASA Astrophysics Data System (ADS)

    Stumpf, A.; Lachiche, N.; Malet, J.; Kerle, N.; Puissant, A.

    2013-12-01

    VHR satellite images have become a primary source for landslide inventory mapping after major triggering events such as earthquakes and heavy rainfalls. Visual image interpretation is still the prevailing standard method for operational purposes but is time-consuming and not well suited to fully exploit the increasingly better supply of remote sensing data. Recent studies have addressed the development of more automated image analysis workflows for landslide inventory mapping. In particular, object-oriented approaches that account for spatial and textural image information have been demonstrated to be more adequate than pixel-based classification, but manually elaborated rule-based classifiers are difficult to adapt under changing scene characteristics. Machine learning algorithms allow classification rules for complex image patterns to be learned from labelled examples and can be adapted straightforwardly with available training data. In order to reduce the amount of costly training data, active learning (AL) has evolved as a key concept to guide sampling for many applications. The underlying idea of AL is to initialize a machine learning model with a small training set, and to subsequently exploit the model state and data structure to iteratively select the most valuable samples that should be labelled by the user. With relatively few queries and labelled samples, an AL strategy yields higher accuracies than an equivalent classifier trained with many randomly selected samples. This study addressed the development of an AL method for landslide mapping from VHR remote sensing images with special consideration of the spatial distribution of the samples. Our approach [1] is based on the Random Forest algorithm and considers the classifier uncertainty as well as the variance of potential sampling regions to guide the user towards the most valuable sampling areas. 
The algorithm explicitly searches for compact regions and thereby avoids the spatially disperse sampling pattern inherent to most other AL methods. The accuracy, the sampling time and the computational runtime of the algorithm were evaluated on multiple satellite images capturing recent large-scale landslide events. Sampling between 1% and 4% of the study areas, accuracies between 74% and 80% were achieved, whereas standard sampling schemes yielded accuracies of only 28% to 50% at equal sampling cost. Compared to commonly used point-wise AL algorithms, the proposed approach significantly reduces the number of iterations and hence the computational runtime. Since the user can focus on relatively few compact areas (rather than on hundreds of distributed points), the overall labeling time is reduced by more than 50% compared to point-wise queries. An experimental evaluation of multiple expert mappings demonstrated strong relationships between the uncertainties of the experts and the machine learning model. It revealed that the achieved accuracies are within the range of the inter-expert disagreement and that it will be indispensable to consider ground truth uncertainties to truly achieve further enhancements in the future. The proposed method is generally applicable to a wide range of optical satellite images and landslide types. [1] A. Stumpf, N. Lachiche, J.-P. Malet, N. Kerle, and A. Puissant, "Active learning in the spatial domain for remote sensing image classification," IEEE Transactions on Geoscience and Remote Sensing, 2013, DOI 10.1109/TGRS.2013.2262052.
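
    The core uncertainty-sampling loop underlying AL can be sketched as below; a nearest-class-mean model stands in for the paper's Random Forest, the region-based spatial strategy is omitted, and all data are synthetic:

    ```python
    import numpy as np

    # Pool-based active learning: start with one labelled point per class,
    # then repeatedly query the pool sample the current model is least sure about.
    rng = np.random.default_rng(1)
    pool = np.vstack([rng.normal([0, 0], 1, (100, 2)), rng.normal([4, 4], 1, (100, 2))])
    labels = np.repeat([0, 1], 100)       # oracle labels, hidden from the learner

    queried = [0, 100]                    # initial labelled set
    for _ in range(10):
        means = np.array([pool[[i for i in queried if labels[i] == c]].mean(axis=0)
                          for c in (0, 1)])
        d = np.linalg.norm(pool[:, None] - means[None], axis=2)
        margin = np.abs(d[:, 0] - d[:, 1])   # small margin = uncertain sample
        margin[queried] = np.inf             # never re-query labelled points
        queried.append(int(np.argmin(margin)))

    preds = np.argmin(np.linalg.norm(pool[:, None] - means[None], axis=2), axis=1)
    print((preds == labels).mean())          # accuracy after 12 labels in total
    ```

    The paper's contribution is, in essence, replacing the single-point query in this loop with a query over compact spatial regions.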

  5. Maternal sensitivity and infant attachment security in Korea: cross-cultural validation of the Strange Situation.

    PubMed

    Jin, Mi Kyoung; Jacobvitz, Deborah; Hazen, Nancy; Jung, Sung Hoon

    2012-01-01

    The present study sought to analyze infant and maternal behavior both during the Strange Situation Procedure (SSP) and a free play session in a Korean sample (N = 87) to help understand whether mother-infant attachment relationships are universal or culture-specific. Distributions of attachment classifications in the Korean sample were compared with a cross-national sample. Behavior of mothers and infants following the two separation episodes in the SSP, including mothers' proximity to their infants and infants' approach to the caregiver, was also observed, as was the association between maternal sensitivity observed during the free play session and infant security. The percentage of Korean infants classified as secure versus insecure mirrored the global distribution; however, only one Korean baby was classified as avoidant. Following the separation episodes in the Strange Situation, Korean mothers were more likely than mothers in Ainsworth's Baltimore sample to approach their babies immediately and sit beside them throughout the reunion episodes, even when their babies were no longer distressed. Also, Korean babies less often approached their mothers during reunions than did infants in the Baltimore sample. Finally, the link between maternal sensitivity and infant security was significant. The findings support the idea that the basic secure base function of attachment is universal and the SSP is a valid measure of secure attachment, but cultural differences in caregiving may result in variations in how this function is manifested.

  6. Intelligent query by humming system based on score level fusion of multiple classifiers

    NASA Astrophysics Data System (ADS)

    Pyo Nam, Gi; Thu Trang Luong, Thi; Ha Nam, Hyun; Ryoung Park, Kang; Park, Sung-Joo

    2011-12-01

    Recently, the necessity for content-based music retrieval that can return results even if a user does not know information such as the title or singer has increased. Query-by-humming (QBH) systems have been introduced to address this need, as they allow the user to simply hum snatches of the tune to find the right song. Even though there have been many studies on QBH, few have combined multiple classifiers based on various fusion methods. Here we propose a new QBH system based on the score level fusion of multiple classifiers. This research is novel in the following three respects: three local classifiers [quantized binary (QB) code-based linear scaling (LS), pitch-based dynamic time warping (DTW), and LS] are employed; local maximum and minimum point-based LS and pitch distribution feature-based LS are used as global classifiers; and the combination of local and global classifiers based on the score level fusion by the PRODUCT rule is used to achieve enhanced matching accuracy. Experimental results with the 2006 MIREX QBSH and 2009 MIR-QBSH corpus databases show that the performance of the proposed method is better than that of single classifier and other fusion methods.
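
    Score-level fusion by the PRODUCT rule, as used here, amounts to multiplying the per-candidate scores of the individual classifiers; the three classifier names echo the abstract, but the scores below are made-up numbers for illustration:

    ```python
    import numpy as np

    # Each classifier outputs a matching score in [0, 1] for every candidate
    # song; the fused score of a candidate is the product across classifiers.
    scores = {
        "qb_code_ls": np.array([0.90, 0.40, 0.70]),  # scores for 3 candidates
        "pitch_dtw":  np.array([0.80, 0.60, 0.30]),
        "global_ls":  np.array([0.85, 0.50, 0.40]),
    }
    fused = np.prod(np.vstack(list(scores.values())), axis=0)
    best = int(np.argmax(fused))
    print(best, fused[best])  # candidate 0 wins with score 0.9*0.8*0.85
    ```

    The product rule strongly penalizes any candidate that a single classifier scores low, which is one reason it can outperform sum-based fusion when the classifiers are largely independent.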

  7. Study design in high-dimensional classification analysis.

    PubMed

    Sánchez, Brisa N; Wu, Meihua; Song, Peter X K; Wang, Wen

    2016-10-01

    Advances in high throughput technology have accelerated the use of hundreds to millions of biomarkers to construct classifiers that partition patients into different clinical conditions. Prior to classifier development in actual studies, a critical need is to determine the sample size required to reach a specified classification precision. We develop a systematic approach for sample size determination in high-dimensional (large p, small n) classification analysis. Our method utilizes the probability of correct classification (PCC) as the optimization objective function and incorporates the higher criticism thresholding procedure for classifier development. Further, we derive the theoretical bound of maximal PCC gain from feature augmentation (e.g. when molecular and clinical predictors are combined in classifier development). Our methods are motivated and illustrated by a study using proteomics markers to classify post-kidney transplantation patients into stable and rejecting classes. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. Cascaded discrimination of normal, abnormal, and confounder classes in histopathology: Gleason grading of prostate cancer

    PubMed Central

    2012-01-01

    Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, posing as major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g. with images of prostate atrophy and high grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers are used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically-extracted image feature set includes architectural features based on location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC approaches (0.76 PPV). 
Conclusions Use of the CAS strategy increases the PPV for a multi-category classification system over two common alternative strategies. In classification problems such as histopathology, where multiple class groups exist with varying degrees of heterogeneity, the CAS system can intelligently assign class labels to objects by performing multiple binary classifications according to domain knowledge. PMID:23110677
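
    The cascaded routing of a sample through increasingly granular binary decisions can be sketched as below; the split order and the threshold-based stub classifiers are purely illustrative stand-ins, not the paper's learned classifiers or features:

    ```python
    # Each stub plays the role of one binary classifier in the cascade;
    # in the real system these would be trained on image features.
    def is_tissue_abnormal(x): return x["atypia"] > 0.5
    def is_cancer(x):          return x["invasion"] > 0.5
    def gleason_grade(x):      return "grade3" if x["gland_loss"] < 0.5 else "grade4+"

    def cascade(x):
        """Route a sample through binary splits until one class remains."""
        if not is_tissue_abnormal(x):
            return "benign"
        if not is_cancer(x):
            return "confounder"      # e.g. atrophy or PIN
        return gleason_grade(x)

    print(cascade({"atypia": 0.9, "invasion": 0.8, "gland_loss": 0.2}))
    ```

    Grouping the early splits so that each side is internally homogeneous is what distinguishes CAS from a flat one-versus-all decomposition.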

  9. SomInaClust: detection of cancer genes based on somatic mutation patterns of inactivation and clustering.

    PubMed

    Van den Eynden, Jimmy; Fierro, Ana Carolina; Verbeke, Lieven P C; Marchal, Kathleen

    2015-04-23

    With the advances in high throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively. SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore it was found to be complementary to existing similar-purpose methods with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and it accurately classifies candidate driver genes in putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant. SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes.

  10. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering.

    PubMed

    Luo, Junhai; Fu, Liang

    2017-06-09

    With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains an offline information acquisition phase and an online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless APs; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove data redundancy and retain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data are matched with the testing data to determine the position area, and Maximum Likelihood (ML) estimation is employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves both accuracy and computational complexity.
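
    The underlying offline/online fingerprinting idea can be sketched minimally as follows; KPCA, clustering, and the ML estimator are omitted for brevity, and the reference points, AP count and RSS values are hypothetical:

    ```python
    import numpy as np

    # Offline phase: a fingerprint database maps reference points to the
    # mean RSS (dBm) observed from each AP at that point.
    fingerprints = {
        (0, 0): np.array([-40.0, -70.0, -80.0]),
        (0, 5): np.array([-55.0, -50.0, -75.0]),
        (5, 0): np.array([-70.0, -72.0, -45.0]),
    }

    def locate(rss):
        """Online phase: return the reference point whose fingerprint is closest."""
        return min(fingerprints, key=lambda p: np.linalg.norm(fingerprints[p] - rss))

    print(locate(np.array([-42.0, -68.0, -79.0])))  # nearest fingerprint is (0, 0)
    ```

    In the paper, clustering first narrows the search to one group of fingerprints, so the online match only compares against a fraction of the database.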

  11. Driver fatigue detection through multiple entropy fusion analysis in an EEG-based system.

    PubMed

    Min, Jianliang; Wang, Ping; Hu, Jianfeng

    2017-01-01

    Driver fatigue is an important contributor to road accidents, and fatigue detection has major implications for transportation safety. The aim of this research is to analyze the multiple entropy fusion method and evaluate several channel regions to effectively detect a driver's fatigue state based on electroencephalogram (EEG) records. First, we fused multiple entropies, i.e., spectral entropy, approximate entropy, sample entropy and fuzzy entropy, as features compared with autoregressive (AR) modeling by four classifiers. Second, we captured four significant channel regions according to weight-based electrodes via a simplified channel selection method. Finally, the evaluation model for detecting driver fatigue was established with four classifiers based on the EEG data from four channel regions. Twelve healthy subjects performed continuous simulated driving for 1-2 hours with EEG monitoring on a static simulator. The leave-one-out cross-validation approach obtained an accuracy of 98.3%, a sensitivity of 98.3% and a specificity of 98.2%. The experimental results verified the effectiveness of the proposed method, indicating that the multiple entropy fusion features are significant factors for inferring the fatigue state of a driver.
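
    One of the fused measures, sample entropy, can be sketched as below for a 1-D signal; this is a minimal textbook-style implementation, not the authors' code, and the test signals are synthetic:

    ```python
    import numpy as np

    def sample_entropy(x, m=2, r=0.2):
        """Sample entropy of signal x with template length m and tolerance r*std."""
        x = np.asarray(x, dtype=float)
        r *= x.std()
        def count(mm):
            # number of template pairs within tolerance, excluding self-matches
            templates = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
            d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
            return (d <= r).sum() - len(templates)
        b, a = count(m), count(m + 1)
        return -np.log(a / b)

    rng = np.random.default_rng(0)
    regular = np.sin(np.linspace(0, 8 * np.pi, 200))   # predictable signal
    noisy = rng.normal(size=200)                       # irregular signal
    print(sample_entropy(regular) < sample_entropy(noisy))
    ```

    A lower sample entropy indicates a more regular, predictable signal, which is why such entropies are informative features for EEG-based fatigue states.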

  12. Meat mixture detection in Iberian pork sausages.

    PubMed

    Ortiz-Somovilla, V; España-España, F; De Pedro-Sanz, E J; Gaitán-Jurado, A J

    2005-11-01

    Five homogenized meat mixture treatments of Iberian (I) and/or Standard (S) pork were set up. Each treatment was analyzed by NIRS as a fresh product (N=75) and as dry-cured sausage (N=75). Spectra acquisition was carried out using DA 7000 equipment (Perten Instruments), obtaining a total of 750 spectra. Several absorption peaks and bands were selected as the most representative for homogenized dry-cured and fresh sausages. Discriminant analysis and mixture prediction equations were carried out based on the spectral data gathered. The best results using discriminant models were for fresh products, with 98.3% (calibration) and 60% (validation) correct classification. For dry-cured sausages 91.7% (calibration) and 80% (validation) of the samples were correctly classified. Models developed using mixture prediction equations showed SECV=4.7, r(2)=0.98 (calibration) and 73.3% of validation set were correctly classified for the fresh product. These values for dry-cured sausages were SECV=5.9, r(2)=0.99 (calibration) and 93.3% correctly classified for validation.

  13. Classification of volcanic ash particles using a convolutional neural network and probability.

    PubMed

    Shoji, Daigo; Noguchi, Rina; Otsuki, Shizuka; Hino, Hideitsu

    2018-05-25

    Analyses of volcanic ash are typically performed either by qualitatively classifying ash particles by eye or by quantitatively parameterizing their shape and texture. While complex shapes can be classified through qualitative analyses, the results are subjective due to the difficulty of categorizing complex shapes into a single class. Although quantitative analyses are objective, selection of shape parameters is required. Here, we applied a convolutional neural network (CNN) to the classification of volcanic ash. First, we defined four basal particle shapes (blocky, vesicular, elongated, rounded) generated by different eruption mechanisms (e.g., brittle fragmentation), and then trained the CNN using particles composed of only one basal shape. The CNN could recognize the basal shapes with over 90% accuracy. Using the trained network, we classified ash particles composed of multiple basal shapes based on the output of the network, which can be interpreted as a mixing ratio of the four basal shapes. Clustering of samples by the averaged probabilities and the intensity is consistent with the eruption type. The mixing ratio output by the CNN can be used to quantitatively classify complex shapes in nature without forcing them into a single category and without the need for shape parameters, which may lead to a new taxonomy.

  14. Classification and Verification of Handwritten Signatures with Time Causal Information Theory Quantifiers.

    PubMed

    Rosso, Osvaldo A; Ospina, Raydonal; Frery, Alejandro C

    2016-01-01

    We present a new approach for handwritten signature classification and verification based on descriptors stemming from time causal information theory. The proposal uses the Shannon entropy, the statistical complexity, and the Fisher information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to a One-Class Support Vector Machine classifier. The results are better than state-of-the-art online techniques that employ higher-dimensional feature spaces, which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups.
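
    The Bandt and Pompe symbolization underlying these descriptors maps each window of a series to its ordinal pattern; from the pattern histogram one can compute, for example, a normalized permutation entropy. A minimal implementation (not the authors' code) might look like:

    ```python
    import numpy as np
    from itertools import permutations
    from math import log, factorial

    def permutation_entropy(x, D=3):
        """Normalized Shannon entropy of Bandt-Pompe ordinal patterns of order D."""
        x = np.asarray(x)
        patterns = {p: 0 for p in permutations(range(D))}
        for i in range(len(x) - D + 1):
            patterns[tuple(np.argsort(x[i:i + D]))] += 1   # ordinal pattern of window
        probs = np.array([c for c in patterns.values() if c > 0], dtype=float)
        probs /= probs.sum()
        return -(probs * np.log(probs)).sum() / log(factorial(D))  # in [0, 1]

    rng = np.random.default_rng(0)
    print(permutation_entropy(np.arange(100)))          # monotone: single pattern
    print(permutation_entropy(rng.normal(size=2000)))   # noise: near-uniform patterns
    ```

    The statistical complexity and Fisher information used in the paper are computed from the same ordinal-pattern distribution.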

  15. Generalization Analysis of Fredholm Kernel Regularized Classifiers.

    PubMed

    Gong, Tieliang; Xu, Zongben; Chen, Hong

    2017-07-01

    Recently, a new framework, Fredholm learning, was proposed for semisupervised learning problems based on solving a regularized Fredholm integral equation. It allows a natural way to incorporate unlabeled data into learning algorithms to improve their prediction performance. Despite rapid progress on implementable algorithms with theoretical guarantees, the generalization ability of Fredholm kernel learning has not been studied. In this letter, we focus on investigating the generalization performance of a family of classification algorithms, referred to as Fredholm kernel regularized classifiers. We prove that the corresponding learning rate can achieve [Formula: see text] ([Formula: see text] is the number of labeled samples) in a limiting case. In addition, a representer theorem is provided for the proposed regularized scheme, which underlies its applications.

  16. A Modified Hopfield Neural Network Algorithm (MHNNA) Using ALOS Image for Water Quality Mapping

    PubMed Central

    Kzar, Ahmed Asal; Mat Jafri, Mohd Zubir; Mutter, Kussay N.; Syahreza, Saumi

    2015-01-01

    Reducing water pollution is a major challenge in coastal waters. The health of coastal ecosystems can be affected by high concentrations of suspended sediment. In this work, a Modified Hopfield Neural Network Algorithm (MHNNA) was used with remote sensing imagery to classify the total suspended solids (TSS) concentrations in the waters of coastal Langkawi Island, Malaysia. The adopted remote sensing image is the Advanced Land Observation Satellite (ALOS) image acquired on 18 January 2010. Our modification allows the Hopfield neural network to convert and classify color satellite images. The samples were collected from the study area simultaneously with the acquisition of the satellite imagery. The sample locations were determined using a handheld global positioning system (GPS). The TSS concentration measurements were conducted in a lab and used for validation (real data), classification, and accuracy assessments. Mapping was achieved by using the MHNNA to classify the concentrations according to their reflectance values in band 1, band 2, and band 3. The TSS map was color-coded for visual interpretation. The efficiency of the proposed algorithm was investigated by dividing the validation data into two groups. The first group was used as source samples for supervised classification via the MHNNA. The second group was used to test the MHNNA's efficiency. After mapping, the locations of the second group in the produced classes were detected. Next, the correlation coefficient (R) and root mean square error (RMSE) were calculated between the two groups, according to their corresponding locations in the classes. The MHNNA exhibited a higher R (0.977) and lower RMSE (2.887). In addition, we tested the MHNNA with noise, and it maintained its accuracy on noisy images over a range of noise levels. All results have been compared with a minimum distance classifier (Min-Dis). 
Therefore, TSS mapping of polluted water in the coastal Langkawi Island, Malaysia can be performed using the adopted MHNNA with remote sensing techniques (as based on ALOS images). PMID:26729148

  17. Text mining and natural language processing approaches for automatic categorization of lay requests to web-based expert forums.

    PubMed

    Himmel, Wolfgang; Reincke, Ulrich; Michelmann, Hans Wilhelm

    2009-07-22

    Both healthy and sick people increasingly use electronic media to obtain medical information and advice. For example, Internet users may send requests to Web-based expert forums, or so-called "ask the doctor" services. The aim was to automatically classify lay requests to an Internet medical expert forum using a combination of different text-mining strategies. We first manually classified a sample of 988 requests directed to an involuntary childlessness forum on the German website "Rund ums Baby" ("Everything about Babies") into one or more of 38 categories belonging to two dimensions ("subject matter" and "expectations"). After creating start and synonym lists, we calculated the average Cramer's V statistic for the association of each word with each category. We also used principal component analysis and singular value decomposition as further text-mining strategies. With these measures we trained regression models and determined, on the basis of the best regression models, for each request the probability of belonging to each of the 38 categories, with a cutoff of 50%. Recall and precision on a test sample were calculated as a measure of quality for the automatic classification. According to the manual classification of 988 documents, 102 (10%) documents fell into the category "in vitro fertilization (IVF)," 81 (8%) into the category "ovulation," 79 (8%) into "cycle," and 57 (6%) into "semen analysis." These were the four most frequent categories in the subject matter dimension (consisting of 32 categories). The expectation dimension comprised six categories; we classified 533 documents (54%) as "general information" and 351 (36%) as a wish for "treatment recommendations." The generation of indicator variables based on the chi-square analysis and Cramer's V proved to be the best approach for automatic classification in about half of the categories. 
In combination with the two other approaches, 100% precision and 100% recall were realized in 18 (47%) out of the 38 categories in the test sample. For 35 (92%) categories, precision and recall were better than 80%. For some categories, the input variables (ie, "words") also included variables from other categories, most often with a negative sign. For example, absence of words predictive for "menstruation" was a strong indicator for the category "pregnancy test." Our approach suggests a way of automatically classifying and analyzing unstructured information in Internet expert forums. The technique can perform a preliminary categorization of new requests and help Internet medical experts to better handle the mass of information and to give professional feedback.
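The word-category association measure used above can be sketched with a small Cramer's V computation from a contingency table; the counts below are purely illustrative, since the study's own word frequencies are not given.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V for a word-by-category contingency table."""
    table = np.asarray(table, dtype=float)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# hypothetical counts: rows = word absent/present in a request,
# columns = request in category / not in category
table = [[30, 70],
         [60, 40]]
print(round(cramers_v(table), 3))
```

Words with high V for a category become indicator variables in the downstream regression models.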

  18. Genetic Variability among Lucerne Cultivars Based on Biochemical (SDS-PAGE) and Morphological Markers

    NASA Astrophysics Data System (ADS)

    Farshadfar, M.; Farshadfar, E.

The present research was conducted to determine the genetic variability of 18 Lucerne cultivars based on morphological and biochemical markers. The traits studied were plant height, tiller number, biomass, dry yield, dry yield/biomass, dry leaf/dry yield, macro- and microelements, crude protein, dry matter, crude fiber and ash percentage, and SDS-PAGE in seed and leaf samples. Field experiments included 18 plots of two-meter rows. Data based on morphological, chemical and SDS-PAGE markers were analyzed using SPSSWIN software and the multivariate statistical procedures of cluster analysis (UPGMA) and principal component analysis. Analysis of variance and mean comparison for morphological traits reflected significant differences among genotypes. Genotypes 13 and 15 had the greatest values for most traits. The Genotypic Coefficient of Variation (GCV), Phenotypic Coefficient of Variation (PCV) and heritability (Hb) parameters for different characters ranged from 12.49 to 26.58% for PCV, while the GCV ranged from 6.84 to 18.84%. The greatest value of Hb was 0.94, for stem number. Based on morphological traits, the Lucerne genotypes could be classified into four clusters, and 94% of the variance among the genotypes was explained by two principal components. Based on chemical traits they were classified into five groups, and 73.492% of the variance was explained by four principal components; dry matter, protein, fiber, P, K, Na, Mg and Zn had higher variance. Based on the SDS-PAGE patterns, all genotypes were classified into three clusters. The greatest genetic distance was between cultivar 10 and the others; therefore it would be a suitable parent in a breeding program.

  19. A web-based system for neural network based classification in temporomandibular joint osteoarthritis.

    PubMed

    de Dumast, Priscille; Mirabel, Clément; Cevidanes, Lucia; Ruellas, Antonio; Yatabe, Marilia; Ioshida, Marcos; Ribera, Nina Tubau; Michoud, Loic; Gomes, Liliane; Huang, Chao; Zhu, Hongtu; Muniz, Luciana; Shoukri, Brandon; Paniagua, Beatriz; Styner, Martin; Pieper, Steve; Budin, Francois; Vimort, Jean-Baptiste; Pascal, Laura; Prieto, Juan Carlos

    2018-07-01

    The purpose of this study is to describe the methodological innovations of a web-based system for storage, integration and computation of biomedical data, using a training imaging dataset to remotely compute a deep neural network classifier of temporomandibular joint osteoarthritis (TMJOA). This study imaging dataset consisted of three-dimensional (3D) surface meshes of mandibular condyles constructed from cone beam computed tomography (CBCT) scans. The training dataset consisted of 259 condyles, 105 from control subjects and 154 from patients with diagnosis of TMJ OA. For the image analysis classification, 34 right and left condyles from 17 patients (39.9 ± 11.7 years), who experienced signs and symptoms of the disease for less than 5 years, were included as the testing dataset. For the integrative statistical model of clinical, biological and imaging markers, the sample consisted of the same 17 test OA subjects and 17 age and sex matched control subjects (39.4 ± 15.4 years), who did not show any sign or symptom of OA. For these 34 subjects, a standardized clinical questionnaire, blood and saliva samples were also collected. The technological methodologies in this study include a deep neural network classifier of 3D condylar morphology (ShapeVariationAnalyzer, SVA), and a flexible web-based system for data storage, computation and integration (DSCI) of high dimensional imaging, clinical, and biological data. The DSCI system trained and tested the neural network, indicating 5 stages of structural degenerative changes in condylar morphology in the TMJ with 91% close agreement between the clinician consensus and the SVA classifier. The DSCI remotely ran with a novel application of a statistical analysis, the Multivariate Functional Shape Data Analysis, that computed high dimensional correlations between shape 3D coordinates, clinical pain levels and levels of biological markers, and then graphically displayed the computation results. 
The findings of this study demonstrate a comprehensive phenotypic characterization of TMJ health and disease at clinical, imaging and biological levels, using novel flexible and versatile open-source tools for a web-based system that provides advanced shape statistical analysis and a neural network based classification of temporomandibular joint osteoarthritis. Published by Elsevier Ltd.

  20. Extensive sampling of basidiomycete genomes demonstrates inadequacy of the white-rot/brown-rot paradigm for wood decay fungi

    Treesearch

    Robert Riley; Asaf A. Salamov; Daren W. Brown; Laszlo G. Nagy; Dimitrios Floudas; Benjamin W. Held; Anthony Levasseur; Vincent Lombard; Emmanuelle Morin; Robert Otillar; Erika A. Lindquist; Hui Sun; Kurt M. LaButti; Jeremy Schmutz; Dina Jabbour; Hong Luo; Scott E. Baker; Antonio G. Pisabarro; Jonathan D. Walton; Robert A. Blanchette; Bernard Henrissat; Francis Martin; Daniel Cullen; David S. Hibbett; Igor V. Grigoriev

    2014-01-01

    Basidiomycota (basidiomycetes) make up 32% of the described fungi and include most wood-decaying species, as well as pathogens and mutualistic symbionts. Wood-decaying basidiomycetes have typically been classified as either white rot or brown rot, based on the ability (in white rot only) to degrade lignin along with cellulose and hemicellulose. Prior genomic...

  1. A Novel Method to Detect Early Colorectal Cancer Based on Chromosome Copy Number Variation in Plasma.

    PubMed

    Xu, Jun-Feng; Kang, Qian; Ma, Xing-Yong; Pan, Yuan-Ming; Yang, Lang; Jin, Peng; Wang, Xin; Li, Chen-Guang; Chen, Xiao-Chen; Wu, Chao; Jiao, Shao-Zhuo; Sheng, Jian-Qiu

    2018-01-01

Colonoscopy screening has been broadly accepted to evaluate the risk and incidence of colorectal cancer (CRC) during health examination in outpatients. However, the intrusiveness, complexity and discomfort of colonoscopy may limit its application and the compliance of patients. Thus, more reliable and convenient diagnostic methods are necessary for CRC screening. Genome instability, especially copy-number variation (CNV), is a hallmark of cancer and has been proved to have potential in clinical application. We determined the diagnostic potential of chromosomal CNV at the arm level by whole-genome sequencing of CRC plasma samples (n = 32) and healthy controls (n = 38). Arm-level CNV was determined, and the consistency of arm-level CNV between plasma and tissue was further analyzed. Two methods, a regular z-score analysis and a trained support vector machine (SVM) classifier, were applied for detection of colorectal cancer. In plasma samples of CRC patients, the most frequent deletions were detected on chromosomes 6, 8p, 14q and 1p, and the most frequent amplifications occurred on chromosomes 19, 5, 2, 9p and 20p. These arm-level alterations detected in plasma were also observed in tumor tissues. We showed that the specificity of the z-score analysis for the detection of colorectal cancer was 86.8% (33/38), whereas its sensitivity was only 56.3% (18/32). Applying a trained SVM classifier (n = 40 in the training group) to detect colorectal cancer in the test samples (n = 30), a sensitivity of 91.7% (11/12) and a specificity of 88.9% (16/18) were finally reached. Furthermore, all five early CRC patients in stages I and II were successfully detected. A trained SVM classifier based on arm-level CNVs can be used as a promising method to screen early-stage CRC. © 2018 The Author(s). Published by S. Karger AG, Basel.
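The z-score step described above can be sketched as follows, assuming per-arm coverage values normalized against healthy controls; the arm count, threshold, and data here are illustrative, not the study's.

```python
import numpy as np

def arm_zscores(sample_cov, control_covs):
    """Z-scores of a sample's per-arm coverage against healthy controls."""
    mu = control_covs.mean(axis=0)
    sd = control_covs.std(axis=0, ddof=1)
    return (sample_cov - mu) / sd

rng = np.random.default_rng(0)
controls = rng.normal(1.0, 0.02, size=(38, 5))   # 38 controls x 5 arms
sample = np.array([1.0, 1.0, 1.12, 1.0, 0.90])   # gain on arm 3, loss on arm 5
z = arm_zscores(sample, controls)
calls = np.flatnonzero(np.abs(z) > 3)            # arms called as candidate CNVs
print(calls)
```

The SVM classifier would then take such per-arm z-scores (or coverages) as its feature vector instead of applying a fixed threshold.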

  2. Development of municipal solid waste classification in Korea based on fossil carbon fraction.

    PubMed

    Lee, Jeongwoo; Kang, Seongmin; Kim, Seungjin; Kim, Ki-Hyun; Jeon, Eui-Chan

    2015-10-01

Environmental problems and climate change arising from waste incineration are taken quite seriously around the world. In Korea, waste disposal methods are largely classified into landfill, incineration, recycling, etc., and the amount of incinerated waste has risen by 24.5% since 2002. According to the IPCC, the fossil carbon fraction (FCF) of the waste is a main factor in estimating CO₂ emissions from waste incinerators. FCF differs depending on the characteristics of waste in each country, and a wide range of default values are proposed by the IPCC. This study examined the existing IPCC and Korean waste classification systems on the basis of FCF, for accurate greenhouse gas emissions estimation from waste incineration. The waste characteristics amenable to sorting were classified according to FCF and form. The characteristics sorted according to fossil carbon fraction were paper, textiles, rubber, and leather. Paper was classified into pure paper and processed paper; textiles were classified into cotton and synthetic fibers; and rubber and leather were classified into artificial and natural. The FCF analysis was carried out by collecting representative samples from each classification group and applying the 14C method with AMS equipment, and the analysis values were compared with the default values proposed by the IPCC. For garden and park waste and plastics, the differences were within the range of the IPCC default values or were negligible. However, coated paper, synthetic textiles, natural rubber, synthetic rubber, artificial leather, and other wastes showed differences of over 10% in FCF content. The IPCC classification comprises largely nine qualitative categories; using the more finely classified waste characteristics of this study, emissions estimates can differ greatly from those obtained with the existing IPCC classification system.
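The 14C-based FCF determination reduces to a simple ratio: biogenic carbon carries contemporary 14C while fossil carbon has none. The sketch below assumes a percent-modern-carbon (pMC) readout and an illustrative fully-biogenic reference of 100 pMC; the study's actual reference value and correction factors are not given.

```python
def fossil_carbon_fraction(pmc_sample, pmc_reference=100.0):
    """Fossil carbon fraction from a percent-modern-carbon (pMC) measurement.
    The fossil share is one minus the sample's pMC relative to a fully
    biogenic reference (reference value is an illustrative assumption)."""
    return 1.0 - pmc_sample / pmc_reference

# e.g. a mixed-paper sample measuring 65 pMC would be ~35% fossil carbon
print(round(fossil_carbon_fraction(65.0), 2))
```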

  3. Analysis of longitudinal diffusion-weighted images in healthy and pathological aging: An ADNI study.

    PubMed

    Kruggel, Frithjof; Masaki, Fumitaro; Solodkin, Ana

    2017-02-15

The widely used framework of voxel-based morphometry for analyzing neuroimages is extended here to model longitudinal imaging data by exchanging the linear model for a linear mixed-effects model. The new approach is employed for analyzing a large longitudinal sample of 756 diffusion-weighted images acquired in 177 subjects of the Alzheimer's Disease Neuroimaging Initiative (ADNI). While sample- and group-level results from both approaches are equivalent, the mixed-effects model yields information at the single-subject level. Interestingly, the subject-level parameters capture specific neurobiological differences associated with aging. In addition, our approach highlights white matter areas that reliably discriminate between patients with Alzheimer's disease and healthy controls with a predictive power of 0.99 and include the hippocampal alveus, the para-hippocampal white matter, the white matter of the posterior cingulate, and the optic tracts. In this context, notably, the classifier places a sub-population of patients with minimal cognitive impairment into the pathological domain. Our classifier offers promising features for an accessible biomarker that predicts the risk of conversion to Alzheimer's disease. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. Significance statement: This study assesses neuro-degenerative processes in the brain's white matter as revealed by diffusion-weighted imaging, in order to discriminate healthy from pathological aging in a large sample of elderly subjects.
The analysis of time-series examinations in a linear mixed effects model allowed the discrimination of population-based aging processes from individual determinants. We demonstrate that a simple classifier based on white matter imaging data is able to predict the conversion to Alzheimer's disease with a high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Evaluation of an Algorithm to Predict Menstrual-Cycle Phase at the Time of Injury.

    PubMed

    Tourville, Timothy W; Shultz, Sandra J; Vacek, Pamela M; Knudsen, Emily J; Bernstein, Ira M; Tourville, Kelly J; Hardy, Daniel M; Johnson, Robert J; Slauterbeck, James R; Beynnon, Bruce D

    2016-01-01

    Women are 2 to 8 times more likely to sustain an anterior cruciate ligament (ACL) injury than men, and previous studies indicated an increased risk for injury during the preovulatory phase of the menstrual cycle (MC). However, investigations of risk rely on retrospective classification of MC phase, and no tools for this have been validated. To evaluate the accuracy of an algorithm for retrospectively classifying MC phase at the time of a mock injury based on MC history and salivary progesterone (P4) concentration. Descriptive laboratory study. Research laboratory. Thirty-one healthy female collegiate athletes (age range, 18-24 years) provided serum or saliva (or both) samples at 8 visits over 1 complete MC. Self-reported MC information was obtained on a randomized date (1-45 days) after mock injury, which is the typical timeframe in which researchers have access to ACL-injured study participants. The MC phase was classified using the algorithm as applied in a stand-alone computational fashion and also by 4 clinical experts using the algorithm and additional subjective hormonal history information to help inform their decision. To assess algorithm accuracy, phase classifications were compared with the actual MC phase at the time of mock injury (ascertained using urinary luteinizing hormone tests and serial serum P4 samples). Clinical expert and computed classifications were compared using κ statistics. Fourteen participants (45%) experienced anovulatory cycles. The algorithm correctly classified MC phase for 23 participants (74%): 22 (76%) of 29 who were preovulatory/anovulatory and 1 (50%) of 2 who were postovulatory. Agreement between expert and algorithm classifications ranged from 80.6% (κ = 0.50) to 93% (κ = 0.83). Classifications based on same-day saliva sample and optimal P4 threshold were the same as those based on MC history alone (87.1% correct). 
Algorithm accuracy varied during the MC but at no time were both sensitivity and specificity levels acceptable. These findings raise concerns about the accuracy of previous retrospective MC-phase classification systems, particularly in a population with a high occurrence of anovulatory cycles.
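The expert-versus-algorithm agreement reported above (κ = 0.50 to 0.83) is quantified with Cohen's kappa; a self-contained sketch with hypothetical phase labels:

```python
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    po = np.mean(a == b)                                       # observed agreement
    cats = np.union1d(a, b)
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in cats)  # chance agreement
    return (po - pe) / (1 - pe)

# hypothetical phase calls for six mock-injury dates
expert = ["pre", "pre", "pre", "post", "post", "post"]
algorithm = ["pre", "pre", "post", "post", "post", "pre"]
print(round(cohens_kappa(expert, algorithm), 3))
```

Kappa of 0 means chance-level agreement and 1 means perfect agreement, which is why it is preferred over raw percent agreement when one phase dominates the sample.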

  5. A novel, efficient method for estimating the prevalence of acute malnutrition in resource-constrained and crisis-affected settings: A simulation study.

    PubMed

    Frison, Severine; Kerac, Marko; Checchi, Francesco; Nicholas, Jennifer

    2017-01-01

The assessment of the prevalence of acute malnutrition in children under five is widely used for the detection of emergencies, planning interventions, advocacy, and monitoring and evaluation. This study examined PROBIT methods, which convert the parameters (mean and standard deviation (SD)) of a normally distributed variable into a cumulative probability below any cut-off, to estimate acute malnutrition in children under five using Mid-Upper Arm Circumference (MUAC). We assessed the performance of: PROBIT Method I, with mean MUAC from the survey sample and MUAC SD from a database of previous surveys; and PROBIT Method II, with mean and SD of MUAC observed in the survey sample. Specifically, we generated sub-samples from 852 survey datasets, simulating 100 surveys for eight sample sizes; overall the methods were tested on 681,600 simulated surveys. PROBIT methods relying on sample sizes as small as 50 performed better than the classic method for estimating and classifying the prevalence of acute malnutrition. They had better precision in the estimation of acute malnutrition for all sample sizes and better coverage for smaller sample sizes, while having relatively little bias. They classified situations accurately for a threshold of 5% acute malnutrition. Both PROBIT methods had similar outcomes. PROBIT methods have a clear advantage over the classic method in the assessment of acute malnutrition prevalence based on MUAC. Their use would require much lower sample sizes, enabling great time and resource savings and permitting timely and/or locally relevant prevalence estimates of acute malnutrition for a swift and well-targeted response.
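PROBIT Method II as described reduces to evaluating a normal CDF at the MUAC cut-off; a minimal sketch with hypothetical survey parameters (the 125 mm cut-off is the standard severe-wasting threshold, but the mean and SD below are invented):

```python
import math

def probit_prevalence(mean_muac, sd_muac, cutoff=125.0):
    """PROBIT estimate: fraction of children below a MUAC cut-off (mm),
    assuming MUAC is normally distributed in the surveyed population."""
    z = (cutoff - mean_muac) / sd_muac
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

# hypothetical survey sample: mean MUAC 148 mm, SD 11 mm
print(round(probit_prevalence(148.0, 11.0), 4))
```

Because only the mean (and possibly the SD) must be estimated from the sample, the estimator is far more stable at small sample sizes than directly counting the children below the cut-off.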

  6. Raman spectroscopy for highly accurate estimation of the age of refrigerated porcine muscle

    NASA Astrophysics Data System (ADS)

    Timinis, Constantinos; Pitris, Costas

    2016-03-01

    The high water content of meat, combined with all the nutrients it contains, make it vulnerable to spoilage at all stages of production and storage even when refrigerated at 5 °C. A non-destructive and in situ tool for meat sample testing, which could provide an accurate indication of the storage time of meat, would be very useful for the control of meat quality as well as for consumer safety. The proposed solution is based on Raman spectroscopy which is non-invasive and can be applied in situ. For the purposes of this project, 42 meat samples from 14 animals were obtained and three Raman spectra per sample were collected every two days for two weeks. The spectra were subsequently processed and the sample age was calculated using a set of linear differential equations. In addition, the samples were classified in categories corresponding to the age in 2-day steps (i.e., 0, 2, 4, 6, 8, 10, 12 or 14 days old), using linear discriminant analysis and cross-validation. Contrary to other studies, where the samples were simply grouped into two categories (higher or lower quality, suitable or unsuitable for human consumption, etc.), in this study, the age was predicted with a mean error of ~ 1 day (20%) or classified, in 2-day steps, with 100% accuracy. Although Raman spectroscopy has been used in the past for the analysis of meat samples, the proposed methodology has resulted in a prediction of the sample age far more accurately than any report in the literature.
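As a rough illustration of classifying samples into 2-day age steps with cross-validation, the sketch below uses a leave-one-out nearest-centroid rule as a simple stand-in for the paper's linear discriminant analysis, on synthetic features that drift with age (all data here are invented):

```python
import numpy as np

def loo_nearest_centroid(X, y):
    """Leave-one-out cross-validated nearest-centroid classification accuracy."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        centroids = {c: X[keep & (y == c)].mean(axis=0)
                     for c in np.unique(y[keep])}
        pred = min(centroids, key=lambda c: np.linalg.norm(X[i] - centroids[c]))
        correct += int(pred == y[i])
    return correct / len(y)

# synthetic "spectral" features drifting with sample age (0-6 days, 2-day steps)
rng = np.random.default_rng(1)
ages = np.repeat([0, 2, 4, 6], 6)
features = ages[:, None] + rng.normal(0.0, 0.2, size=(24, 3))
print(loo_nearest_centroid(features, ages))
```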

  7. [HPLC-fingerprint-based quality evaluation on a Tibetan medicine Phyllanthus emblica and its tannin parts].

    PubMed

    Sun, Xue-Fei; Zhang, Hong-Yan; Xia, Qing; Zhao, Hai-Juan; Wu, Ling-Fang; Zhang, Lan-Zhen; Shi, Ren-Bing

    2014-04-01

This study aimed to establish HPLC fingerprints of Phyllanthus emblica and its tannin parts from different habitats, for quality control. The determination was carried out on a Diamonsil C18 (4.6 mm x 250 mm, 5 μm) column, with methanol-0.2% glacial acetic acid as the mobile phase, with gradient elution at a flow rate of 1 mL x min(-1). The temperature was maintained at 30 °C and the detection wavelength was 260 nm. Thirteen chromatographic peaks were extracted as the common peaks of the fingerprint of P. emblica, and eleven as the common peaks of the P. emblica tannin parts; five peaks were identified by comparison with reference samples. The fingerprints of 8 samples were compared and classified by similarity evaluation, cluster analysis and principal component analysis (PCA). The similarity degrees of the eight P. emblica samples were between 0.763 and 0.993, while those of the tannin parts were between 0.903 and 0.991. All the samples of P. emblica and their tannin parts were classified into 3 categories. The method was highly reproducible, simple and reliable, and could provide a basis for quality control and evaluation of P. emblica from different habitats.

  8. S-CNN: Subcategory-aware convolutional networks for object detection.

    PubMed

    Chen, Tao; Lu, Shijian; Fan, Jiayuan

    2017-09-26

    The marriage between the deep convolutional neural network (CNN) and region proposals has made breakthroughs for object detection in recent years. While the discriminative object features are learned via a deep CNN for classification, the large intra-class variation and deformation still limit the performance of the CNN based object detection. We propose a subcategory-aware CNN (S-CNN) to solve the object intra-class variation problem. In the proposed technique, the training samples are first grouped into multiple subcategories automatically through a novel instance sharing maximum margin clustering process. A multi-component Aggregated Channel Feature (ACF) detector is then trained to produce more latent training samples, where each ACF component corresponds to one clustered subcategory. The produced latent samples together with their subcategory labels are further fed into a CNN classifier to filter out false proposals for object detection. An iterative learning algorithm is designed for the joint optimization of image subcategorization, multi-component ACF detector, and subcategory-aware CNN classifier. Experiments on INRIA Person dataset, Pascal VOC 2007 dataset and MS COCO dataset show that the proposed technique clearly outperforms the state-of-the-art methods for generic object detection.

  9. Using digital photogrammetry to conduct an anthropometric analysis of wheelchair users.

    PubMed

Barros, Helda Oliveira; Soares, Marcelo Márcio

    2012-01-01

    This study deals with using digital photogrammetry to make an anthropometric analysis of wheelchair users. To analyse the data, Digita software was used, which was made available by means of the agreement of the Design Department of the Federal University of Pernambuco--Brazil--with the Department of Ergonomics of the Technical University of Lisbon--Portugal. Data collection involved a random sample of 18 subjects and occurred in the Biomechanics Laboratory of the Maurice of Nassau Faculty, located in Recife, Pernambuco. The methodology applied comprises the steps of Ergonomic Assessment, Configuration of the Data Base, Taking Digital Photographs, Digitalising the Coordinates and Presentation of Results. 15 structural variables related to static anthropometry were analysed, and 4 functional range variables relating to dynamic anthropometry. The results were presented by analysing personal data, classified by gender, ethnicity and age; by functional analysis of the sample, classified by clinical diagnosis, results of assessing the joints, results of the evaluation through motion and postural evaluation; and of the analysis of the anthropometric sample, which indicated for each variable the number of people, the mean, the standard deviation, and the minimum, median and maximum values.

  10. The effect of heavy metal contamination on the bacterial community structure at Jiaozhou Bay, China.

    PubMed

    Yao, Xie-Feng; Zhang, Jiu-Ming; Tian, Li; Guo, Jian-Hua

    In this study, determination of heavy metal parameters and microbiological characterization of marine sediments obtained from two heavily polluted sites and one low-grade contaminated reference station at Jiaozhou Bay in China were carried out. The microbial communities found in the sampled marine sediments were studied using PCR-DGGE (denaturing gradient gel electrophoresis) fingerprinting profiles in combination with multivariate analysis. Clustering analysis of DGGE and matrix of heavy metals displayed similar occurrence patterns. On this basis, 17 samples were classified into two clusters depending on the presence or absence of the high level contamination. Moreover, the cluster of highly contaminated samples was further classified into two sub-groups based on the stations of their origin. These results showed that the composition of the bacterial community is strongly influenced by heavy metal variables present in the sediments found in the Jiaozhou Bay. This study also suggested that metagenomic techniques such as PCR-DGGE fingerprinting in combination with multivariate analysis is an efficient method to examine the effect of metal contamination on the bacterial community structure. Copyright © 2016 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
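Cluster analysis of DGGE fingerprints of the kind described above can be sketched with SciPy's hierarchical clustering; the band-intensity profiles below are synthetic stand-ins for real gel data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# synthetic band-intensity profiles: 4 "highly contaminated", 3 "reference"
rng = np.random.default_rng(2)
profiles = np.vstack([rng.normal(0.8, 0.05, size=(4, 10)),
                      rng.normal(0.2, 0.05, size=(3, 10))])

tree = linkage(profiles, method="average")      # UPGMA-style average linkage
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Cutting the tree at two clusters recovers the contaminated/reference split; the same dendrogram can then be compared against a clustering of the heavy metal matrix.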

  11. High-depth, high-accuracy microsatellite genotyping enables precision lung cancer risk classification

    PubMed Central

    Velmurugan, K R; Varghese, R T; Fonville, N C; Garner, H R

    2017-01-01

    There remains a large discrepancy between the known genetic contributions to cancer and that which can be explained by genomic variants, both inherited and somatic. Recently, understudied repetitive DNA regions called microsatellites have been identified as genetic risk markers for a number of diseases including various cancers (breast, ovarian and brain). In this study, we demonstrate an integrated process for identifying and further evaluating microsatellite-based risk markers for lung cancer using data from the cancer genome atlas and the 1000 genomes project. Comparing whole-exome germline sequencing data from 488 TCGA lung cancer samples to germline exome data from 390 control samples from the 1000 genomes project, we identified 119 potentially informative microsatellite loci. These loci were found to be able to distinguish between cancer and control samples with sensitivity and specificity ratios over 0.8. Then these loci, supplemented with additional loci from other cancers and controls, were evaluated using a target enrichment kit and sample-multiplexed nextgen sequencing. Thirteen of the 119 risk markers were found to be informative in a well powered study (>0.99 for a 0.95 confidence interval) using high-depth (579x±315) nextgen sequencing of 30 lung cancer and 89 control samples, resulting in sensitivity and specificity ratios of 0.90 and 0.94, respectively. When 8 loci harvested from the bioinformatic analysis of other cancers are added to the classifier, then the sensitivity and specificity rise to 0.93 and 0.97, respectively. Analysis of the genes harboring these loci revealed two genes (ARID1B and REL) and two significantly enriched pathways (chromatin organization and cellular stress response) suggesting that the process of lung carcinogenesis is linked to chromatin remodeling, inflammation, and tumor microenvironment restructuring. 
We illustrate that high-depth sequencing enables a high-precision microsatellite-based risk classifier analysis approach. This microsatellite-based platform confirms the potential to create clinically actionable diagnostics for lung cancer. PMID:28759038
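The reported ratios follow from ordinary confusion-matrix arithmetic; the counts below are chosen to be consistent with the stated 0.90/0.94 figures for 30 lung cancer and 89 control samples (the exact per-cell counts are not given in the abstract):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix counts."""
    return tp / (tp + fn), tn / (tn + fp)

# illustrative counts matching the reported ratios (27/30 and 84/89)
sens, spec = sensitivity_specificity(tp=27, fn=3, tn=84, fp=5)
print(round(sens, 2), round(spec, 2))
```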

  13. An ensemble of SVM classifiers based on gene pairs.

    PubMed

    Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin

    2013-07-01

    In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
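The top scoring pair (TSP) criterion mentioned above scores a gene pair by how consistently the within-pair expression ordering flips between classes; a minimal sketch on toy data:

```python
import numpy as np

def tsp_score(expr_a, expr_b, labels):
    """Top scoring pair score: |P(A < B | class 0) - P(A < B | class 1)|."""
    a, b, y = map(np.asarray, (expr_a, expr_b, labels))
    p0 = np.mean(a[y == 0] < b[y == 0])
    p1 = np.mean(a[y == 1] < b[y == 1])
    return abs(p0 - p1)

# toy expression values: the within-pair ordering flips between classes
gene_a = [1, 2, 1, 9, 8, 9]
gene_b = [5, 6, 7, 2, 3, 1]
labels = [0, 0, 0, 1, 1, 1]
print(tsp_score(gene_a, gene_b, labels))
```

Because the score depends only on the rank order within each sample, it is robust to per-array normalization, which is why each selected pair can serve as a 2-D input to a base SVM.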

  14. Simultaneous fecal microbial and metabolite profiling enables accurate classification of pediatric irritable bowel syndrome.

    PubMed

    Shankar, Vijay; Reo, Nicholas V; Paliy, Oleg

    2015-12-09

    We previously showed that stool samples of pre-adolescent and adolescent US children diagnosed with diarrhea-predominant IBS (IBS-D) had different compositions of microbiota and metabolites compared to healthy age-matched controls. Here we explored whether observed fecal microbiota and metabolite differences between these two adolescent populations can be used to discriminate between IBS and health. We constructed individual microbiota- and metabolite-based sample classification models based on the partial least squares multivariate analysis and then applied a Bayesian approach to integrate individual models into a single classifier. The resulting combined classification achieved 84 % accuracy of correct sample group assignment and 86 % prediction for IBS-D in cross-validation tests. The performance of the cumulative classification model was further validated by the de novo analysis of stool samples from a small independent IBS-D cohort. High-throughput microbial and metabolite profiling of subject stool samples can be used to facilitate IBS diagnosis.
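Assuming conditional independence of the two data types given the diagnosis, the Bayesian integration of the microbiota- and metabolite-based classifiers can be sketched as an odds-product fusion (a naive-Bayes-style combination; the study's exact integration scheme may differ):

```python
def combine_posteriors(p_microbiota, p_metabolite, prior=0.5):
    """Fuse two classifiers' posterior probabilities for the same sample,
    assuming the data types are conditionally independent given the class."""
    odds = ((p_microbiota / (1 - p_microbiota))
            * (p_metabolite / (1 - p_metabolite))
            * ((1 - prior) / prior))      # divide out the double-counted prior
    return odds / (1 + odds)

# two moderately confident IBS-D calls reinforce each other
print(round(combine_posteriors(0.8, 0.7), 3))
```

Note that when one classifier is uninformative (probability equal to the prior), the fused result simply equals the other classifier's output.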

  15. A timber inventory based upon manual and automated analysis of ERTS-1 and supporting aircraft data using multistage probability sampling. [Plumas National Forest, California

    NASA Technical Reports Server (NTRS)

    Nichols, J. D.; Gialdini, M.; Jaakkola, S.

    1974-01-01

    A quasi-operational study demonstrated that a timber inventory based on manual and automated analysis of ERTS-1 data, supporting aircraft data, and ground data could be made using multistage sampling techniques. The inventory proved to be a timely, cost-effective alternative to conventional timber inventory techniques. The timber volume on the Quincy Ranger District of the Plumas National Forest was estimated to be 2.44 billion board feet with a sampling error of 8.2 percent. At 1.1 cents/acre, the cost of the inventory procedure compared favorably with the 25 cents/acre cost of a conventional inventory. A point-by-point comparison of CALSCAN-classified ERTS data with human-interpreted low-altitude photo plots indicated no significant differences in overall classification accuracy.

  16. Differentiation of black writing ink on paper using luminescence lifetime by time-resolved luminescence spectroscopy.

    PubMed

    Suzuki, Mototsugu; Akiba, Norimitsu; Kurosawa, Kenji; Akao, Yoshinori; Higashikawa, Yoshiyasu

    2017-10-01

    The time-resolved luminescence spectra and lifetimes of eighteen black writing inks were measured to differentiate pen inks on altered documents. The spectra and lifetimes varied among the samples: about half of the samples exhibited only short-lived luminescence components on the nanosecond time scale, while the others exhibited both short-lived components and long-lived components on the microsecond time scale. The samples could be classified into fifteen groups based on the luminescence spectra and dynamics. Therefore, luminescence lifetime can be used for the differentiation of writing inks, and luminescence lifetime imaging can be applied to the examination of altered documents. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Development of a neural-based forecasting tool to classify recreational water quality using fecal indicator organisms.

    PubMed

    Motamarri, Srinivas; Boccelli, Dominic L

    2012-09-15

    Users of recreational waters may be exposed to elevated pathogen levels through various point/non-point sources. Typical daily notifications rely on microbial analysis of indicator organisms (e.g., Escherichia coli) that require 18 or more hours to provide an adequate response. Modeling approaches, such as multivariate linear regression (MLR) and artificial neural networks (ANN), have been utilized to provide quick predictions of microbial concentrations for classification purposes, but generally suffer from high false negative rates. This study introduces the use of learning vector quantization (LVQ), a direct classification approach, for comparison with MLR and ANN approaches, and integrates input selection into model development with respect to primary and secondary water quality standards within the Charles River Basin (Massachusetts, USA), using meteorologic, hydrologic, and microbial explanatory variables. Integrating input selection into model development showed that discharge variables were the most important explanatory variables, while antecedent rainfall and time since previous events were also important. With respect to classification, all three models adequately represented the non-violated samples (>90%). The MLR approach had the highest false negative rates when classifying violated samples (41-62%, vs 13-43% for ANN and <16% for LVQ) when using five or more explanatory variables. The ANN performance was more similar to LVQ when a larger number of explanatory variables were utilized, but degraded toward MLR performance as explanatory variables were removed. Overall, LVQ as a direct classifier provided the best overall classification of violated/non-violated samples for both standards. Copyright © 2012 Elsevier Ltd. All rights reserved.
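
For reference, LVQ classifies by nearest prototype and trains by nudging prototypes toward same-class samples and away from other-class samples. Below is a minimal LVQ1 sketch on made-up 2-D data, not the study's water-quality features; the learning rate and epoch count are arbitrary.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_lvq1(X, y, n_epochs=30, lr=0.1, seed=0):
    # One prototype per class, initialised at the class mean.
    rng = random.Random(seed)
    protos = {}
    for c in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == c]
        protos[c] = [sum(col) / len(pts) for col in zip(*pts)]
    data = list(zip(X, y))
    for _ in range(n_epochs):
        rng.shuffle(data)
        for x, t in data:
            # Move the winning prototype toward (same class) or away
            # from (different class) the sample.
            c = min(protos, key=lambda k: dist2(x, protos[k]))
            sign = 1.0 if c == t else -1.0
            protos[c] = [w + sign * lr * (xi - w) for w, xi in zip(protos[c], x)]
    return protos

def predict(protos, x):
    return min(protos, key=lambda k: dist2(x, protos[k]))

# Two well-separated toy clusters standing in for violated/non-violated samples.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [0, 0, 0, 1, 1, 1]
protos = train_lvq1(X, y)
```

Because classification is a direct nearest-prototype lookup, no regression-to-threshold step is needed, which is the property the study exploits.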

  18. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

    PubMed Central

    Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar

    2016-01-01

    Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and the RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten-fold and leave-one-out cross-validations in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, achieving prediction accuracies of 95.27% and 91.99%, respectively, compared to the other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
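
The mRMR selection step can be sketched as a greedy search that, at each step, adds the feature with the highest relevance (mutual information with the label) minus mean redundancy (mutual information with already-chosen features). The discrete toy features below are invented; real use would discretize continuous expression values first.

```python
from math import log
from collections import Counter

def mutual_info(a, b):
    # Mutual information between two discrete sequences (in nats).
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def mrmr(features, target, k):
    # Greedy mRMR: maximise relevance minus mean redundancy at each step.
    chosen, remaining = [], list(range(len(features)))
    while len(chosen) < k and remaining:
        def score(i):
            rel = mutual_info(features[i], target)
            red = (sum(mutual_info(features[i], features[j]) for j in chosen)
                   / len(chosen)) if chosen else 0.0
            return rel - red
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

t  = [0, 0, 1, 1, 2, 2]
f0 = [0, 0, 1, 1, 1, 1]   # separates class 0 from the rest
f1 = [0, 0, 1, 1, 1, 1]   # exact duplicate of f0: relevant but redundant
f2 = [0, 0, 0, 0, 1, 1]   # separates class 2: complementary information
selected = mrmr([f0, f1, f2], t, 2)
```

The greedy step skips the duplicate feature f1 in favour of the complementary f2, which is exactly the redundancy control that distinguishes mRMR from pure relevance ranking.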

  19. Evaluation of qPCR-Based Assays for Leprosy Diagnosis Directly in Clinical Specimens

    PubMed Central

    Sarno, Euzenir Nunes; Moraes, Milton Ozório

    2011-01-01

    The increased reliability and efficiency of the quantitative polymerase chain reaction (qPCR) makes it a promising tool for performing large-scale screening for infectious disease among high-risk individuals. To date, no study has evaluated the specificity and sensitivity of different qPCR assays for leprosy diagnosis using a range of clinical samples, such as difficult-to-diagnose cases, that could bias molecular results. In this study, qPCR assays amplifying different M. leprae gene targets, sodA, 16S rRNA, RLEP and Ag 85B, were compared for leprosy differential diagnosis. qPCR assays were performed on frozen skin biopsy samples from a total of 62 patients: 21 untreated multibacillary (MB) and 26 untreated paucibacillary (PB) leprosy patients, as well as 10 patients suffering from other dermatological diseases and 5 healthy donors. To develop standardized protocols and to overcome the bias resulting from chromosome count cutoffs arbitrarily defined for different assays, decision tree classifiers were used to estimate optimum cutoffs and to evaluate the assays. As a result, we found decreasing sensitivity for the Ag 85B (66.1%), 16S rRNA (62.9%), and sodA (59.7%) optimized assay classifiers, with similar maximum specificity for leprosy diagnosis. Conversely, the RLEP assay proved to be the most sensitive (87.1%). Moreover, the RLEP assay was positive for 3 samples from patients originally not diagnosed as having leprosy, but who developed leprosy 5–10 years after the collection of the biopsy. In addition, 4 other samples from patients clinically classified as non-leprosy presented detectable chromosome counts by the RLEP assay, suggesting that those patients either had leprosy that was misdiagnosed or a subclinical state of leprosy. Overall, these results are encouraging and suggest that the RLEP assay could be useful as a sensitive diagnostic test to detect M. leprae infection before major clinical manifestations. PMID:22022631
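
Estimating an optimum cutoff on a single assay readout, as the decision-tree step above does, amounts to a one-split tree. A common sketch is to scan candidate thresholds and maximize Youden's J = sensitivity + specificity - 1; the copy-number values below are invented, not from the study.

```python
def best_cutoff(values, labels):
    # One-split decision tree on a single marker: pick the threshold
    # maximising Youden's J = sensitivity + specificity - 1.
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        sens = sum(v >= t for v in pos) / len(pos)
        spec = sum(v < t for v in neg) / len(neg)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical chromosome-count readouts: controls low, patients high.
values = [0, 1, 2, 3, 8, 9, 10]
labels = [0, 0, 0, 0, 1, 1, 1]
cutoff, j = best_cutoff(values, labels)
```

Deriving the cutoff from the data in this way, rather than fixing it arbitrarily per assay, is what makes the assay comparisons in the study commensurable.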

  20. Chemometric brand differentiation of commercial spices using direct analysis in real time mass spectrometry.

    PubMed

    Pavlovich, Matthew J; Dunn, Emily E; Hall, Adam B

    2016-05-15

    Commercial spices represent an emerging class of fuels for improvised explosives. Being able to classify such spices not only by type but also by brand would represent an important step in developing methods to analytically investigate these explosive compositions. Therefore, a combined ambient mass spectrometric/chemometric approach was developed to quickly and accurately classify commercial spices by brand. Direct analysis in real time mass spectrometry (DART-MS) was used to generate mass spectra for samples of black pepper, cayenne pepper, and turmeric, along with four different brands of cinnamon, all dissolved in methanol. Unsupervised learning techniques showed that the cinnamon samples clustered according to brand. Then, we used supervised machine learning algorithms to build chemometric models with a known training set and classified the brands of an unknown testing set of cinnamon samples. Ten independent runs of five-fold cross-validation showed that the training set error for the best-performing models (i.e., the linear discriminant and neural network models) was lower than 2%. The false-positive percentages for these models were 3% or lower, and the false-negative percentages were lower than 10%. In particular, the linear discriminant model perfectly classified the testing set with 0% error. Repeated iterations of training and testing gave similar results, demonstrating the reproducibility of these models. Chemometric models were able to classify the DART mass spectra of commercial cinnamon samples according to brand, with high specificity and low classification error. This method could easily be generalized to other classes of spices, and it could be applied to authenticating questioned commercial samples of spices or to examining evidence from improvised explosives. Copyright © 2016 John Wiley & Sons, Ltd.

  1. Soft computing-based terrain visual sensing and data fusion for unmanned ground robotic systems

    NASA Astrophysics Data System (ADS)

    Shirkhodaie, Amir

    2006-05-01

    In this paper, we primarily discuss technical challenges and navigational skill requirements of mobile robots for traversability path planning in natural terrain environments similar to Mars surface terrains. We describe different methods for detecting salient terrain features based on imaging texture analysis techniques. We also present three competing techniques for terrain traversability assessment of mobile robots navigating in unstructured natural terrain environments: a rule-based terrain classifier, a neural network-based terrain classifier, and a fuzzy-logic terrain classifier. Each proposed terrain classifier divides a region of natural terrain into finite sub-terrain regions and classifies terrain condition exclusively within each sub-terrain region based on terrain visual cues. A Kalman filtering technique is applied for aggregative fusion of the sub-terrain assessment results. The last two terrain classifiers are shown to have remarkable capability for terrain traversability assessment of natural terrains. We have conducted a comparative performance evaluation of all three terrain classifiers and present the results in this paper.

  2. Recognition of pornographic web pages by classifying texts and images.

    PubMed

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.

  3. A comparative study of surface EMG classification by fuzzy relevance vector machine and fuzzy support vector machine.

    PubMed

    Xie, Hong-Bo; Huang, Hu; Wu, Jianhua; Liu, Lei

    2015-02-01

    We present a multiclass fuzzy relevance vector machine (FRVM) learning mechanism and evaluate its performance in classifying multiple hand motions using surface electromyographic (sEMG) signals. The relevance vector machine (RVM) is a sparse Bayesian kernel method which avoids some limitations of the support vector machine (SVM). However, RVM still suffers from possible unclassifiable regions in multiclass problems. We propose two fuzzy membership function-based FRVM algorithms to solve such problems, evaluated in experiments conducted on seven healthy subjects and two amputees performing six hand motions. Two feature sets, namely AR model coefficients with root mean square value (AR-RMS) and wavelet transform (WT) features, are extracted from the recorded sEMG signals. Fuzzy support vector machine (FSVM) analysis was also conducted for a wide comparison in terms of accuracy, sparsity, training and testing time, as well as the effect of training sample size. FRVM yielded comparable classification accuracy with dramatically fewer support vectors in comparison with FSVM. Furthermore, the processing delay of FRVM was much less than that of FSVM, whilst the training of FSVM was much faster than that of FRVM. The results indicate that an FRVM classifier trained using sufficient samples can achieve generalization capability comparable to FSVM with significant sparsity in multi-channel sEMG classification, which is more suitable for sEMG-based real-time control applications.

  4. [Social self-positioning as indicator of socioeconomic status].

    PubMed

    Fernández, E; Alonso, R M; Quer, A; Borrell, C; Benach, J; Alonso, J; Gómez, G

    2000-01-01

    Self-perceived class results from directly questioning subjects about their social class. The aim of this investigation was to analyse self-perceived class in relation to other indicator variables of socioeconomic level. Data from the 1994 Catalan Health Interview Survey, a cross-sectional survey of a representative sample of the non-institutionalised population of Catalonia, were used. We conducted a discriminant analysis to compute the rate of correct classification when different socioeconomic variables potentially related to self-perceived class were considered. All subjects who directly answered the questionnaire were included (N = 12,245). With the aim of obtaining the discriminant functions in one group of subjects and validating them in another, the subjects were divided into two random samples containing approximately 75% and 25% of subjects (analysis sample, n = 9,248; validation sample, n = 2,997). The final function for men and women included level of education, social class (based on occupation), and equivalent income. This function correctly classified 40.9% of the subjects in the analysis sample and 39.2% in the validation sample. Two other functions were selected for men and women separately. In men, the function included level of education, professional category, and family income (39.2% correctly classified in the analysis sample and 37.2% in the validation sample). In women, the function (level of education, working status, and equivalent income) correctly classified 40.3% of women in the analysis sample and 38.9% in the validation sample. The percentages of correct classification were higher for the highest and lowest classes. These results show the utility of a simple variable for self-positioning within the social scale. Self-perceived class is related to education, income, and working determinants.

  5. Predicting suicidal ideation in primary care: An approach to identify easily assessable key variables.

    PubMed

    Jordan, Pascal; Shedden-Mora, Meike C; Löwe, Bernd

    To obtain predictors of suicidal ideation (SI), which can also be used for an indirect assessment of SI; to create a classifier for SI based on variables of the Patient Health Questionnaire (PHQ) and sociodemographic variables; and to obtain an upper bound on the best possible performance of a predictor based on those variables. From a consecutive sample of 9025 primary care patients, 6805 eligible patients (60% female; mean age = 51.5 years) participated. Advanced methods of machine learning were used to derive the prediction equation. Various classifiers were applied, and the area under the curve (AUC) was computed as a performance measure. Classifiers based on methods of machine learning outperformed ordinary regression methods and achieved AUCs around 0.87. The key variables in the prediction equation comprised four items, namely feelings of depression/hopelessness, low self-esteem, worrying, and severe sleep disturbances. The generalized anxiety disorder scale (GAD-7) and the somatic symptom subscale (PHQ-15) did not enhance prediction substantially. In predicting suicidal ideation, researchers should refrain from using ordinary regression tools; the relevant information is primarily captured by the depression subscale and should be incorporated in a nonlinear model. For clinical practice, a classification tree using only four items of the whole PHQ may be advocated. Copyright © 2018 Elsevier Inc. All rights reserved.
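
The AUC used as the performance measure above equals the Mann-Whitney rank statistic and can be computed directly, without plotting a curve. The scores below are illustrative, not from the study.

```python
def auc(scores, labels):
    # AUC as the Mann-Whitney statistic: the probability that a random
    # positive outranks a random negative (ties count half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On this rank interpretation, an AUC of 0.87 means a patient with suicidal ideation would receive a higher risk score than one without in 87% of random pairings.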

  6. MEMS-based sensing and algorithm development for fall detection and gait analysis

    NASA Astrophysics Data System (ADS)

    Gupta, Piyush; Ramirez, Gabriel; Lie, Donald Y. C.; Dallas, Tim; Banister, Ron E.; Dentino, Andrew

    2010-02-01

    Falls by the elderly are highly detrimental to health, frequently resulting in injury, high medical costs, and even death. Using a MEMS-based sensing system, algorithms are being developed for detecting falls and monitoring the gait of elderly and disabled persons. In this study, wireless sensors utilizing Zigbee protocols were incorporated into planar shoe insoles and a waist-mounted device. The insole contains four sensors to measure the pressure applied by the foot. A MEMS-based tri-axial accelerometer is embedded in the insert, and a second one is utilized by the waist-mounted device. The primary fall detection algorithm is derived from the waist accelerometer. The differential acceleration is calculated from samples received in 1.5 s time intervals and provides the quantification via an energy index. From this index one may characterize different gaits and identify fall events. Once a pre-determined index threshold is exceeded, the algorithm will classify an event as a fall or a stumble. The secondary algorithm is derived from frequency analysis techniques: wavelet transforms conducted on the waist accelerometer data. The insole pressure data is then used to underline discrepancies in the transforms, providing more accurate data for classifying gait and/or detecting falls. The range of the transform amplitude in the fourth iteration of a Daubechies-6 transform was found sufficient to detect and classify fall events.
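
The windowed energy-index idea can be sketched as follows. The sampling rate, window length, and threshold below are hypothetical placeholders, not the study's calibrated parameters, and the trace is a made-up acceleration-magnitude signal.

```python
def energy_index(window):
    # Sum of squared sample-to-sample changes in acceleration magnitude.
    return sum((b - a) ** 2 for a, b in zip(window, window[1:]))

def classify_windows(accel, rate_hz, window_s=1.5, threshold=5.0):
    # Split the magnitude trace into non-overlapping 1.5 s windows and
    # flag any window whose energy index exceeds the (hypothetical) threshold.
    n = int(rate_hz * window_s)
    events = []
    for start in range(0, len(accel) - n + 1, n):
        idx = energy_index(accel[start:start + n])
        events.append("fall" if idx > threshold else "normal")
    return events

quiet = [1.0] * 15                       # 1.5 s of standing still at 10 Hz
spike = [1.0] * 7 + [4.0] + [1.0] * 7    # a sudden jolt mid-window
events = classify_windows(quiet + spike, rate_hz=10)
```

A real system would further distinguish falls from stumbles and fuse in the insole pressure data, as the record describes.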

  7. Selective visual attention in object detection processes

    NASA Astrophysics Data System (ADS)

    Paletta, Lucas; Goyal, Anurag; Greindl, Christian

    2003-03-01

    Object detection is an enabling technology that plays a key role in many application areas, such as content-based media retrieval. Attentive cognitive vision systems are proposed here in which the focus of attention is directed towards the most relevant target. The most promising information is interpreted in a sequential process that dynamically makes use of knowledge and enables spatial reasoning on the local object information. The presented work proposes an innovative application of attention mechanisms for object detection which is most general in its understanding of information and action selection. The attentive detection system uses a cascade of increasingly complex classifiers for the stepwise identification of regions of interest (ROIs) and recursively refined object hypotheses. While the coarsest classifiers are used to determine first approximations of a region of interest in the input image, more complex classifiers are applied to more refined ROIs to give more confident estimates. Objects are modelled by local appearance-based representations and in terms of posterior distributions of the object samples in eigenspace. The discrimination function used to discern between objects is modelled by a radial basis function (RBF) network that has been compared with alternative networks and proved consistent and superior to other artificial neural networks for appearance-based object recognition. The experiments were conducted for the automatic detection of brand objects in Formula One broadcasts within the European Commission's cognitive vision project DETECT.

  8. Gene expression pattern recognition algorithm inferences to classify samples exposed to chemical agents

    NASA Astrophysics Data System (ADS)

    Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia

    2002-06-01

    We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents, comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples taken following 24 hours of exposure to phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes revealed specific gene expression profiles that separated samples based on exposure to a particular class of compound.

  9. Integrating in-situ, Landsat, and MODIS data for mapping in Southern African savannas: experiences of LCCS-based land-cover mapping in the Kalahari in Namibia.

    PubMed

    Hüttich, Christian; Herold, Martin; Strohbach, Ben J; Dech, Stefan

    2011-05-01

    Integrated ecosystem assessment initiatives are important steps towards a global biodiversity observing system. Reliable earth observation data are key information for tracking biodiversity change on various scales. Regarding the establishment of standardized environmental observation systems, a key question is: What can be observed on each scale, and how can land cover information be transferred? In this study, a land cover map of a dry semi-arid savanna ecosystem in Namibia was obtained based on the UN LCCS, in-situ data, and MODIS and Landsat satellite imagery. In-situ botanical relevé samples were used as baseline data for the definition of a standardized LCCS legend. A standard LCCS code for savanna vegetation types is introduced. An object-oriented segmentation of Landsat imagery was used as an intermediate stage for downscaling in-situ training data to the coarse MODIS resolution. MODIS time series metrics of the growing season 2004/2005 were used to classify Kalahari vegetation types using a tree-based ensemble classifier (Random Forest). The prevailing Kalahari vegetation type based on LCCS was open broadleaved deciduous shrubland with an herbaceous layer, which differs from the class assignments of the global and regional land-cover maps. The separability analysis based on Bhattacharya distance measurements, applied on two LCCS levels, indicated that the spectral separability of annual MODIS time series features depends on the thematic detail of the classification scheme. The analysis of LCCS classifiers showed an increased significance of life-form composition and soil conditions for the mapping accuracy. An overall accuracy of 92.48% was achieved. Woody plant associations proved to be most stable due to small omission and commission errors. The case study comprised a first suitability assessment of the LCCS classifier approach for a southern African savanna ecosystem.

  10. Determining aspects of ethnicity amongst persons of South Asian origin: the use of a surname-classification programme (Nam Pehchan).

    PubMed

    Macfarlane, Gary J; Lunt, Mark; Palmer, Benedict; Afzal, Cara; Silman, Alan J; Esmail, Aneez

    2007-03-01

    Name-based classification systems are potentially useful for identifying study samples based on probable ethnic minority group. The aim of the current study was to assess the validity of the Nam Pehchan name-classification programme's assignment of religion and language against subject self-report. A population-based cross-sectional survey was conducted in areas of the North-West and West Midland regions of England with a relatively high density of South Asian ethnic minority groups. The sampling frame was the age-sex registers of selected general practices, and subjects were classified according to language and religion using the Nam Pehchan programme. These classifications were compared with responses by subjects on a self-complete postal questionnaire. One thousand nine hundred and forty-nine subjects who participated classified themselves as South Asian. Sensitivity in identifying religion was high amongst Muslims (92%) and Sikhs (86%), and somewhat lower in Hindus (62%). Specificity exceeded 95% for all ethnic groups. The vast majority of subjects assigned Punjabi or Gujarati as their main South Asian language indicated that they did in fact speak these languages (97% and 94%, respectively). Subjects assigned Urdu or Bengali, however, were less likely to do so (61% and 35%, respectively). The name-based classification system Nam Pehchan has demonstrated high levels of accuracy in some sub-groups of the South Asian population in determining subjects' likely spoken language and religion, and is likely to be a useful additional tool when information on ethnicity is not already available.

  11. Online Adaboost-Based Parameterized Methods for Dynamic Distributed Network Intrusion Detection.

    PubMed

    Hu, Weiming; Gao, Jun; Wang, Yanguo; Wu, Ou; Maybank, Stephen

    2014-01-01

    Current network intrusion detection systems lack adaptability to the frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two online Adaboost-based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO- and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.

  12. Adaptive Water Sampling based on Unsupervised Clustering

    NASA Astrophysics Data System (ADS)

    Py, F.; Ryan, J.; Rajan, K.; Sherman, A.; Bird, L.; Fox, M.; Long, D.

    2007-12-01

    Autonomous Underwater Vehicles (AUVs) are widely used for oceanographic surveys, during which data are collected from a number of on-board sensors. Engineers and scientists at MBARI have extended this approach by developing a water sampler specially for the AUV, which can sample a specific patch of water at a specific time. The sampler, named the Gulper, captures 2 liters of seawater in less than 2 seconds on a 21" MBARI Odyssey AUV. Each sample chamber of the Gulper is filled with seawater through a one-way valve, which protrudes through the fairing of the AUV. This new kind of device raises a new problem: when should the Gulper be triggered autonomously? For example, scientists interested in studying the mobilization and transport of shelf sediments would like to detect intermediate nepheloid layers (INLs). To detect this phenomenon, we need a model based on AUV sensors that can recognize the feature in-situ. The formation of such a model is not obvious, as identification of this feature is generally based on data from multiple sensors. We have developed an unsupervised data clustering technique to extract the different features, which are then used for on-board classification and triggering of the Gulper. We use a three-phase approach: 1) use data from past missions to learn the different classes of data from sensor inputs; the clustering algorithm extracts the set of features that can be distinguished within this large data set; 2) scientists on shore then identify these features and point out which correspond to phenomena of interest (e.g., nepheloid layers, upwelling material, etc.); 3) embed the corresponding classifier into the AUV control system to indicate the most probable feature of the water depending on sensory input. The triggering algorithm examines this result and triggers the Gulper if the classifier indicates, above a predetermined confidence threshold, that the vehicle is within the feature of interest.
    We have deployed this method of online classification and sampling based on AUV depth and HOBI Labs Hydroscat-2 sensor data. Using approximately 20,000 data samples, the clustering algorithm generated 14 clusters, with one identified as corresponding to a nepheloid layer. We demonstrate that such a technique can be used to reliably and efficiently sample water based on multiple sources of data in real-time.
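
The three-phase pipeline above (cluster past data offline, have scientists label the target cluster, then trigger in-situ) can be sketched as below. This is a toy stand-in: the sensor readings are invented 2-D points, k-means with deterministic farthest-point initialisation replaces the unspecified clustering algorithm, and a centroid-distance cutoff stands in for the probabilistic confidence threshold.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Phase 1: learn classes from past-mission data.
    # Deterministic farthest-point initialisation, then standard Lloyd steps.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [[sum(v) / len(g) for v in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def should_trigger(sample, centroids, target, max_dist2=1.0):
    # Phase 3: fire the sampler when the nearest learned class is the
    # scientist-labelled target cluster and the reading is close enough
    # to its centroid (a crude stand-in for a confidence threshold).
    i = min(range(len(centroids)), key=lambda i: dist2(sample, centroids[i]))
    return i == target and dist2(sample, centroids[i]) <= max_dist2

# Toy past-mission readings: a "background water" cluster and a distinct
# high-backscatter cluster that a scientist would label as the target.
readings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.1, 5.0], [5.0, 5.1], [5.0, 5.0]]
cents = kmeans(readings, 2)
```

Phase 2, the human labelling step, is what turns an anonymous cluster index into "trigger here"; the code only fixes which index plays that role.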

  13. Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

    PubMed

    Bardsiri, Mahshid Khatibi; Eftekhari, Mahdi

    2014-01-01

    In this paper, some methods for ensemble learning of protein fold recognition based on decision trees (DT) are compared and contrasted against each other over three datasets taken from the literature. Following previously reported studies, the features of the datasets are divided into several groups. Then, for each of these groups, three ensemble classifiers, namely random forest, rotation forest and AdaBoost.M1, are employed. Also, some fusion methods are introduced for combining the ensemble classifiers obtained in the previous step. After this step, three classifiers are produced based on the combination of classifiers of the types random forest, rotation forest and AdaBoost.M1. Finally, the three classifiers thus obtained are combined to make an overall classifier. Experimental results show that the overall classifier obtained by the genetic algorithm (GA) weighting fusion method is the best one in comparison to previously applied methods in terms of classification accuracy.
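The weighted-fusion step can be sketched as a weighted vote over the member classifiers' label predictions, with the weights tuned by a search procedure. The fixed predictions and the crude random search below are illustrative stand-ins: the paper trains real ensembles and tunes the weights with a genetic algorithm.

```python
import numpy as np

# Stand-ins for the three ensemble classifiers: fixed label predictions
# per sample (invented, not the paper's data).
preds = {
    "random_forest":   np.array([0, 1, 1, 2, 2]),
    "rotation_forest": np.array([0, 1, 2, 2, 2]),
    "adaboost_m1":     np.array([0, 0, 1, 2, 1]),
}
y_true = np.array([0, 1, 1, 2, 2])

def weighted_vote(preds, weights, n_classes=3):
    """Fuse label predictions by a weighted vote."""
    n = len(y_true)
    scores = np.zeros((n, n_classes))
    for name, p in preds.items():
        scores[np.arange(n), p] += weights[name]
    return scores.argmax(axis=1)

def accuracy(weights):
    return float((weighted_vote(preds, weights) == y_true).mean())

# A GA would evolve the weight vector; random search stands in for it here.
rng = np.random.default_rng(1)
best_w, best_acc = None, -1.0
for _ in range(200):
    w = dict(zip(preds, rng.random(3)))
    acc = accuracy(w)
    if acc > best_acc:
        best_w, best_acc = w, acc
```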

  14. Evaluation of the toxicity of sediments from the Anniston PCB Site to the mussel Lampsilis siliquoidea

    USGS Publications Warehouse

    Schein, Allison; Sinclair, Jesse A.; MacDonald, Donald D.; Ingersoll, Christopher G.; Kemble, Nile E.; Kunz, James L.

    2015-01-01

    The Anniston Polychlorinated Biphenyl (PCB) Site is located in the vicinity of the municipality of Anniston in Calhoun County, in the north-eastern portion of Alabama. Although there are a variety of land-use activities within the Choccolocco Creek watershed, environmental concerns in the area have focused mainly on releases of PCBs to aquatic and riparian habitats. PCBs were manufactured by Monsanto, Inc. at the Anniston facility from 1935 to 1971. The chemicals of potential concern (COPCs) in sediments at the Anniston PCB Site include: PCBs, mercury, metals, polycyclic aromatic hydrocarbons (PAHs), organochlorine and organophosphorus pesticides, volatile organic compounds (VOCs), semivolatile organic compounds (SVOCs), and polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDDs/PCDFs). The purpose of this study was to evaluate the toxicity of PCB-contaminated sediments to the juvenile fatmucket mussel (Lampsilis siliquoidea) and to characterize relationships between sediment chemistry and the toxicity of sediment samples collected from the Anniston PCB Site using laboratory sediment testing. Samples were collected in August 2010 from OU-4 of the Anniston PCB Site, as well as from selected reference locations. A total of 32 samples were initially collected from six test sites and one reference site within the watershed. A total of 23 of these 32 samples were evaluated in 28-day whole-sediment toxicity tests conducted with juvenile mussels (L. siliquoidea). Physical and chemical characterization of whole sediment included grain size, total organic carbon (TOC), nutrients, PCBs, parent and alkylated PAHs, organochlorine pesticides, PCDD/PCDFs, total metals, simultaneously extracted metals (SEM), and acid volatile sulfide (AVS). Sediment collected from Snow Creek and Choccolocco Creek contained a variety of COPCs. Organic contaminants detected in sediment included PCBs, organochlorine pesticides, PCDDs/PCDFs, and PAHs. 
In general, the highest concentrations of PCBs were associated with the highest concentrations of PAHs, PCDDs/PCDFs, and organochlorine pesticides. Specifically, sediments 08, 18, and 19 exceeded probable effect concentration quotients (PEC-Qs) of 1.0 for all organic classes of contaminants. These three sediment samples also had high concentrations of mercury and lead, which were the only metals found at elevated concentrations (i.e., above the probable effect concentration [PEC]) in the samples collected. Many sediment samples were highly contaminated with mercury, based on comparisons to samples collected from reference locations. The whole-sediment laboratory toxicity tests conducted with L. siliquoidea met the test acceptability criteria (e.g., control survival was greater than or equal to 80%). Survival of mussels was high in most samples, with 4 of 23 samples (17%) classified as toxic based on the survival endpoint. Biomass and weight were more sensitive endpoints for the L. siliquoidea toxicity tests, with both endpoints classifying 52% of the samples as toxic. Samples 19 and 30 were most toxic to L. siliquoidea, as they were classified as toxic according to all four endpoints (survival, biomass, weight, and length). Mussels were less sensitive in toxicity tests conducted with sediments from the Anniston PCB Site than Hyalella azteca and Chironomus dilutus. Biomass of L. siliquoidea was less sensitive compared to biomass of H. azteca or biomass of larval C. dilutus. Based on the most sensitive endpoint for each species, 52% of the samples were toxic to L. siliquoidea, whereas 67% of sediments were toxic to H. azteca (based on reproduction) and 65% were toxic to C. dilutus (based on adult biomass). The low-risk toxicity threshold (TTLR) was higher for L. siliquoidea biomass (e.g., 20,400 µg/kg dry weight [DW]) compared to that for H. azteca reproduction (e.g., 499 µg/kg DW) or C. dilutus adult biomass (e.g., 1,140 µg/kg DW; MacDonald et al. 2014). 
While mussels such as L. sili

  15. Rapid assessment of antimicrobial resistance prevalence using a Lot Quality Assurance sampling approach.

    PubMed

    van Leth, Frank; den Heijer, Casper; Beerepoot, Mariëlle; Stobberingh, Ellen; Geerlings, Suzanne; Schultsz, Constance

    2017-04-01

    Increasing antimicrobial resistance (AMR) requires rapid surveillance tools, such as Lot Quality Assurance Sampling (LQAS). LQAS classifies AMR prevalence as high or low based on set parameters. We compared these classifications with the underlying true AMR prevalence using data on 1335 Escherichia coli isolates from surveys of community-acquired urinary tract infection in women, by assessing operating curves, sensitivity and specificity. Sensitivity and specificity of any set of LQAS parameters were above 99% and between 79 and 90%, respectively. Operating curves showed high concordance of the LQAS classification with true AMR prevalence estimates. LQAS-based AMR surveillance is a feasible approach that provides timely and locally relevant estimates, as well as the information necessary to formulate and evaluate guidelines for empirical treatment.
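The LQAS machinery can be sketched in a few lines: a lot is classified "high" when the count of resistant isolates in the sample reaches a decision threshold, and the binomial tail gives the rule's sensitivity and specificity at assumed high and low prevalences. The sample size, threshold, and prevalence values below are invented for illustration, not the survey's parameters.

```python
from math import comb

def binom_tail(n, d, p):
    """P(X >= d) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(d, n + 1))

def lqas_classify(n_resistant, d):
    """LQAS decision rule: classify the lot as 'high' resistance when the
    resistant count reaches the decision threshold d."""
    return "high" if n_resistant >= d else "low"

# Illustrative parameters: lot size 25, threshold 6, with 10% taken as
# the low-prevalence scenario and 40% as the high-prevalence scenario.
n, d = 25, 6
sensitivity = binom_tail(n, d, 0.40)       # P(classified high | truly high)
specificity = 1 - binom_tail(n, d, 0.10)   # P(classified low | truly low)
```

Sweeping the assumed prevalence while holding (n, d) fixed traces out the operating curve the abstract refers to.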

  16. Fast discrimination of traditional Chinese medicine according to geographical origins with FTIR spectroscopy and advanced pattern recognition techniques

    NASA Astrophysics Data System (ADS)

    Li, Ning; Wang, Yan; Xu, Kexin

    2006-08-01

    Combining Fourier transform infrared (FTIR) spectroscopy with three kinds of pattern recognition techniques, 53 traditional Chinese medicine danshen samples were rapidly discriminated according to geographical origin. The results showed that discrimination by FTIR spectroscopy was feasible, as ascertained by principal component analysis (PCA). An effective model was built by employing Soft Independent Modeling of Class Analogy (SIMCA) together with PCA, and 82% of the samples were discriminated correctly. Using an artificial neural network (ANN) with back propagation (BP), the origins of danshen were classified with complete accuracy.

  17. Qualitative analysis of pure and adulterated canola oil via SIMCA

    NASA Astrophysics Data System (ADS)

    Basri, Katrul Nadia; Khir, Mohd Fared Abdul; Rani, Rozina Abdul; Sharif, Zaiton; Rusop, M.; Zoolfakar, Ahmad Sabirin

    2018-05-01

    This paper demonstrates the use of near infrared (NIR) spectroscopy to classify pure and adulterated samples of canola oil. The Soft Independent Modeling of Class Analogy (SIMCA) algorithm was implemented to discriminate the samples into their classes. The spectral data obtained were divided into training and validation datasets using the Kennard-Stone algorithm with a fixed ratio of 7:3. The accuracy of the resulting model is 0.99, while its sensitivity and precision are 0.92 and 1.00, respectively. The results show that the classification model is robust enough to perform qualitative analysis of canola oil in future applications.
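The Kennard-Stone split mentioned above can be sketched as follows: seed the training set with the two most distant samples, then greedily add the sample whose nearest selected neighbour is farthest away, so the training set spans the spectral space. The spectra here are random stand-ins for real NIR data.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone sample selection on the rows of X."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)        # two most distant samples
    selected = [i, j]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # distance of each remaining sample to its nearest selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(min_d))))
    return selected, remaining

# Illustrative "spectra": 10 samples x 5 wavelengths, split 7:3 as in the paper.
rng = np.random.default_rng(2)
X = rng.random((10, 5))
train_idx, val_idx = kennard_stone(X, n_train=7)
```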

  18. Citizen science-based water quality monitoring: Constructing a large database to characterize the impacts of combined sewer overflow in New York City.

    PubMed

    Farnham, David J; Gibson, Rebecca A; Hsueh, Diana Y; McGillis, Wade R; Culligan, Patricia J; Zain, Nina; Buchanan, Rob

    2017-02-15

    To protect recreational water users from waterborne pathogen exposure, it is crucial that waterways are monitored for the presence of harmful bacteria. In NYC, a citizen science campaign is monitoring waterways impacted by inputs of storm water and untreated sewage during periods of rainfall. However, the spatial and temporal scales over which the monitoring program can sample are constrained by cost and time, thus hindering the construction of databases that benefit both scientists and citizens. In this study, we first illustrate the scientific value of a citizen scientist monitoring campaign by using the data collected through the campaign to characterize the seasonal variability of sampled bacterial concentration as well as its response to antecedent rainfall. Second, we examine the efficacy of the HyServe Compact Dry ETC method, a lower cost and time-efficient alternative to the EPA-approved IDEXX Enterolert method for fecal indicator monitoring, through a paired sample comparison of IDEXX and HyServe (total of 424 paired samples). The HyServe and IDEXX methods return the same result for over 80% of the samples with regard to whether a water sample is above or below the EPA's recreational water quality criterion for a single sample of 110 enterococci per 100 mL. The HyServe method classified as unsafe 90% of the 119 water samples that were classified as having unsafe enterococci concentrations by the more established IDEXX method. This study seeks to encourage other scientists to engage with citizen scientist communities and to also pursue the development of cost- and time-efficient methodologies to sample environmental variables that are not easily collected or analyzed in an automated manner. Copyright © 2016 Elsevier B.V. All rights reserved.
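The paired-method comparison reduces to binarizing both methods' counts at the single-sample criterion and tabulating agreement, plus the alternative method's sensitivity on the samples the reference flags as exceedances. The counts below are invented for illustration, not the study's 424 paired measurements.

```python
# Single-sample recreational criterion: 110 enterococci per 100 mL.
REFERENCE_LIMIT = 110

idexx   = [40, 250, 500, 30, 120, 90, 800, 20]   # reference method (invented)
hyserve = [35, 300, 450, 60, 100, 95, 700, 25]   # alternative method (invented)

ref_unsafe = [c > REFERENCE_LIMIT for c in idexx]
alt_unsafe = [c > REFERENCE_LIMIT for c in hyserve]

# Fraction of samples on which the two methods agree (safe/unsafe).
agreement = sum(r == a for r, a in zip(ref_unsafe, alt_unsafe)) / len(idexx)

# Of the samples the reference flags as unsafe, how many does the
# alternative method also flag?
flagged = [a for r, a in zip(ref_unsafe, alt_unsafe) if r]
sensitivity = sum(flagged) / len(flagged)
```

On these toy counts the methods agree on 7 of 8 samples and the alternative catches 3 of the 4 reference exceedances, mirroring the >80% agreement and 90% sensitivity figures reported in the abstract.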

  19. OGLE-IV Real-Time Transient Search

    NASA Astrophysics Data System (ADS)

    Wyrzykowski, Ł.; Kostrzewa-Rutkowska, Z.; Kozłowski, S.; Udalski, A.; Poleski, R.; Skowron, J.; Blagorodnova, N.; Kubiak, M.; Szymański, M. K.; Pietrzyński, G.; Soszyński, I.; Ulaczyk, K.; Pietrukowicz, P.; Mróz, P.

    2014-09-01

    We present the design and first results of a real-time search for transients within the 650 sq. deg. area around the Magellanic Clouds, conducted as part of the OGLE-IV project and aimed at detecting supernovae, novae and other events. The average sampling cadence of about four days from September to May yielded the detection of 238 transients in the 2012/2013 and 2013/2014 seasons. The superb photometric and astrometric quality of the OGLE data allows for numerous applications of the discovered transients. We use this sample to prepare and train a Machine Learning-based automated classifier for early light curves, which distinguishes major classes of transients with more than 80% correct answers. Forty-nine spectroscopically classified Type Ia supernovae are used to construct a Hubble Diagram with a statistical scatter of about 0.3 mag, filling the least populated region of the redshift range in the Union sample. We investigate the influence of host galaxy environments on supernovae statistics and find mean host extinctions of A_I=0.19±0.10 mag and A_V=0.39±0.21 mag based on a subsample of supernovae Type Ia. We show that the positional accuracy of the survey is of the order of 0.5 pixels (0.13'') and that the OGLE-IV Transient Detection System is capable of detecting transients within the nuclei of galaxies. We present a few interesting cases of nuclear transients of unknown type. All data on the OGLE transients are made publicly available to the astronomical community via the OGLE website.

  20. Automated classification of articular cartilage surfaces based on surface texture.

    PubMed

    Stachowiak, G P; Stachowiak, G W; Podsiadlo, P

    2006-11-01

    In this study the automated classification system previously developed by the authors was used to classify articular cartilage surfaces with different degrees of wear. This automated system classifies surfaces based on their texture. Plug samples of sheep cartilage (pins) were run on stainless steel discs under various conditions using a pin-on-disc tribometer. Testing conditions were specifically designed to produce different severities of cartilage damage due to wear. Environmental scanning electron microscope (ESEM) images of cartilage surfaces, which formed a database for pattern recognition analysis, were acquired. The ESEM images of cartilage were divided into five groups (classes), each class representing different wear conditions or wear severity. Each class was first examined and assessed visually. Next, the automated classification system (pattern recognition) was applied to all classes. The results of the automated surface texture classification were compared to those based on visual assessment of surface morphology. It was shown that the texture-based automated classification system was an efficient and accurate method of distinguishing between various cartilage surfaces generated under different wear conditions. It appears that the texture-based classification method has the potential to become a useful tool in medical diagnostics.
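Texture classification of this kind typically starts from statistics of a grey-level co-occurrence matrix (GLCM). The sketch below is not the authors' feature set; it merely shows the idea: build a horizontal-neighbour GLCM and derive the classic contrast and energy features, so that a patch with horizontally constant intensity yields zero contrast while uncorrelated noise does not.

```python
import numpy as np

def glcm_features(img, levels=4):
    """Horizontal-neighbour (distance 1) GLCM plus contrast and energy."""
    q = (img * levels).clip(0, levels - 1).astype(int)   # quantise grey levels
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()                                   # joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = float(((i - j) ** 2 * glcm).sum())
    energy = float((glcm ** 2).sum())
    return contrast, energy

rng = np.random.default_rng(6)
smooth = np.tile(rng.random((8, 1)), (1, 8))   # rows of constant intensity
rough = rng.random((8, 8))                     # uncorrelated noise

c_smooth, _ = glcm_features(smooth)
c_rough, _ = glcm_features(rough)
```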

  1. Mapping raised bogs with an iterative one-class classification approach

    NASA Astrophysics Data System (ADS)

    Mack, Benjamin; Roscher, Ribana; Stenzel, Stefanie; Feilhauer, Hannes; Schmidtlein, Sebastian; Waske, Björn

    2016-10-01

    Land use and land cover maps are among the most commonly used remote sensing products. In many applications the user only requires a map of one particular class of interest, e.g. a specific vegetation type or an invasive species. One-class classifiers are appealing alternatives to common supervised classifiers because they can be trained with labeled training data of the class of interest only. However, training an accurate one-class classification (OCC) model is challenging, particularly when facing a large image, a small class and few training samples. To tackle these problems we propose an iterative OCC approach. The presented approach uses a biased Support Vector Machine as its core classifier. In an iterative pre-classification step, a large part of the pixels not belonging to the class of interest is classified. The remaining data are classified by a final classifier with a novel model and threshold selection approach. The specific objective of our study is the classification of raised bogs in a study site in southeast Germany, using multi-seasonal RapidEye data and a small number of training samples. Results demonstrate that the iterative OCC outperforms other state-of-the-art one-class classifiers and approaches for model selection. The study highlights the potential of the proposed approach for an efficient and improved mapping of small classes such as raised bogs. Overall, the proposed method constitutes a feasible and useful modification of a regular one-class classifier.
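A dependency-free caricature of the one-class setting: train on labelled samples of the class of interest only, then accept a pixel when it lies close enough to the class. A centroid-distance rule stands in for the biased SVM here, and choosing the cutoff from the training distances loosely mirrors the paper's model/threshold selection step. All feature values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Labelled training data for the class of interest only (think raised-bog
# pixels in a 2-D feature space); values are invented.
bog = rng.normal([0.6, 0.3], 0.05, size=(50, 2))

# One-class model: accept a pixel if it lies within a distance threshold
# of the class centroid; the threshold comes from the training distances.
centroid = bog.mean(axis=0)
train_d = np.linalg.norm(bog - centroid, axis=1)
threshold = np.quantile(train_d, 0.95)   # threshold selection step

def is_class_of_interest(pixel):
    return np.linalg.norm(pixel - centroid) <= threshold

inside = is_class_of_interest(np.array([0.61, 0.31]))
outside = is_class_of_interest(np.array([0.10, 0.90]))
```

The paper's iterative pre-classification would first discard pixels that are obviously not the class, leaving a classifier like this to decide only the hard remainder.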

  2. Classification of natural formations based on their optical characteristics using small volumes of samples

    NASA Astrophysics Data System (ADS)

    Abramovich, N. S.; Kovalev, A. A.; Plyuta, V. Y.

    1986-02-01

    A computer algorithm has been developed to classify the spectral bands of natural scenes on Earth according to their optical characteristics. The algorithm is written in FORTRAN-IV and can be used in spectral data processing programs requiring small data loads. The spectral classifications of several types of green vegetation canopies are given to illustrate the effectiveness of the algorithm.

  3. PREDICTING APHASIA TYPE FROM BRAIN DAMAGE MEASURED WITH STRUCTURAL MRI

    PubMed Central

    Yourganov, Grigori; Smith, Kimberly G.; Fridriksson, Julius; Rorden, Chris

    2015-01-01

    Chronic aphasia is a common consequence of a left-hemisphere stroke. Since the early insights by Broca and Wernicke, studying the relationship between the loci of cortical damage and patterns of language impairment has been one of the concerns of aphasiology. We utilized multivariate classification in a cross-validation framework to predict the type of chronic aphasia from the spatial pattern of brain damage. Our sample consisted of 98 patients with five types of aphasia (Broca’s, Wernicke’s, global, conduction, and anomic), classified based on scores on the Western Aphasia Battery. Binary lesion maps were obtained from structural MRI scans (obtained at least 6 months poststroke, and within 2 days of behavioural assessment); after spatial normalization, the lesions were parcellated into a disjoint set of brain areas. The proportion of damage to the brain areas was used to classify patients’ aphasia type. To create this parcellation, we relied on five brain atlases; our classifier (support vector machine) could differentiate between different kinds of aphasia using any of the five parcellations. In our sample, the best classification accuracy was obtained when using a novel parcellation that combined two previously published brain atlases, with the first atlas providing the segmentation of grey matter, and the second atlas used to segment the white matter. For each aphasia type, we computed the relative importance of different brain areas for distinguishing it from other aphasia types; our findings were consistent with previously published reports of lesion locations implicated in different types of aphasia. Overall, our results revealed that automated multivariate classification could distinguish between aphasia types based on damage to atlas-defined brain areas. PMID:26465238

  4. Accurate determination of surface reference data in digital photographs in ice-free surfaces of Maritime Antarctica.

    PubMed

    Pina, Pedro; Vieira, Gonçalo; Bandeira, Lourenço; Mora, Carla

    2016-12-15

    The ice-free areas of Maritime Antarctica show complex mosaics of surface covers, with wide patches of diverse bare soils and rock, together with various vegetation communities dominated by lichens and mosses. The microscale variability is difficult to characterize and quantify, but is essential for ground-truthing and for defining classifiers for large areas using, for example high resolution satellite imagery, or even ultra-high resolution unmanned aerial vehicle (UAV) imagery. The main objective of this paper is to verify the ability and robustness of an automated approach to discriminate the variety of surface types in digital photographs acquired at ground level in ice-free regions of Maritime Antarctica. The proposed method is based on an object-based classification procedure built in two main steps: first, on the automated delineation of homogeneous regions (the objects) of the images through the watershed transform with adequate filtering to avoid an over-segmentation, and second, on labelling each identified object with a supervised decision classifier trained with samples of representative objects of ice-free surface types (bare rock, bare soil, moss and lichen formations). The method is evaluated with images acquired in summer campaigns in Fildes and Barton peninsulas (King George Island, South Shetlands). The best performances for the datasets of the two peninsulas are achieved with a SVM classifier with overall accuracies of about 92% and kappa values around 0.89. The excellent performances allow validating the adequacy of the approach for obtaining accurate surface reference data at the complete pixel scale (sub-metric) of current very high resolution (VHR) satellite images, instead of a common single point sampling. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Predicting aphasia type from brain damage measured with structural MRI.

    PubMed

    Yourganov, Grigori; Smith, Kimberly G; Fridriksson, Julius; Rorden, Chris

    2015-12-01

    Chronic aphasia is a common consequence of a left-hemisphere stroke. Since the early insights by Broca and Wernicke, studying the relationship between the loci of cortical damage and patterns of language impairment has been one of the concerns of aphasiology. We utilized multivariate classification in a cross-validation framework to predict the type of chronic aphasia from the spatial pattern of brain damage. Our sample consisted of 98 patients with five types of aphasia (Broca's, Wernicke's, global, conduction, and anomic), classified based on scores on the Western Aphasia Battery (WAB). Binary lesion maps were obtained from structural MRI scans (obtained at least 6 months poststroke, and within 2 days of behavioural assessment); after spatial normalization, the lesions were parcellated into a disjoint set of brain areas. The proportion of damage to the brain areas was used to classify patients' aphasia type. To create this parcellation, we relied on five brain atlases; our classifier (support vector machine - SVM) could differentiate between different kinds of aphasia using any of the five parcellations. In our sample, the best classification accuracy was obtained when using a novel parcellation that combined two previously published brain atlases, with the first atlas providing the segmentation of grey matter, and the second atlas used to segment the white matter. For each aphasia type, we computed the relative importance of different brain areas for distinguishing it from other aphasia types; our findings were consistent with previously published reports of lesion locations implicated in different types of aphasia. Overall, our results revealed that automated multivariate classification could distinguish between aphasia types based on damage to atlas-defined brain areas. Copyright © 2015 Elsevier Ltd. All rights reserved.
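The pipeline in the two records above (proportion of damage per atlas-defined area, fed to a supervised classifier and scored by cross-validation) can be caricatured as below. A nearest-centroid rule replaces the SVM and synthetic two-class data replaces the 98-patient sample, so this is only a shape-of-the-computation sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in: proportion of damage in 6 "atlas areas" for two
# aphasia types with distinct lesion patterns (invented values).
type_a = rng.normal([0.7, 0.6, 0.1, 0.1, 0.1, 0.1], 0.05, size=(20, 6))
type_b = rng.normal([0.1, 0.1, 0.7, 0.6, 0.1, 0.1], 0.05, size=(20, 6))
X = np.vstack([type_a, type_b])
y = np.array([0] * 20 + [1] * 20)

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation with a nearest-centroid rule."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # hold out patient i
        centroids = np.array(
            [X[mask & (y == c)].mean(axis=0) for c in (0, 1)]
        )
        pred = np.argmin(np.linalg.norm(centroids - X[i], axis=1))
        correct += pred == y[i]
    return correct / len(X)

acc = loocv_accuracy(X, y)
```

With cleanly separated lesion patterns the held-out accuracy is perfect; real lesion data are far noisier, which is why the choice of parcellation mattered in the study.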

  6. Groundwater quality assessment of the quaternary unconsolidated sedimentary basin near the Pi river using fuzzy evaluation technique

    NASA Astrophysics Data System (ADS)

    Mohamed, Adam Khalifa; Liu, Dan; Mohamed, Mohamed A. A.; Song, Kai

    2018-05-01

    The present study was carried out to assess the groundwater quality for drinking purposes in the Quaternary Unconsolidated Sedimentary Basin of the North Chengdu Plain, China. Six groups of water samples (S1, S2, S3, S4, S5, and S6) were selected in the study area. These samples were analyzed for 19 different physicochemical water quality parameters to assess groundwater quality. The physicochemical parameters of groundwater were compared with China's Quality Standards for Groundwater (GB/T 14848-93). Interpretation of the physicochemical data revealed that groundwater in the basin was slightly alkaline. Total hardness and total dissolved solids values show that the investigated water is classified as very hard and fresh, respectively. The suitability of groundwater for drinking purposes was assessed with the fuzzy mathematics evaluation (FME) method. The results of the assessment were classified into five groups based on their relative suitability for potable use (grade I = most suitable to grade V = least suitable), according to GB/T 14848-93. The assessment results reveal that the quality of groundwater in most of the wells was class I, II or III and suitable for drinking purposes, but one well (S2) was found to be in class V, which is classified as very poor and cannot be used for drinking. The FME method was also compared with the comprehensive evaluation method and found to be more comprehensive and reasonable for assessing groundwater quality. This study can provide an important frame of reference for decision making on improving groundwater quality in the study area and its surroundings.
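A common form of fuzzy evaluation for such gradings assigns each parameter a piecewise-linear membership degree in each grade and then weight-averages the membership vectors; the grade with the largest composite membership wins. The grade standards, weights and sample values below are invented for illustration and are not the GB/T 14848-93 limits.

```python
import numpy as np

GRADES = ["I", "II", "III", "IV", "V"]

def grade_memberships(x, s):
    """Piecewise-linear membership of concentration x in each of five
    grades, given ascending grade standards s (one boundary per grade)."""
    m = np.zeros(len(s))
    if x <= s[0]:
        m[0] = 1.0
    elif x >= s[-1]:
        m[-1] = 1.0
    else:
        j = int(np.searchsorted(s, x))          # s[j-1] < x <= s[j]
        m[j] = (x - s[j - 1]) / (s[j] - s[j - 1])
        m[j - 1] = 1.0 - m[j]
    return m

# Two illustrative parameters with made-up grade standards (mg/L).
standards = {
    "total_hardness": np.array([150.0, 300.0, 450.0, 550.0, 650.0]),
    "chloride":       np.array([50.0, 150.0, 250.0, 300.0, 350.0]),
}
weights = {"total_hardness": 0.6, "chloride": 0.4}
sample = {"total_hardness": 320.0, "chloride": 120.0}

composite = sum(weights[p] * grade_memberships(sample[p], standards[p])
                for p in standards)
grade = GRADES[int(np.argmax(composite))]
```

In practice the weights are usually derived from each parameter's degree of exceedance rather than fixed by hand, but the aggregation step is the same.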

  7. Relationship of preschool special education outcomes to instructional practices and parent-child interaction.

    PubMed

    Mahoney, Gerald; Wheeden, C Abigail; Perales, Frida

    2004-01-01

    Developmental outcomes attained by children receiving preschool special education services in relationship to both the general instructional approach used by their teachers and their parents' style of interaction were examined. The sample included 70 children from 41 Early Childhood Special Education (ECSE) classrooms. The type of instructional model children received was determined by dividing the sample into three clusters based upon six global ratings of children's classroom environment: Choice; Cognitive Problem-Solving; Child-Initiated Learning; Developmental Match; Child-Centered Routines; and Rewards and Discipline Strategies. Based on this analysis, 27 children were classified as receiving developmental instruction; 15 didactic instruction; and 28 naturalistic instruction. Observations of parent-child interaction collected at the beginning and end of the year were classified along four dimensions using the Maternal Behavior Rating Scale: Responsiveness, Affect, Achievement Orientation and Directiveness. Results indicated that the kinds of experiences that children received varied significantly across the three instructional models. However, there were no significant differences in the impact of these instructional models on children's rate of development. Regression analyses indicated that children's rate of development at the end of intervention was significantly related to their parents' style of interaction but was unrelated to the type of instructional model they received.

  8. Assessing Local Risk of Rifampicin-Resistant Tuberculosis in KwaZulu-Natal, South Africa Using Lot Quality Assurance Sampling.

    PubMed

    Heidebrecht, Christine L; Podewils, Laura J; Pym, Alexander; Mthiyane, Thuli; Cohen, Ted

    2016-01-01

    KwaZulu-Natal (KZN) has the highest burden of notified multidrug-resistant tuberculosis (MDR TB) and extensively drug-resistant (XDR) TB cases in South Africa. A better understanding of spatial heterogeneity in the risk of drug-resistance may help to prioritize local responses. Between July 2012 and June 2013, we conducted a two-way Lot Quality Assurance Sampling (LQAS) study to classify the burden of rifampicin (RIF)-resistant TB among incident TB cases notified within the catchment areas of seven laboratories in two northern and one southern district of KZN. Decision rules for classification of areas as having either a high- or low-risk of RIF resistant TB (based on proportion of RIF resistance among all TB cases) were based on consultation with local policy makers. We classified five areas as high-risk and two as low-risk. High-risk areas were identified in both Southern and Northern districts, with the greatest proportion of RIF resistance observed in the northernmost area, the Manguzi community situated on the Mozambique border. Our study revealed heterogeneity in the risk of RIF resistant disease among incident TB cases in KZN. This study demonstrates the potential for LQAS to detect geographic heterogeneity in areas where access to drug susceptibility testing is limited.

  9. Assessing Local Risk of Rifampicin-Resistant Tuberculosis in KwaZulu-Natal, South Africa Using Lot Quality Assurance Sampling

    PubMed Central

    Heidebrecht, Christine L.; Podewils, Laura J.; Pym, Alexander; Mthiyane, Thuli; Cohen, Ted

    2016-01-01

    Background KwaZulu-Natal (KZN) has the highest burden of notified multidrug-resistant tuberculosis (MDR TB) and extensively drug-resistant (XDR) TB cases in South Africa. A better understanding of spatial heterogeneity in the risk of drug-resistance may help to prioritize local responses. Methods Between July 2012 and June 2013, we conducted a two-way Lot Quality Assurance Sampling (LQAS) study to classify the burden of rifampicin (RIF)-resistant TB among incident TB cases notified within the catchment areas of seven laboratories in two northern and one southern district of KZN. Decision rules for classification of areas as having either a high- or low-risk of RIF resistant TB (based on proportion of RIF resistance among all TB cases) were based on consultation with local policy makers. Results We classified five areas as high-risk and two as low-risk. High-risk areas were identified in both Southern and Northern districts, with the greatest proportion of RIF resistance observed in the northernmost area, the Manguzi community situated on the Mozambique border. Conclusion Our study revealed heterogeneity in the risk of RIF resistant disease among incident TB cases in KZN. This study demonstrates the potential for LQAS to detect geographic heterogeneity in areas where access to drug susceptibility testing is limited. PMID:27050561

  10. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria.

    PubMed

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon; Sullivan, Matthew B

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome 'sequence space' that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  11. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    PubMed Central

    Doulcier, Guilhem; You, Zhi-Qiang; Roux, Simon

    2017-01-01

    Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure. PMID:28480138

  12. Classification of buildings mold threat using electronic nose

    NASA Astrophysics Data System (ADS)

    Łagód, Grzegorz; Suchorab, Zbigniew; Guz, Łukasz; Sobczuk, Henryk

    2017-07-01

Mold is considered one of the most important contributors to Sick Building Syndrome and is a significant problem in the current building industry. In many cases it is caused by rising moisture in building envelope surfaces and excessive indoor air humidity. In historical buildings it is mostly caused by outdated construction techniques, among them the absence of horizontal damp-proofing and the use of hygroscopic construction materials. Recent buildings also face a mold risk, in many cases because airtight construction impairs the performance of gravity ventilation systems and thereby creates conditions suitable for mold development. Based on our research, we propose a method for classifying the mold threat in buildings using an electronic nose based on a gas sensor array consisting of MOS (metal oxide semiconductor) sensors. This device is frequently applied for air quality assessment in environmental engineering. The presented results show the interpretation of e-nose readouts of indoor air sampled in rooms threatened by mold development, compared with clean reference rooms and with synthetic air. The multivariate data obtained were processed, visualized and classified using PCA (Principal Component Analysis) and ANN (Artificial Neural Network) methods. The investigation confirmed that an electronic nose (gas sensor array) supported by data processing makes it possible to classify air samples taken from different rooms affected by mold.
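The PCA-plus-classifier pipeline described in this record can be sketched on synthetic sensor data. The array size, the response shift on "mold-sensitive" sensors, and the nearest-centroid rule standing in for the ANN stage are all assumptions for illustration, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors = 8                              # assumed MOS array size
clean = rng.normal(0.0, 0.1, (40, n_sensors))
moldy = rng.normal(0.0, 0.1, (40, n_sensors))
moldy[:, :3] += 1.0                        # assumed shift on mold-sensitive sensors
X = np.vstack([clean, moldy])
y = np.array([0] * 40 + [1] * 40)

# PCA via SVD on mean-centred readouts; keep the first two components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T

# Nearest-centroid rule in PC space (a minimal stand-in for the ANN step).
centroids = np.array([scores[y == c].mean(axis=0) for c in (0, 1)])
pred = ((scores[:, None, :] - centroids) ** 2).sum(-1).argmin(axis=1)
acc = (pred == y).mean()
print(acc)
```

With a separation this clean the two groups fall apart already on the first principal component; real e-nose data would need the paper's full ANN stage.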

  13. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem

Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  14. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria

    DOE PAGES

    Bolduc, Benjamin; Jang, Ho Bin; Doulcier, Guilhem; ...

    2017-05-03

Taxonomic classification of archaeal and bacterial viruses is challenging, yet also fundamental for developing a predictive understanding of microbial ecosystems. Recent identification of hundreds of thousands of new viral genomes and genome fragments, whose hosts remain unknown, requires a paradigm shift away from traditional classification approaches and towards the use of genomes for taxonomy. Here we revisited the use of genomes and their protein content as a means for developing a viral taxonomy for bacterial and archaeal viruses. A network-based analytic was evaluated and benchmarked against authority-accepted taxonomic assignments and found to be largely concordant. Exceptions were manually examined and found to represent areas of viral genome ‘sequence space’ that are under-sampled or prone to excessive genetic exchange. While both cases are poorly resolved by genome-based taxonomic approaches, the former will improve as viral sequence space is better sampled and the latter are uncommon. Finally, given the largely robust taxonomic capabilities of this approach, we sought to enable researchers to easily and systematically classify new viruses. Thus, we established a tool, vConTACT, as an app at iVirus, where it operates as a fast, highly scalable, user-friendly app within the free and powerful CyVerse cyberinfrastructure.

  15. Quasi-Supervised Scoring of Human Sleep in Polysomnograms Using Augmented Input Variables

    PubMed Central

    Yaghouby, Farid; Sunderam, Sridhar

    2015-01-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18 to 79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models—specifically Gaussian mixtures and hidden Markov models—are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's K statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. PMID:25679475

  16. Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

    PubMed

    Yaghouby, Farid; Sunderam, Sridhar

    2015-04-01

The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models, specifically Gaussian mixtures and hidden Markov models, are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's kappa statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.
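The augmented-input idea in the two records above can be illustrated with a toy version: unlabeled feature vectors get an extra column that encodes a rater's score where one exists (and a neutral value elsewhere), and a Gaussian mixture fitted to the augmented data tends to align its components with the scored states. The feature dimensions, separation, and 20% scoring rate below are invented, not the study's.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two well-separated synthetic "sleep states" in a 2-D feature space.
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)),
               rng.normal(-2.0, 0.5, (100, 2))])
state = np.array([0] * 100 + [1] * 100)

# Partial scores: a rater labelled 20% of epochs; 0.0 marks "unscored".
score_col = np.zeros(len(X))
scored = rng.choice(len(X), size=40, replace=False)
score_col[scored] = np.where(state[scored] == 0, 1.0, -1.0)
X_aug = np.column_stack([X, score_col])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_aug)
pred = gmm.predict(X_aug)

# Align mixture components with states using only the scored epochs.
label_of = {c: np.bincount(state[scored][pred[scored] == c]).argmax()
            for c in (0, 1)}
acc = np.mean([label_of[c] == s for c, s in zip(pred, state)])
print(acc)
```

The paper's framework also covers hidden Markov models over epoch sequences; this sketch only shows the static mixture case.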

  17. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection.

    PubMed

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj

    2015-08-19

Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has often been impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based respectively on feature selection through prerequisite knowledge of the sample composition and on a genetic algorithm. While the full spectral input results in a classification rate of ca. 92%, selection of only the carbon-to-hydrogen spectral window results in near-identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter-based detector, thereby enabling a cheaper, more compact system more amenable to field operations.
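A minimal genetic-algorithm feature selector in the spirit of this record, with invented "spectra" and a nearest-centroid fitness standing in for the paper's chemometric classifier. The channel count, the informative-line positions, and the subset-size penalty are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_feat = 50                          # stand-in for thousands of LIBS channels
A = rng.normal(0.0, 1.0, (60, n_feat))
B = rng.normal(0.0, 1.0, (60, n_feat))
B[:, [5, 17, 33]] += 2.0             # assumed informative emission lines
X = np.vstack([A, B])
y = np.array([0] * 60 + [1] * 60)

def fitness(mask):
    """Nearest-centroid training accuracy on the selected channels,
    lightly penalised by subset size (the 'less is more' pressure)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    c = np.array([Xs[y == k].mean(axis=0) for k in (0, 1)])
    pred = ((Xs[:, None, :] - c) ** 2).sum(-1).argmin(axis=1)
    return (pred == y).mean() - 0.001 * mask.sum()

pop = rng.random((30, n_feat)) < 0.2        # random initial channel masks
for _ in range(40):                          # a few elitist GA generations
    order = np.argsort([fitness(m) for m in pop])[::-1]
    parents = pop[order[:10]]                # keep the 10 fittest masks
    children = parents[rng.integers(0, 10, 20)].copy()
    children ^= rng.random(children.shape) < 0.02   # bit-flip mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print(round(fitness(best), 3), int(best.sum()))
```

The size penalty is what drives the GA toward the paper's observation that a small, physically meaningful subset of lines can match full-spectrum performance.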

  18. Aesthetic preference recognition of 3D shapes using EEG.

    PubMed

    Chew, Lin Hou; Teo, Jason; Mountstephens, James

    2016-04-01

Recognition and identification of aesthetic preference is indispensable in industrial design. Humans tend to pursue products with aesthetic value and make buying decisions based on their aesthetic preferences. Neuromarketing seeks to understand consumer responses to marketing stimuli using imaging techniques and the recognition of physiological parameters. Numerous studies have been conducted to understand the relationship between humans, art and aesthetics. In this paper, we present a novel preference-based measurement of user aesthetics using electroencephalogram (EEG) signals for virtual 3D shapes with motion. The 3D shapes are designed to appear like bracelets and are generated using the Gielis superformula. EEG signals were collected using a medical-grade device, the B-Alert X10 from Advanced Brain Monitoring, with a sampling frequency of 256 Hz and a resolution of 16 bits. The signals obtained while viewing the 3D bracelet shapes were decomposed into alpha, beta, theta, gamma and delta rhythms using time-frequency analysis, then classified into two classes, like and dislike, using support vector machine and K-nearest neighbors (KNN) classifiers respectively. Classification accuracy of up to 80% was obtained using KNN with the alpha, theta and delta rhythms as the features extracted from the frontal channels Fz, F3 and F4 to classify the two classes, like and dislike.
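Band-power features plus a KNN rule, as in this record, can be sketched on synthetic epochs. The "like corresponds to stronger alpha" assumption and all amplitudes are invented, and the accuracy computed is resubstitution accuracy on the training set, not the study's cross-validated figure.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 256                                   # sampling rate used in the study
t = np.arange(2 * fs) / fs                 # 2-second epochs

def band_power(x, lo, hi):
    """Mean spectral power of x in the [lo, hi] Hz band via the real FFT."""
    f = np.fft.rfftfreq(x.size, 1 / fs)
    p = np.abs(np.fft.rfft(x)) ** 2
    return p[(f >= lo) & (f <= hi)].mean()

# 'Like' epochs are assumed (for illustration) to carry stronger 10 Hz alpha.
feats = []
for amp in [2.0] * 30 + [0.5] * 30:
    e = amp * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size)
    feats.append([band_power(e, 8, 13),     # alpha
                  band_power(e, 4, 8),      # theta
                  band_power(e, 0.5, 4)])   # delta
X = np.array(feats)
y = np.array([1] * 30 + [0] * 30)

def knn_predict(q, k=5):
    d = ((X - q) ** 2).sum(axis=1)
    return np.bincount(y[np.argsort(d)[:k]]).argmax()

acc = np.mean([knn_predict(X[i]) == y[i] for i in range(len(X))])
print(acc)
```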

  19. A three-parameter model for classifying anurans into four genera based on advertisement calls.

    PubMed

    Gingras, Bruno; Fitch, William Tecumseh

    2013-01-01

    The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species which are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean square energy, and spectral flux, correctly classified advertisement calls with regard to genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a K-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
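A hedged sketch of the three-parameter idea above: multinomial logistic regression on three invented acoustic features for four synthetic "genera". The per-genus means and spreads are made up, not measured values from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# Invented per-call features: dominant frequency (kHz), CV of RMS energy,
# spectral flux. One row of assumed means per genus.
means = np.array([[1.0, 0.2, 0.1],
                  [2.5, 0.5, 0.3],
                  [4.0, 0.3, 0.6],
                  [0.8, 0.7, 0.4]])
X = np.vstack([m + rng.normal(0, 0.12, (50, 3)) for m in means])
y = np.repeat(np.arange(4), 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)
acc = clf.score(X, y)
print(round(acc, 2))
```

With real calls the classes overlap far more, which is why the study reports accuracy above 70% rather than near-perfect separation.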

  20. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    PubMed Central

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-01-01

    Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers –based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations. PMID:26286630

  1. The classification of almonds (Prunus dulcis) by country and variety using UHPLC-HRMS-based untargeted metabolomics.

    PubMed

    Gil Solsona, R; Boix, C; Ibáñez, M; Sancho, J V

    2018-03-01

The aim of this study was to use an untargeted UHPLC-HRMS-based metabolomics approach allowing discrimination between almonds based on their origin and variety. Samples were homogenised, extracted with ACN:H2O (80:20) containing 0.1% HCOOH and injected in a UHPLC-QTOF instrument in both positive and negative ionisation modes. Principal component analysis (PCA) was performed to ensure the absence of outliers. Partial least squares discriminant analysis (PLS-DA) was employed to create and validate the models for country (with five different compounds) and variety (with 20 features), showing more than 95% accuracy. Additional samples were injected and the model was evaluated with blind samples, with more than 95% of samples being correctly classified using both models. MS/MS experiments were carried out to tentatively elucidate the highlighted marker compounds (pyranosides, peptides or amino acids, among others). This study has shown the potential of high-resolution mass spectrometry to perform and validate classification models, also providing information concerning the identification of the unexpected biomarkers which showed the highest discriminant power.

  2. Visual terrain mapping for traversable path planning of mobile robots

    NASA Astrophysics Data System (ADS)

    Shirkhodaie, Amir; Amrani, Rachida; Tunstel, Edward W.

    2004-10-01

    In this paper, we have primarily discussed technical challenges and navigational skill requirements of mobile robots for traversability path planning in natural terrain environments similar to Mars surface terrains. We have described different methods for detection of salient terrain features based on imaging texture analysis techniques. We have also presented three competing techniques for terrain traversability assessment of mobile robots navigating in unstructured natural terrain environments. These three techniques include: a rule-based terrain classifier, a neural network-based terrain classifier, and a fuzzy-logic terrain classifier. Each proposed terrain classifier divides a region of natural terrain into finite sub-terrain regions and classifies terrain condition exclusively within each sub-terrain region based on terrain visual clues. The Kalman Filtering technique is applied for aggregative fusion of sub-terrain assessment results. The last two terrain classifiers are shown to have remarkable capability for terrain traversability assessment of natural terrains. We have conducted a comparative performance evaluation of all three terrain classifiers and presented the results in this paper.

  3. Classifying Imbalanced Data Streams via Dynamic Feature Group Weighting with Importance Sampling.

    PubMed

    Wu, Ke; Edwards, Andrea; Fan, Wei; Gao, Jing; Zhang, Kun

    2014-04-01

Data stream classification and imbalanced data learning are two important areas of data mining research. Each has been well studied to date with many interesting algorithms developed. However, only a few approaches reported in the literature address the intersection of these two fields due to their complex interplay. In this work, we proposed an importance sampling driven, dynamic feature group weighting framework (DFGW-IS) for classifying data streams of imbalanced distribution. Two components are tightly incorporated into the proposed approach to address the intrinsic characteristics of concept-drifting, imbalanced streaming data. Specifically, the ever-evolving concepts are tackled by a weighted ensemble trained on a set of feature groups, with each sub-classifier (i.e. a single classifier or an ensemble) weighted by its discriminative power and stability. The uneven class distribution, on the other hand, is addressed by the sub-classifiers built on specific feature groups, with the underlying distribution rebalanced by the importance sampling technique. We derived the theoretical upper bound for the generalization error of the proposed algorithm. We also studied the empirical performance of our method on a set of benchmark synthetic and real world data, and significant improvement has been achieved over the competing algorithms in terms of standard evaluation metrics and parallel running time. Algorithm implementations and datasets are available upon request.
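One way to see why rebalancing an uneven class distribution matters, loosely in the spirit of the importance-sampling component above: a prior-sensitive Gaussian classifier's minority recall before and after flattening the class prior (weighting each class equally has the same effect as importance-resampling to balance). The data and classifier are toys, not the paper's DFGW-IS framework.

```python
import numpy as np

rng = np.random.default_rng(6)
# One imbalanced chunk of a toy stream: 200 majority vs 12 minority samples.
maj = rng.normal(0.0, 1.0, (200, 2))
mino = rng.normal(1.2, 1.0, (12, 2))
X = np.vstack([maj, mino])
y = np.array([0] * 200 + [1] * 12)
mu = np.array([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(Z, priors):
    """Unit-covariance Gaussian classifier with explicit class priors."""
    scores = [np.log(p) - 0.5 * ((Z - m) ** 2).sum(axis=1)
              for p, m in zip(priors, mu)]
    return np.argmax(np.column_stack(scores), axis=1)

empirical = [(y == k).mean() for k in (0, 1)]          # imbalanced priors
recall_raw = (predict(mino, empirical) == 1).mean()
# Importance weighting in effect flattens the prior toward uniform.
recall_bal = (predict(mino, [0.5, 0.5]) == 1).mean()
print(round(recall_raw, 2), round(recall_bal, 2))
```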

  4. Classifying Microorganisms.

    ERIC Educational Resources Information Center

    Baker, William P.; Leyva, Kathryn J.; Lang, Michael; Goodmanis, Ben

    2002-01-01

    Focuses on an activity in which students sample air at school and generate ideas about how to classify the microorganisms they observe. The results are used to compare air quality among schools via the Internet. Supports the development of scientific inquiry and technology skills. (DDR)

  5. Classification and Verification of Handwritten Signatures with Time Causal Information Theory Quantifiers

    PubMed Central

    Ospina, Raydonal; Frery, Alejandro C.

    2016-01-01

    We present a new approach for handwritten signature classification and verification based on descriptors stemming from time causal information theory. The proposal uses the Shannon entropy, the statistical complexity, and the Fisher information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to an One-Class Support Vector Machine classifier. The results are better than state-of-the-art online techniques that employ higher-dimensional feature spaces which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups. PMID:27907014
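The Bandt and Pompe symbolization underlying the descriptors above can be made concrete: each length-m window of a signal is mapped to the ordinal pattern of its values, and the normalised Shannon entropy of the pattern distribution measures how irregular the trace is. The signals below are synthetic stand-ins, not pen-coordinate data.

```python
import numpy as np
from itertools import permutations
from math import log

def permutation_entropy(x, m=3):
    """Normalised Shannon entropy of the Bandt-Pompe ordinal patterns of x."""
    counts = dict.fromkeys(permutations(range(m)), 0)
    for i in range(len(x) - m + 1):
        counts[tuple(np.argsort(x[i:i + m]))] += 1   # ordinal pattern
    p = np.array([c for c in counts.values() if c > 0], dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum() / log(len(counts)))

rng = np.random.default_rng(9)
pe_noisy = permutation_entropy(rng.normal(0, 1, 2000))      # irregular trace
pe_smooth = permutation_entropy(np.sin(np.linspace(0, 20, 2000)))  # regular
print(round(pe_noisy, 2), round(pe_smooth, 2))
```

An irregular trace uses all ordinal patterns nearly equally (entropy near 1), while a smooth one is dominated by monotone patterns, which is the kind of contrast the paper's six descriptors feed to the one-class SVM.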

6. Self-Regulation Mediates the Relationship between Learner Typology and Achievement in At-Risk Children

    PubMed Central

    Weed, Keri; Keogh, Deborah; Borkowski, John G.; Whitman, Thomas; Noria, Christine W.

    2010-01-01

A person-centered approach was used to explore the mediating role of self-regulation between learner typology at age 8 and academic achievement at age 14 while controlling for domain-specific achievement in a longitudinal sample of 113 children born to adolescent mothers. Children were classified into one of 5 learner typologies at age 8 based on interactive patterns of intellectual, achievement, and adaptive abilities. Typology classification explained significant variance in both reading and mathematics achievement at age 14. A bootstrapping approach confirmed that self-regulation mediated the relationship between typology and reading and mathematical achievement for children from all typologies except those classified as Cognitively and Adaptively Challenged. Implications of person-centered approaches for understanding processes involved with achievement are discussed. PMID:21278904

  7. Sample Selection for Training Cascade Detectors.

    PubMed

    Vállez, Noelia; Deniz, Oscar; Bueno, Gloria

    2015-01-01

    Automatic detection systems usually require large and representative training datasets in order to obtain good detection and false positive rates. Training datasets are such that the positive set has few samples and/or the negative set should represent anything except the object of interest. In this respect, the negative set typically contains orders of magnitude more images than the positive set. However, imbalanced training databases lead to biased classifiers. In this paper, we focus our attention on a negative sample selection method to properly balance the training data for cascade detectors. The method is based on the selection of the most informative false positive samples generated in one stage to feed the next stage. The results show that the proposed cascade detector with sample selection obtains on average better partial AUC and smaller standard deviation than the other compared cascade detectors.
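The stage-wise negative selection described above (the false positives of one stage are the only negatives that feed the next stage) can be sketched with 1-D detector scores. The score distributions and the reject-half-per-stage rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
# Toy 1-D detector scores: positives score high, negatives low, with overlap.
pos = rng.normal(2.0, 1.0, 200)
negatives = rng.normal(0.0, 1.0, 100_000)

thresholds, pool = [], negatives
for stage in range(3):
    thr = np.quantile(pool, 0.5)     # toy stage: reject half the negatives
    thresholds.append(thr)
    pool = pool[pool > thr]          # only false positives feed the next stage

survive = pos
for thr in thresholds:
    survive = survive[survive > thr]
det_rate = len(survive) / len(pos)
print(len(pool), round(det_rate, 2))
```

Each stage sees progressively harder negatives while the vast easy majority is discarded early, which is what keeps cascade training both balanced and cheap.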

  8. Infrared dim-small target tracking via singular value decomposition and improved Kernelized correlation filter

    NASA Astrophysics Data System (ADS)

    Qian, Kun; Zhou, Huixin; Rong, Shenghui; Wang, Bingjian; Cheng, Kuanhong

    2017-05-01

    Infrared small target tracking plays an important role in applications including military reconnaissance, early warning and terminal guidance. In this paper, an effective algorithm based on the Singular Value Decomposition (SVD) and the improved Kernelized Correlation Filter (KCF) is presented for infrared small target tracking. Firstly, the super performance of the SVD-based algorithm is that it takes advantage of the target's global information and obtains a background estimation of an infrared image. A dim target is enhanced by subtracting the corresponding estimated background with update from the original image. Secondly, the KCF algorithm is combined with Gaussian Curvature Filter (GCF) to eliminate the excursion problem. The GCF technology is adopted to preserve the edge and eliminate the noise of the base sample in the KCF algorithm, helping to calculate the classifier parameter for a small target. At last, the target position is estimated with a response map, which is obtained via the kernelized classifier. Experimental results demonstrate that the presented algorithm performs favorably in terms of efficiency and accuracy, compared with several state-of-the-art algorithms.

  9. Diagnostic Accuracy and Cost-Effectiveness of Alternative Methods for Detection of Soil-Transmitted Helminths in a Post-Treatment Setting in Western Kenya

    PubMed Central

    Kepha, Stella; Kihara, Jimmy H.; Njenga, Sammy M.; Pullan, Rachel L.; Brooker, Simon J.

    2014-01-01

    Objectives This study evaluates the diagnostic accuracy and cost-effectiveness of the Kato-Katz and Mini-FLOTAC methods for detection of soil-transmitted helminths (STH) in a post-treatment setting in western Kenya. A cost analysis also explores the cost implications of collecting samples during school surveys when compared to household surveys. Methods Stool samples were collected from children (n = 652) attending 18 schools in Bungoma County and diagnosed by the Kato-Katz and Mini-FLOTAC coprological methods. Sensitivity and additional diagnostic performance measures were analyzed using Bayesian latent class modeling. Financial and economic costs were calculated for all survey and diagnostic activities, and cost per child tested, cost per case detected and cost per STH infection correctly classified were estimated. A sensitivity analysis was conducted to assess the impact of various survey parameters on cost estimates. Results Both diagnostic methods exhibited comparable sensitivity for detection of any STH species over single and consecutive day sampling: 52.0% for single day Kato-Katz; 49.1% for single-day Mini-FLOTAC; 76.9% for consecutive day Kato-Katz; and 74.1% for consecutive day Mini-FLOTAC. Diagnostic performance did not differ significantly between methods for the different STH species. Use of Kato-Katz with school-based sampling was the lowest cost scenario for cost per child tested ($10.14) and cost per case correctly classified ($12.84). Cost per case detected was lowest for Kato-Katz used in community-based sampling ($128.24). Sensitivity analysis revealed the cost of case detection for any STH decreased non-linearly as prevalence rates increased and was influenced by the number of samples collected. Conclusions The Kato-Katz method was comparable in diagnostic sensitivity to the Mini-FLOTAC method, but afforded greater cost-effectiveness. Future work is required to evaluate the cost-effectiveness of STH surveillance in different settings. PMID:24810593

  10. Diagnostic accuracy and cost-effectiveness of alternative methods for detection of soil-transmitted helminths in a post-treatment setting in western Kenya.

    PubMed

    Assefa, Liya M; Crellen, Thomas; Kepha, Stella; Kihara, Jimmy H; Njenga, Sammy M; Pullan, Rachel L; Brooker, Simon J

    2014-05-01

    This study evaluates the diagnostic accuracy and cost-effectiveness of the Kato-Katz and Mini-FLOTAC methods for detection of soil-transmitted helminths (STH) in a post-treatment setting in western Kenya. A cost analysis also explores the cost implications of collecting samples during school surveys when compared to household surveys. Stool samples were collected from children (n = 652) attending 18 schools in Bungoma County and diagnosed by the Kato-Katz and Mini-FLOTAC coprological methods. Sensitivity and additional diagnostic performance measures were analyzed using Bayesian latent class modeling. Financial and economic costs were calculated for all survey and diagnostic activities, and cost per child tested, cost per case detected and cost per STH infection correctly classified were estimated. A sensitivity analysis was conducted to assess the impact of various survey parameters on cost estimates. Both diagnostic methods exhibited comparable sensitivity for detection of any STH species over single and consecutive day sampling: 52.0% for single day Kato-Katz; 49.1% for single-day Mini-FLOTAC; 76.9% for consecutive day Kato-Katz; and 74.1% for consecutive day Mini-FLOTAC. Diagnostic performance did not differ significantly between methods for the different STH species. Use of Kato-Katz with school-based sampling was the lowest cost scenario for cost per child tested ($10.14) and cost per case correctly classified ($12.84). Cost per case detected was lowest for Kato-Katz used in community-based sampling ($128.24). Sensitivity analysis revealed the cost of case detection for any STH decreased non-linearly as prevalence rates increased and was influenced by the number of samples collected. The Kato-Katz method was comparable in diagnostic sensitivity to the Mini-FLOTAC method, but afforded greater cost-effectiveness. Future work is required to evaluate the cost-effectiveness of STH surveillance in different settings.
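The per-child and per-case figures in the two records above are simple ratios of survey cost to counts. This sketch recomputes such ratios for invented inputs; the study's raw cost data are not given in the abstract, so none of these numbers are its actual inputs.

```python
def survey_costs(total_cost, n_tested, n_true_cases, sensitivity):
    """Cost per child tested and cost per case detected as simple ratios.
    All arguments here are hypothetical illustration values."""
    detected = n_true_cases * sensitivity
    return total_cost / n_tested, total_cost / detected

per_child, per_case = survey_costs(total_cost=5000.0, n_tested=500,
                                   n_true_cases=80, sensitivity=0.5)
print(per_child, per_case)
```

This also shows why cost per case detected falls non-linearly with prevalence: the denominator scales with the number of true cases while survey cost stays roughly fixed.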

  11. Hexabromocyclododecane determination in seafood samples collected from Japanese coastal areas.

    PubMed

    Nakagawa, R; Murata, S; Ashizuka, Y; Shintani, Y; Hori, T; Tsutsumi, T

    2010-09-01

The levels of three hexabromocyclododecane (HBCD) isomers and ΣHBCDs in 54 wild and 11 farmed seafood samples collected from four regions of Japan were determined by LC/MS/MS. For the fish classified as Anguilliformes, Perciformes, Clupeiformes and farmed Salmoniformes, the medians (ranges) of ΣHBCDs were 2.09 (0.05-36.9), 0.75 (ND-26.2), 0.12 (0.09-77.3) and 1.29 (1.09-1.34) ng g(-1) ww, respectively. However, HBCDs were not detected in samples classified as Crustacea, Mollusca, Pleuronectiformes and Scorpaeniformes, or, if detected, the levels were very low. No rank correlation between ΣHBCDs (or α-HBCD) and fat content was found, except for the Japanese sea bass of the Tohoku region. In the HBCD isomer profiles of fish samples above 20 ng g(-1) ww, γ-HBCD was predominant, which suggests the influence of discharge from a nearby industrial plant. In the other wild fish and the farmed fish samples, on the other hand, α-HBCD was mostly predominant, which suggests biomagnification via the food chain. Additionally, to assess the risk to human health, based on the determined HBCD median concentrations for Anguilliformes, farmed Salmoniformes and Perciformes, the daily intake of HBCDs from fish by an average Japanese adult was tentatively calculated to be 3.7, 2.3 and 1.3 ng (kg body weight)(-1) d(-1), respectively. Copyright © 2010 Elsevier Ltd. All rights reserved.
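The tentative daily-intake figures above follow from concentration times daily fish consumption divided by body weight. The consumption and body-weight values below are assumed round numbers for illustration, not the study's actual inputs, so the result only approximates the quoted figure.

```python
def daily_intake(conc_ng_per_g, fish_g_per_day, body_weight_kg):
    """HBCD intake in ng per kg body weight per day (simple ratio)."""
    return conc_ng_per_g * fish_g_per_day / body_weight_kg

# Median Anguilliformes concentration from the abstract (2.09 ng/g ww);
# 90 g/d consumption and 50 kg body weight are assumed placeholders.
print(round(daily_intake(2.09, 90.0, 50.0), 1))
```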

  12. Thin-layer chromatographic identification of Chinese propolis using chemometric fingerprinting.

    PubMed

    Tang, Tie-xin; Guo, Wei-yan; Xu, Ye; Zhang, Si-ming; Xu, Xin-jun; Wang, Dong-mei; Zhao, Zhi-min; Zhu, Long-ping; Yang, De-po

    2014-01-01

    Poplar tree gum has a similar chemical composition and appearance to Chinese propolis (bee glue) and has been widely used as a counterfeit propolis because Chinese propolis is typically the poplar-type propolis, the chemical composition of which is determined mainly by the resin of poplar trees. The discrimination of Chinese propolis from poplar tree gum is a challenging task. To develop a rapid thin-layer chromatographic (TLC) identification method using chemometric fingerprinting to discriminate Chinese propolis from poplar tree gum. A new TLC method using a combination of ammonia and hydrogen peroxide vapours as the visualisation reagent was developed to characterise the chemical profile of Chinese propolis. Three separate people performed TLC on eight Chinese propolis samples and three poplar tree gum samples of varying origins. Five chemometric methods, including similarity analysis, hierarchical clustering, k-means clustering, neural network and support vector machine, were compared for use in classifying the samples based on their densitograms obtained from the TLC chromatograms via image analysis. Hierarchical clustering, neural network and support vector machine analyses achieved a correct classification rate of 100% in classifying the samples. A strategy for TLC identification of Chinese propolis using chemometric fingerprinting was proposed and it provided accurate sample classification. The study has shown that the TLC identification method using chemometric fingerprinting is a rapid, low-cost method for the discrimination of Chinese propolis from poplar tree gum and may be used for the quality control of Chinese propolis. Copyright © 2014 John Wiley & Sons, Ltd.

  13. Self-supervised online metric learning with low rank constraint for scene categorization.

    PubMed

    Cong, Yang; Liu, Ji; Yuan, Junsong; Luo, Jiebo

    2013-08-01

    Conventional visual recognition systems usually train an image classifier in batch mode, with all training data provided in advance. However, in many practical applications only a small number of training samples is available at the beginning, and many more arrive sequentially during online recognition. Because the characteristics of the image data can change over time, it is important for the classifier to adapt to new data incrementally. In this paper, we present an online metric learning method that addresses the online scene recognition problem via adaptive similarity measurement. Given a number of labeled samples followed by a sequential input of unseen testing samples, the similarity metric is learned to maximize the margin of the distance between different classes of samples. By incorporating a low-rank constraint, our online metric learning model not only provides competitive performance compared with state-of-the-art methods but also guarantees convergence. A bilinear graph is also defined to model pairwise similarity; an unseen sample is labeled via graph-based label propagation, and the model self-updates using the more confident new samples. With this capacity for online learning, our methodology can handle large-scale streaming video data with incremental self-updating. We apply our model to online scene categorization; experiments on various benchmark datasets and comparisons with state-of-the-art methods demonstrate the effectiveness and efficiency of our algorithm.
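    The core idea of learning a similarity metric online can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the low-rank constraint and the bilinear-graph label propagation are omitted, and the hinge-style update rule and margin are assumptions made for the example.

```python
# Minimal sketch (NOT the paper's method): an online Mahalanobis metric
# learner that pulls same-class pairs inside a margin and pushes
# different-class pairs outside it, projecting onto the PSD cone after
# each update so the learned matrix remains a valid metric.
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T

def online_metric(pairs, dim, lr=0.1, margin=1.0):
    """pairs: iterable of (x1, x2, same_class). Returns the learned metric M."""
    M = np.eye(dim)
    for x1, x2, same in pairs:
        d = x1 - x2
        dist2 = d @ M @ d
        if same and dist2 > margin:          # similar pair too far: pull
            M -= lr * np.outer(d, d)
        elif not same and dist2 < margin:    # dissimilar pair too close: push
            M += lr * np.outer(d, d)
        M = project_psd(M)
    return M

a = np.zeros(5)
b = np.full(5, 0.3)  # a "different-class" neighbour starting inside the margin
M = online_metric([(a, a + 0.1, True), (a, b, False)] * 20, dim=5)
print("learned distance a-b:", (a - b) @ M @ (a - b))
```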

  14. Drought Management Activities of the National Drought Mitigation Center (NDMC): Contributions Toward a Global Drought Early Warning System (GDEWS)

    NASA Astrophysics Data System (ADS)

    Stumpf, A.; Lachiche, N.; Malet, J.; Kerle, N.; Puissant, A.

    2011-12-01

    VHR satellite images have become a primary source for landslide inventory mapping after major triggering events such as earthquakes and heavy rainfall. Visual image interpretation is still the prevailing standard method for operational purposes, but it is time-consuming and not well suited to fully exploiting the increasingly rich supply of remote sensing data. Recent studies have addressed the development of more automated image analysis workflows for landslide inventory mapping. In particular, object-oriented approaches that account for spatial and textural image information have been demonstrated to be more adequate than pixel-based classification, but manually elaborated rule-based classifiers are difficult to adapt to changing scene characteristics. Machine learning algorithms allow classification rules for complex image patterns to be learned from labelled examples and can be adapted straightforwardly when training data are available. To reduce the amount of costly training data, active learning (AL) has evolved as a key concept to guide the sampling for many applications. The underlying idea of AL is to initialize a machine learning model with a small training set, and to subsequently exploit the model state and the data structure to iteratively select the most valuable samples to be labelled by the user. With relatively few queries and labelled samples, an AL strategy yields higher accuracies than an equivalent classifier trained on many randomly selected samples. This study addressed the development of an AL method for landslide mapping from VHR remote sensing images, with special consideration of the spatial distribution of the samples. Our approach [1] is based on the Random Forest algorithm and considers the classifier uncertainty as well as the variance of potential sampling regions to guide the user towards the most valuable sampling areas. The algorithm explicitly searches for compact regions, thereby avoiding the spatially disperse sampling pattern inherent to most other AL methods. The accuracy, the sampling time and the computational runtime of the algorithm were evaluated on multiple satellite images capturing recent large-scale landslide events. Sampling between 1% and 4% of the study areas, accuracies between 74% and 80% were achieved, whereas standard sampling schemes yielded accuracies of only 28% to 50% at equal sampling cost. Compared with commonly used point-wise AL algorithms, the proposed approach significantly reduces the number of iterations and hence the computational runtime. Since the user can focus on relatively few compact areas (rather than on hundreds of distributed points), the overall labelling time is reduced by more than 50% compared with point-wise queries. An experimental evaluation of multiple expert mappings demonstrated strong relationships between the uncertainties of the experts and those of the machine learning model. It revealed that the achieved accuracies are within the range of the inter-expert disagreement, and that considering ground-truth uncertainties will be indispensable for achieving further improvements in the future. The proposed method is generally applicable to a wide range of optical satellite images and landslide types. [1] A. Stumpf, N. Lachiche, J.-P. Malet, N. Kerle, and A. Puissant, "Active learning in the spatial domain for remote sensing image classification," IEEE Transactions on Geoscience and Remote Sensing, 2013, DOI 10.1109/TGRS.2013.2262052.
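    The uncertainty-sampling idea underlying such an approach can be sketched with a Random Forest: train on a small labelled seed set, then query the unlabelled samples whose class probabilities are most uncertain. The compact spatial-region selection of the actual method is not reproduced here, and the dataset, pool sizes and query budget are invented for illustration.

```python
# Sketch of uncertainty sampling with a Random Forest (one AL iteration).
# The synthetic dataset and the seed/pool split are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labelled = np.arange(20)       # small initial training set
pool = np.arange(20, 500)      # unlabelled candidate pool

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[labelled], y[labelled])

# Uncertainty = 1 - max class probability; query the 10 most uncertain samples.
proba = clf.predict_proba(X[pool])
uncertainty = 1.0 - proba.max(axis=1)
queries = pool[np.argsort(uncertainty)[::-1][:10]]
print("next samples to label:", queries)
```

In the paper's spatial variant, the query step would select compact regions rather than these scattered individual points.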

  15. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of the other commonly used marker-gene classification methods evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
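    The general idea behind the scikit-learn naive Bayes classifier (decomposing sequences into overlapping k-mers and fitting a multinomial model over k-mer counts) can be sketched as follows. This illustrates the approach, not the plugin's API; the toy reference sequences and genus labels are invented.

```python
# Illustrative k-mer naive Bayes taxonomy classifier. The reference
# sequences and "g__Foo"/"g__Bar"/"g__Baz" genus labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def kmers(seq, k=4):
    """Decompose a sequence into space-separated overlapping k-mers."""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

refs = ["ACGTACGTAGCTAGCTAGGA", "TTGACCGGTTAACCGGTTAA", "GGGCCCAAATTTGGGCCCAA"]
taxa = ["g__Foo", "g__Bar", "g__Baz"]

# Count k-mers per reference, then fit a multinomial naive Bayes model.
pipe = make_pipeline(CountVectorizer(analyzer="word"), MultinomialNB(alpha=0.1))
pipe.fit([kmers(s) for s in refs], taxa)
print(pipe.predict([kmers("ACGTACGTAGCTAGCTAGGA")]))
```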

  16. Performance of the BioPlex 2200 HIV Ag-Ab assay for identifying acute HIV infection.

    PubMed

    Eshleman, Susan H; Piwowar-Manning, Estelle; Sivay, Mariya V; Debevec, Barbara; Veater, Stephanie; McKinstry, Laura; Bekker, Linda-Gail; Mannheimer, Sharon; Grant, Robert M; Chesney, Margaret A; Coates, Thomas J; Koblin, Beryl A; Fogel, Jessica M

    Assays that detect HIV antigen (Ag) and antibody (Ab) can be used to screen for HIV infection. The aim was to compare the performance of the BioPlex 2200 HIV Ag-Ab assay with that of two other Ag/Ab combination assays for detection of acute HIV infection. Samples were obtained from 24 individuals (18 from the US, 6 from South Africa) classified as having acute infection based on the following criteria: a positive qualitative RNA assay, two negative rapid tests, and a negative discriminatory test. The samples were tested with the BioPlex assay, the ARCHITECT HIV Ag/Ab Combo test, the Bio-Rad GS HIV Combo Ag-Ab EIA test, and a viral load assay. Twelve (50.0%) of the 24 samples had HIV RNA detected only (>40 to 13,476 copies/mL). Ten (43.5%) samples were reactive with all three Ag/Ab assays, one sample was reactive with the ARCHITECT and Bio-Rad assays, and one sample was reactive with the Bio-Rad and BioPlex assays. The 11 samples that were reactive with the BioPlex assay had viral loads from 83,010 to >750,000 copies/mL; 9/11 of these samples were classified as Ag positive/Ab negative by the BioPlex assay. Detection of acute HIV infection was similar for the BioPlex assay and the two other Ag/Ab assays. All three tests were less sensitive than a qualitative RNA assay and detected HIV Ag only when the viral load was high. The BioPlex assay detected acute infection in about half of the cases and identified most of those infections as Ag positive/Ab negative. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Seroprevalence and molecular epidemiology of EAST1 gene-carrying Escherichia coli from diarrheal patients and raw meats.

    PubMed

    Sukkua, Kannika; Manothong, Somruthai; Sukhumungoon, Pharanai

    2017-03-31

    Several Escherichia coli pathotypes have been reported in Thailand; however, information on enteroaggregative heat-stable enterotoxin 1 (EAST1)-carrying E. coli (EAST1-EC) is insufficient. Previous reports show that consumption of raw meats can cause diarrheagenic E. coli infections. In this study, we investigated the seroprevalence and genetic relationship of EAST1-EC from clinical and raw meat samples. Diarrheal patients and raw meat samples were investigated for the presence of EAST1-EC by polymerase chain reaction (PCR) detection of astA. Serotyping, antimicrobial susceptibility tests, and a PCR-based phylogenetic group assay were performed. The molecular epidemiology of E. coli strains from clinical and raw meat samples was determined using repetitive element-PCR typing, BOX-PCR, and ERIC2-PCR. Results showed that 11.2% (17/152) of clinical samples and 53.3% (16/30) of raw meat samples harbored EAST1-EC. In all, 24 and 36 EAST1-EC strains were successfully isolated from the 17 clinical and 16 raw meat samples, respectively. These strains carried astA but did not possess the indicative genes of other E. coli pathotypes and were therefore classified as EAST1-EC. Most of these strains were multidrug resistant and were classified into nine serogroups. Molecular genotyping showed identical DNA fingerprints among EAST1-EC serotype O15 strains from clinical and raw chicken samples, suggesting that they were derived from the same bacterial clone. Our results indicate a high prevalence of multidrug-resistant EAST1-EC strains, belonging to nine serogroups, in clinical and environmental samples in Thailand. Moreover, the study highlights the close association between infections caused by EAST1-EC serotype O15 and raw meat consumption.

  18. Challenging the Cancer Molecular Stratification Dogma: Intratumoral Heterogeneity Undermines Consensus Molecular Subtypes and Potential Diagnostic Value in Colorectal Cancer.

    PubMed

    Dunne, Philip D; McArt, Darragh G; Bradley, Conor A; O'Reilly, Paul G; Barrett, Helen L; Cummins, Robert; O'Grady, Tony; Arthur, Ken; Loughrey, Maurice B; Allen, Wendy L; McDade, Simon S; Waugh, David J; Hamilton, Peter W; Longley, Daniel B; Kay, Elaine W; Johnston, Patrick G; Lawler, Mark; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra

    2016-08-15

    A number of independent gene expression profiling studies have identified transcriptional subtypes in colorectal cancer with potential diagnostic utility, culminating in publication of a colorectal cancer Consensus Molecular Subtype classification. The worst prognostic subtype has been defined by genes associated with stem-like biology. Recently, it has been shown that the majority of genes associated with this poor prognostic group are stromal derived. We investigated the potential for tumor misclassification into multiple diagnostic subgroups based on tumoral region sampled. We performed multiregion tissue RNA extraction/transcriptomic analysis using colorectal-specific arrays on invasive front, central tumor, and lymph node regions selected from tissue samples from 25 colorectal cancer patients. We identified a consensus 30-gene list, which represents the intratumoral heterogeneity within a cohort of primary colorectal cancer tumors. Using a series of online datasets, we showed that this gene list displays prognostic potential (HR = 2.914; confidence interval, 0.9286-9.162) in stage II/III colorectal cancer patients, but in addition, we demonstrated that these genes are stromal derived, challenging the assumption that poor prognosis tumors with stem-like biology have undergone a widespread epithelial-mesenchymal transition. Most importantly, we showed that patients can be simultaneously classified into multiple diagnostically relevant subgroups based purely on the tumoral region analyzed. Gene expression profiles derived from the nonmalignant stromal region can influence assignment of colorectal cancer transcriptional subtypes, questioning the current molecular classification dogma and highlighting the need to consider pathology sampling region and degree of stromal infiltration when employing transcription-based classifiers to underpin clinical decision making in colorectal cancer. Clin Cancer Res; 22(16); 4095-104. 
©2016 American Association for Cancer Research. See related commentary by Morris and Kopetz, p. 3989.

  19. Authentication of beef versus horse meat using 60 MHz 1H NMR spectroscopy

    PubMed Central

    Jakes, W.; Gerdova, A.; Defernez, M.; Watson, A.D.; McCallum, C.; Limer, E.; Colquhoun, I.J.; Williamson, D.C.; Kemsley, E.K.

    2015-01-01

    This work reports a candidate screening protocol to distinguish beef from horse meat based upon comparison of triglyceride signatures obtained by 60 MHz 1H NMR spectroscopy. Using a simple chloroform-based extraction, we obtained classic low-field triglyceride spectra with a typical acquisition time of 10 min. Peak integration was sufficient to differentiate samples of fresh beef (76 extractions) and horse (62 extractions) using Naïve Bayes classification. Principal component analysis gave a two-dimensional “authentic” beef region (p = 0.001) against which further spectra could be compared. This model was challenged using a subset of 23 freeze–thawed training samples. The outcomes indicated that storing samples by freezing does not adversely affect the analysis. Of a further collection of extractions from previously unseen samples, 90/91 beef spectra were classified as authentic, and 16/16 horse spectra as non-authentic. We conclude that 60 MHz 1H NMR represents a feasible high-throughput approach for screening raw meat. PMID:25577043
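    The two-step chemometric approach described (Naive Bayes classification of triglyceride peak integrals, plus a PCA score space in which an "authentic beef" region can be drawn) can be sketched as follows. The synthetic peak-integral vectors, group means and noise level below are placeholders, not real 60 MHz NMR data.

```python
# Sketch of Naive Bayes classification + PCA on peak integrals.
# The "beef" and "horse" integral vectors are SYNTHETIC stand-ins.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
# Toy 3-component peak-integral vectors (e.g. three triglyceride regions),
# with the middle integral separating the two species.
beef = rng.normal([1.0, 0.4, 0.8], 0.05, size=(76, 3))
horse = rng.normal([1.0, 0.9, 0.6], 0.05, size=(62, 3))
X = np.vstack([beef, horse])
y = np.array([0] * 76 + [1] * 62)  # 0 = beef, 1 = horse

nb = GaussianNB().fit(X, y)                    # Naive Bayes on integrals
scores = PCA(n_components=2).fit_transform(X)  # 2-D space for a "beef" region
print("training accuracy:", nb.score(X, y))
```

A confidence ellipse around the beef scores in the 2-D PCA space would play the role of the "authentic" region against which new spectra are compared.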

  20. [Combining speech sample and feature bilateral selection algorithm for classification of Parkinson's disease].

    PubMed

    Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei

    2018-02-01

    Diagnosis of Parkinson's disease (PD) based on speech data has been shown in recent years to be effective. However, current research focuses on feature extraction and classifier design and does not consider instance selection. The authors' previous work showed that instance selection can improve classification accuracy; however, the relationship between speech samples and features has so far received no attention. Therefore, a new PD diagnosis algorithm is proposed in this paper that simultaneously selects speech samples and features, based on a relevant-feature weighting algorithm and a multiple kernel method, so as to exploit their synergy and thereby improve classification accuracy. Experimental results showed that the proposed algorithm achieved an apparent improvement in classification accuracy, obtaining a mean classification accuracy of 82.5%, 30.5% higher than that of the related algorithm. Moreover, the proposed algorithm detected synergy effects between speech samples and features, which is valuable for speech marker extraction.
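    The idea of jointly selecting features and samples can be sketched as follows. This is a hedged stand-in, not the paper's algorithm: relevant-feature weighting and multiple-kernel learning are replaced here by ANOVA F-score ranking and confidence-based instance filtering, assumptions chosen purely for illustration.

```python
# Sketch: (1) select the most discriminative features, (2) discard training
# samples the classifier is least confident about, (3) train on the reduced
# data. Dataset, thresholds and k are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

# 1) Feature selection: keep the 10 highest-scoring features.
selector = SelectKBest(f_classif, k=10).fit(X, y)
X_sel = selector.transform(X)

# 2) Sample selection: drop samples with low cross-validated confidence
#    (a simple stand-in for instance selection).
proba = cross_val_predict(SVC(probability=True, random_state=0), X_sel, y,
                          cv=5, method="predict_proba")
confident = proba.max(axis=1) > 0.6
X_final, y_final = X_sel[confident], y[confident]

# 3) Final classifier trained on the jointly reduced data.
model = SVC().fit(X_final, y_final)
print("kept samples:", int(confident.sum()),
      "training accuracy:", model.score(X_final, y_final))
```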
