Prototype-Incorporated Emotional Neural Network.
Oyedotun, Oyebade K; Khashman, Adnan
2017-08-15
Artificial neural networks (ANNs) aim to simulate the biological neural activities. Interestingly, many ''engineering'' prospects in ANN have relied on motivations from cognition and psychology studies. So far, two important learning theories that have been subject of active research are the prototype and adaptive learning theories. The learning rules employed for ANNs can be related to adaptive learning theory, where several examples of the different classes in a task are supplied to the network for adjusting internal parameters. Conversely, the prototype-learning theory uses prototypes (representative examples); usually, one prototype per class of the different classes contained in the task. These prototypes are supplied for systematic matching with new examples so that class association can be achieved. In this paper, we propose and implement a novel neural network algorithm based on modifying the emotional neural network (EmNN) model to unify the prototype- and adaptive-learning theories. We refer to our new model as ``prototype-incorporated EmNN''. Furthermore, we apply the proposed model to two real-life challenging tasks, namely, static hand-gesture recognition and face recognition, and compare the result to those obtained using the popular back-propagation neural network (BPNN), emotional BPNN (EmNN), deep networks, an exemplar classification model, and k-nearest neighbor.
Fall Detection System for the Elderly Based on the Classification of Shimmer Sensor Prototype Data
Ahmed, Moiz; Mehmood, Nadeem; Mehmood, Amir; Rizwan, Kashif
2017-01-01
Objectives Falling in the elderly is considered a major cause of death. In recent years, ambient and wireless sensor platforms have been extensively used in developed countries for the detection of falls in the elderly. However, we believe extra efforts are required to address this issue in developing countries, such as Pakistan, where most deaths due to falls are not even reported. Considering this, in this paper, we propose a fall detection system prototype that s based on the classification on real time shimmer sensor data. Methods We first developed a data set, ‘SMotion’ of certain postures that could lead to falls in the elderly by using a body area network of Shimmer sensors and categorized the items in this data set into age and weight groups. We developed a feature selection and classification system using three classifiers, namely, support vector machine (SVM), K-nearest neighbor (KNN), and neural network (NN). Finally, a prototype was fabricated to generate alerts to caregivers, health experts, or emergency services in case of fall. Results To evaluate the proposed system, SVM, KNN, and NN were used. The results of this study identified KNN as the most accurate classifier with maximum accuracy of 96% for age groups and 93% for weight groups. Conclusions In this paper, a classification-based fall detection system is proposed. For this purpose, the SMotion data set was developed and categorized into two groups (age and weight groups). The proposed fall detection system for the elderly is implemented through a body area sensor network using third-generation sensors. The evaluation results demonstrate the reasonable performance of the proposed fall detection prototype system in the tested scenarios. PMID:28875049
Automatic classification and detection of clinically relevant images for diabetic retinopathy
NASA Astrophysics Data System (ADS)
Xu, Xinyu; Li, Baoxin
2008-03-01
We proposed a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of the three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically-relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using Expectation- Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new multi-class bag feature space. Finally a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database who bear the same label with the query image, and who are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
K-Nearest Neighbor Algorithm Optimization in Text Categorization
NASA Astrophysics Data System (ADS)
Chen, Shufeng
2018-01-01
K-Nearest Neighbor (KNN) classification algorithm is one of the simplest methods of data mining. It has been widely used in classification, regression and pattern recognition. The traditional KNN method has some shortcomings such as large amount of sample computation and strong dependence on the sample library capacity. In this paper, a method of representative sample optimization based on CURE algorithm is proposed. On the basis of this, presenting a quick algorithm QKNN (Quick k-nearest neighbor) to find the nearest k neighbor samples, which greatly reduces the similarity calculation. The experimental results show that this algorithm can effectively reduce the number of samples and speed up the search for the k nearest neighbor samples to improve the performance of the algorithm.
Generative Models for Similarity-based Classification
2007-01-01
NC), local nearest centroid (local NC), k-nearest neighbors ( kNN ), and condensed nearest neighbors (CNN) are all similarity-based classifiers which...vector machine to the k nearest neighbors of the test sample [80]. The SVM- KNN method was developed to address the robustness and dimensionality...concerns that afflict nearest neighbors and SVMs. Similarly to the nearest-means classifier, the SVM- KNN is a hybrid local and global classifier developed
NASA Astrophysics Data System (ADS)
Fatollahi, Amir H.; Khorrami, Mohammad; Shariati, Ahmad; Aghamohammadi, Amir
2011-04-01
A complete classification is given for one-dimensional chains with nearest-neighbor interactions having two states in each site, for which a matrix product ground state exists. The Hamiltonians and their corresponding matrix product ground states are explicitly obtained.
Fuzzy-Rough Nearest Neighbour Classification
NASA Astrophysics Data System (ADS)
Jensen, Richard; Cornelis, Chris
A new fuzzy-rough nearest neighbour (FRNN) classification algorithm is presented in this paper, as an alternative to Sarkar's fuzzy-rough ownership function (FRNN-O) approach. By contrast to the latter, our method uses the nearest neighbours to construct lower and upper approximations of decision classes, and classifies test instances based on their membership to these approximations. In the experimental analysis, we evaluate our approach with both classical fuzzy-rough approximations (based on an implicator and a t-norm), as well as with the recently introduced vaguely quantified rough sets. Preliminary results are very good, and in general FRNN outperforms FRNN-O, as well as the traditional fuzzy nearest neighbour (FNN) algorithm.
K-Nearest Neighbor Estimation of Forest Attributes: Improving Mapping Efficiency
Andrew O. Finley; Alan R. Ek; Yun Bai; Marvin E. Bauer
2005-01-01
This paper describes our efforts in refining k-nearest neighbor forest attributes classification using U.S. Department of Agriculture Forest Service Forest Inventory and Analysis plot data and Landsat 7 Enhanced Thematic Mapper Plus imagery. The analysis focuses on FIA-defined forest type classification across St. Louis County in northeastern Minnesota. We outline...
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1993-01-01
The classification of multispectral image data obtained from satellites has become an important tool for generating ground cover maps. This study deals with the application of nonparametric pixel-by-pixel classification methods in the classification of pixels, based on their multispectral data. A new neural network, the Binary Diamond, is introduced, and its performance is compared with a nearest neighbor algorithm and a back-propagation network. The Binary Diamond is a multilayer, feed-forward neural network, which learns from examples in unsupervised, 'one-shot' mode. It recruits its neurons according to the actual training set, as it learns. The comparisons of the algorithms were done by using a realistic data base, consisting of approximately 90,000 Landsat 4 Thematic Mapper pixels. The Binary Diamond and the nearest neighbor performances were close, with some advantages to the Binary Diamond. The performance of the back-propagation network lagged behind. An efficient nearest neighbor algorithm, the binned nearest neighbor, is described. Ways for improving the performances, such as merging categories, and analyzing nonboundary pixels, are addressed and evaluated.
Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM
Zhao, Zhizhen; Singer, Amit
2014-01-01
We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. Thus we are able to extend the theory of bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles. In this way, rotational alignment is only performed for images with their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations. PMID:24631969
An Improvement To The k-Nearest Neighbor Classifier For ECG Database
NASA Astrophysics Data System (ADS)
Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul
2018-03-01
The k nearest neighbor (kNN) is a non-parametric classifier and has been widely used for pattern classification. However, in practice, the performance of kNN often tends to fail due to the lack of information on how the samples are distributed among them. Moreover, kNN is no longer optimal when the training samples are limited. Another problem observed in kNN is regarding the weighting issues in assigning the class label before classification. Thus, to solve these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, a Mahalanobis distance is applied to avoid the imbalance of samples distribition. Then, a surrounding rule is employed to obtain the nearest centroid neighbor based on the distributions of training samples and its distance to the query point. Consequently, the fuzzy membership function is employed to assign the query point to the class label which is frequently represented by the nearest centroid neighbor Experimental studies from electrocardiogram (ECG) signal is applied in this study. The classification performances are evaluated in two experimental steps i.e. different values of k and different sizes of feature dimensions. Subsequently, a comparative study of kNN, kNCN, FkNN and MFkCNN classifier is conducted to evaluate the performances of the proposed classifier. The results show that the performance of MFkNCN consistently exceeds the kNN, kNCN and FkNN with the best classification rates of 96.5%.
Frog sound identification using extended k-nearest neighbor classifier
NASA Astrophysics Data System (ADS)
Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati
2017-09-01
Frog sound identification based on the vocalization becomes important for biological research and environmental monitoring. As a result, different types of feature extractions and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents a frog sound identification with Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the nearest neighbors and mutual sharing of neighborhood concepts, with the aims of improving the classification performance. It makes a prediction based on who are the nearest neighbors of the testing sample and who consider the testing sample as their nearest neighbors. In order to evaluate the classification performance in frog sound identification, the EKNN classifier is compared with competing classifier, k -Nearest Neighbor (KNN), Fuzzy k -Nearest Neighbor (FKNN) k - General Nearest Neighbor (KGNN)and Mutual k -Nearest Neighbor (MKNN) on the recorded sounds of 15 frog species obtained in Malaysia forest. The recorded sounds have been segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual and the combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR) while the features are extracted by Mel Frequency Cepstrum Coefficient (MFCC). The experimental results have shown that the EKNCN classifier exhibits the best performance in terms of accuracy compared to the competing classifiers, KNN, FKNN, GKNN and MKNN for all cases.
Large margin nearest neighbor classifiers.
Domeniconi, Carlotta; Gunopulos, Dimitrios; Peng, Jing
2005-07-01
The nearest neighbor technique is a simple and appealing approach to addressing classification problems. It relies on the assumption of locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with a finite number of examples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. The employment of a locally adaptive metric becomes crucial in order to keep class conditional probabilities close to uniform, thereby minimizing the bias of estimates. We propose a technique that computes a locally flexible metric by means of support vector machines (SVMs). The decision function constructed by SVMs is used to determine the most discriminant direction in a neighborhood around the query. Such a direction provides a local feature weighting scheme. We formally show that our method increases the margin in the weighted space where classification takes place. Moreover, our method has the important advantage of online computational efficiency over competing locally adaptive techniques for nearest neighbor classification. We demonstrate the efficacy of our method using both real and simulated data.
Li, Qingbo; Hao, Can; Kang, Xue; Zhang, Jialin; Sun, Xuejun; Wang, Wenbo; Zeng, Haishan
2017-11-27
Combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. In order to avoid limiting high dimensions and small values of the nearest neighbor, the new EWHK method calculates feature weights based on information entropy. The average results of the random classification showed that the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.
Vehicle Classification Using an Imbalanced Dataset Based on a Single Magnetic Sensor.
Xu, Chang; Wang, Yingguan; Bao, Xinghe; Li, Fengrong
2018-05-24
This paper aims to improve the accuracy of automatic vehicle classifiers for imbalanced datasets. Classification is made through utilizing a single anisotropic magnetoresistive sensor, with the models of vehicles involved being classified into hatchbacks, sedans, buses, and multi-purpose vehicles (MPVs). Using time domain and frequency domain features in combination with three common classification algorithms in pattern recognition, we develop a novel feature extraction method for vehicle classification. These three common classification algorithms are the k-nearest neighbor, the support vector machine, and the back-propagation neural network. Nevertheless, a problem remains with the original vehicle magnetic dataset collected being imbalanced, and may lead to inaccurate classification results. With this in mind, we propose an approach called SMOTE, which can further boost the performance of classifiers. Experimental results show that the k-nearest neighbor (KNN) classifier with the SMOTE algorithm can reach a classification accuracy of 95.46%, thus minimizing the effect of the imbalance.
Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma
2012-10-01
The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2 where we replace the K- nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from UCI machine learning repository. The performances of the AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity and specificity values. The highest classification accuracy obtained when applying the AIRS2 and MAIRS2 using 10-fold cross-validation was, respectively 82.69% and 89.10%.
NASA Technical Reports Server (NTRS)
Jayroe, R. R., Jr.
1976-01-01
Geographical correction effects on LANDSAT image data are identified, using the nearest neighbor, bilinear interpolation and bicubic interpolation techniques. Potential impacts of registration on image compression and classification are explored.
An incremental knowledge assimilation system (IKAS) for mine detection
NASA Astrophysics Data System (ADS)
Porway, Jake; Raju, Chaitanya; Varadarajan, Karthik Mahesh; Nguyen, Hieu; Yadegar, Joseph
2010-04-01
In this paper we present an adaptive incremental learning system for underwater mine detection and classification that utilizes statistical models of seabed texture and an adaptive nearest-neighbor classifier to identify varied underwater targets in many different environments. The first stage of processing uses our Background Adaptive ANomaly detector (BAAN), which identifies statistically likely target regions using Gabor filter responses over the image. Using this information, BAAN classifies the background type and updates its detection using background-specific parameters. To perform classification, a Fully Adaptive Nearest Neighbor (FAAN) determines the best label for each detection. FAAN uses an extremely fast version of Nearest Neighbor to find the most likely label for the target. The classifier perpetually assimilates new and relevant information into its existing knowledge database in an incremental fashion, allowing improved classification accuracy and capturing concept drift in the target classes. Experiments show that the system achieves >90% classification accuracy on underwater mine detection tasks performed on synthesized datasets provided by the Office of Naval Research. We have also demonstrated that the system can incrementally improve its detection accuracy by constantly learning from new samples.
Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model
NASA Astrophysics Data System (ADS)
Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.
2018-04-01
It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.
Analysis of miRNA expression profile based on SVM algorithm
NASA Astrophysics Data System (ADS)
Ting-ting, Dai; Chang-ji, Shan; Yan-shou, Dong; Yi-duo, Bian
2018-05-01
Based on mirna expression spectrum data set, a new data mining algorithm - tSVM - KNN (t statistic with support vector machine - k nearest neighbor) is proposed. the idea of the algorithm is: firstly, the feature selection of the data set is carried out by the unified measurement method; Secondly, SVM - KNN algorithm, which combines support vector machine (SVM) and k - nearest neighbor (k - nearest neighbor) is used as classifier. Simulation results show that SVM - KNN algorithm has better classification ability than SVM and KNN alone. Tsvm - KNN algorithm only needs 5 mirnas to obtain 96.08 % classification accuracy in terms of the number of mirna " tags" and recognition accuracy. compared with similar algorithms, tsvm - KNN algorithm has obvious advantages.
Privacy Preserving Nearest Neighbor Search
NASA Astrophysics Data System (ADS)
Shaneck, Mark; Kim, Yongdae; Kumar, Vipin
Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation
AVNM: A Voting based Novel Mathematical Rule for Image Classification.
Vidyarthi, Ankit; Mittal, Namita
2016-12-01
In machine learning, the accuracy of the system depends upon classification result. Classification accuracy plays an imperative role in various domains. Non-parametric classifier like K-Nearest Neighbor (KNN) is the most widely used classifier for pattern analysis. Besides its easiness, simplicity and effectiveness characteristics, the main problem associated with KNN classifier is the selection of a number of nearest neighbors i.e. "k" for computation. At present, it is hard to find the optimal value of "k" using any statistical algorithm, which gives perfect accuracy in terms of low misclassification error rate. Motivated by the prescribed problem, a new sample space reduction weighted voting mathematical rule (AVNM) is proposed for classification in machine learning. The proposed AVNM rule is also non-parametric in nature like KNN. AVNM uses the weighted voting mechanism with sample space reduction to learn and examine the predicted class label for unidentified sample. AVNM is free from any initial selection of predefined variable and neighbor selection as found in KNN algorithm. The proposed classifier also reduces the effect of outliers. To verify the performance of the proposed AVNM classifier, experiments are made on 10 standard datasets taken from UCI database and one manually created dataset. The experimental result shows that the proposed AVNM rule outperforms the KNN classifier and its variants. Experimentation results based on confusion matrix accuracy parameter proves higher accuracy value with AVNM rule. The proposed AVNM rule is based on sample space reduction mechanism for identification of an optimal number of nearest neighbor selections. AVNM results in better classification accuracy and minimum error rate as compared with the state-of-art algorithm, KNN, and its variants. The proposed rule automates the selection of nearest neighbor selection and improves classification rate for UCI dataset and manually created dataset. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy
2014-01-01
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered as the most common and effective methods in classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above mentioned methods are presented. The purpose is benefitting from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm, and is an approach to make up for their deficiencies. To develop proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results that included arrays of the top-ranked features were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, the statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model was benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The substantial findings of the comprehensive comparative study revealed that performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive to the existing state-of-the-art classification models. PMID:25419659
Predicting Flavonoid UGT Regioselectivity
Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip
2011-01-01
Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of avonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
Histogram Curve Matching Approaches for Object-based Image Classification of Land Cover and Land Use
Toure, Sory I.; Stow, Douglas A.; Weeks, John R.; Kumar, Sunil
2013-01-01
The classification of image-objects is usually done using parametric statistical measures of central tendency and/or dispersion (e.g., mean or standard deviation). The objectives of this study were to analyze digital number histograms of image objects and evaluate classifications measures exploiting characteristic signatures of such histograms. Two histograms matching classifiers were evaluated and compared to the standard nearest neighbor to mean classifier. An ADS40 airborne multispectral image of San Diego, California was used for assessing the utility of curve matching classifiers in a geographic object-based image analysis (GEOBIA) approach. The classifications were performed with data sets having 0.5 m, 2.5 m, and 5 m spatial resolutions. Results show that histograms are reliable features for characterizing classes. Also, both histogram matching classifiers consistently performed better than the one based on the standard nearest neighbor to mean rule. The highest classification accuracies were produced with images having 2.5 m spatial resolution. PMID:24403648
Autonomous target recognition using remotely sensed surface vibration measurements
NASA Astrophysics Data System (ADS)
Geurts, James; Ruck, Dennis W.; Rogers, Steven K.; Oxley, Mark E.; Barr, Dallas N.
1993-09-01
The remotely measured surface vibration signatures of tactical military ground vehicles are investigated for use in target classification and identification friend or foe (IFF) systems. The use of remote surface vibration sensing by a laser radar reduces the effects of partial occlusion, concealment, and camouflage experienced by automatic target recognition systems using traditional imagery in a tactical battlefield environment. Linear Predictive Coding (LPC) efficiently represents the vibration signatures and nearest neighbor classifiers exploit the LPC feature set using a variety of distortion metrics. Nearest neighbor classifiers achieve an 88 percent classification rate in an eight class problem, representing a classification performance increase of thirty percent from previous efforts. A novel confidence figure of merit is implemented to attain a 100 percent classification rate with less than 60 percent rejection. The high classification rates are achieved on a target set which would pose significant problems to traditional image-based recognition systems. The targets are presented to the sensor in a variety of aspects and engine speeds at a range of 1 kilometer. The classification rates achieved demonstrate the benefits of using remote vibration measurement in a ground IFF system. The signature modeling and classification system can also be used to identify rotary and fixed-wing targets.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Distributed Adaptive Binary Quantization for Fast Nearest Neighbor Search.
Xianglong Liu; Zhujin Li; Cheng Deng; Dacheng Tao
2017-11-01
Hashing has been proved an attractive technique for fast nearest neighbor search over big data. Compared with the projection based hashing methods, prototype-based ones own stronger power to generate discriminative binary codes for the data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from the ineffective coding that utilizes the complete binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function with prototypes associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and the code set of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes, and enjoys the fast training linear to the number of the training data. We further devise a distributed framework for the large-scale learning, which can significantly speed up the training of ABQ in the distributed environment that has been widely deployed in many areas nowadays. The extensive experiments on four large-scale (up to 80 million) data sets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with up to 58.84% performance gains relatively.
[Galaxy/quasar classification based on nearest neighbor method].
Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun
2011-09-01
With the wide application of high-quality CCD in celestial spectrum imagery and the implementation of many large sky survey programs (e. g., Sloan Digital Sky Survey (SDSS), Two-degree-Field Galaxy Redshift Survey (2dF), Spectroscopic Survey Telescope (SST), Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and Large Synoptic Survey Telescope (LSST) program, etc.), celestial observational data are coming into the world like torrential rain. Therefore, to utilize them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognizing galaxies and quasars from spectra based on nearest neighbor method. Galaxies and quasars are extragalactic objects, they are far away from earth, and their spectra are usually contaminated by various noise. Therefore, it is a typical problem to recognize these two types of spectra in automatic spectra classification. Furthermore, the utilized method, nearest neighbor, is one of the most typical, classic, mature algorithms in pattern recognition and data mining, and often is used as a benchmark in developing novel algorithm. For applicability in practice, it is shown that the recognition ratio of nearest neighbor method (NN) is comparable to the best results reported in the literature based on more complicated methods, and the superiority of NN is that this method does not need to be trained, which is useful in incremental learning and parallel computation in mass spectral data processing. In conclusion, the results in this work are helpful for studying galaxies and quasars spectra classification.
Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance
NASA Astrophysics Data System (ADS)
Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi
2017-11-01
K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
Non-parametric analysis of LANDSAT maps using neural nets and parallel computers
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1991-01-01
Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.
Applied algorithm in the liner inspection of solid rocket motors
NASA Astrophysics Data System (ADS)
Hoffmann, Luiz Felipe Simões; Bizarria, Francisco Carlos Parquet; Bizarria, José Walter Parquet
2018-03-01
In rocket motors, the bonding between the solid propellant and thermal insulation is accomplished by a thin adhesive layer, known as liner. The liner application method involves a complex sequence of tasks, which includes in its final stage, the surface integrity inspection. Nowadays in Brazil, an expert carries out a thorough visual inspection to detect defects on the liner surface that may compromise the propellant interface bonding. Therefore, this paper proposes an algorithm that uses the photometric stereo technique and the K-nearest neighbor (KNN) classifier to assist the expert in the surface inspection. Photometric stereo allows the surface information recovery of the test images, while the KNN method enables image pixels classification into two classes: non-defect and defect. Tests performed on a computer vision based prototype validate the algorithm. The positive results suggest that the algorithm is feasible and when implemented in a real scenario, will be able to help the expert in detecting defective areas on the liner surface.
1980-12-05
classification procedures that are common in speech processing. The anesthesia level classification by EEG time series population screening problem example is in...formance. The use of the KL number type metric in NN rule classification, in a delete-one subj ect ’s EE-at-a-time KL-NN and KL- kNN classification of the...17 individual labeled EEG sample population using KL-NN and KL- kNN rules. The results obtained are shown in Table 1. The entries in the table indicate
Research on cardiovascular disease prediction based on distance metric learning
NASA Astrophysics Data System (ADS)
Ni, Zhuang; Liu, Kui; Kang, Guixia
2018-04-01
Distance metric learning algorithm has been widely applied to medical diagnosis and exhibited its strengths in classification problems. The k-nearest neighbour (KNN) is an efficient method which treats each feature equally. The large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, which did not consider the locality of data distributions. In this paper, we propose a new distance metric algorithm adopting cosine metric and LMNN named COS-SUBLMNN which takes more care about local feature of data to overcome the shortage of LMNN and improve the classification accuracy. The proposed methodology is verified on CVDs patient vector derived from real-world medical data. The Experimental results show that our method provides higher accuracy than KNN and LMNN did, which demonstrates the effectiveness of the Risk predictive model of CVDs based on COS-SUBLMNN.
Rule groupings in expert systems using nearest neighbour decision rules, and convex hulls
NASA Technical Reports Server (NTRS)
Anastasiadis, Stergios
1991-01-01
Expert System shells are lacking in many areas of software engineering. Large rule based systems are not semantically comprehensible, difficult to debug, and impossible to modify or validate. Partitioning a set of rules found in CLIPS (C Language Integrated Production System) into groups of rules which reflect the underlying semantic subdomains of the problem, will address adequately the concerns stated above. Techniques are introduced to structure a CLIPS rule base into groups of rules that inherently have common semantic information. The concepts involved are imported from the field of A.I., Pattern Recognition, and Statistical Inference. Techniques focus on the areas of feature selection, classification, and a criteria of how 'good' the classification technique is, based on Bayesian Decision Theory. A variety of distance metrics are discussed for measuring the 'closeness' of CLIPS rules and various Nearest Neighbor classification algorithms are described based on the above metric.
A Coupled k-Nearest Neighbor Algorithm for Multi-Label Classification
2015-05-22
classification, an image may contain several concepts simultaneously, such as beach, sunset and kangaroo . Such tasks are usually denoted as multi-label...informatics, a gene can belong to both metabolism and transcription classes; and in music categorization, a song may labeled as Mozart and sad. In the
Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio
NASA Astrophysics Data System (ADS)
Nababan, A. A.; Sitompul, O. S.; Tulus
2018-04-01
K- Nearest Neighbor (KNN) is a good classifier, but from several studies, the result performance accuracy of KNN still lower than other methods. One of the causes of the low accuracy produced, because each attribute has the same effect on the classification process, while some less relevant characteristics lead to miss-classification of the class assignment for new data. In this research, we proposed Attribute Weighting Based K-Nearest Neighbor Using Gain Ratio as a parameter to see the correlation between each attribute in the data and the Gain Ratio also will be used as the basis for weighting each attribute of the dataset. The accuracy of results is compared to the accuracy acquired from the original KNN method using 10-fold Cross-Validation with several datasets from the UCI Machine Learning repository and KEEL-Dataset Repository, such as abalone, glass identification, haberman, hayes-roth and water quality status. Based on the result of the test, the proposed method was able to increase the classification accuracy of KNN, where the highest difference of accuracy obtained hayes-roth dataset is worth 12.73%, and the lowest difference of accuracy obtained in the abalone dataset of 0.07%. The average result of the accuracy of all dataset increases the accuracy by 5.33%.
Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar
2018-06-07
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Schmalz, M.; Ritter, G.; Key, R.
Accurate and computationally efficient spectral signature classification is a crucial step in the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications using linear mixing models, signature classification accuracy depends on accurate spectral endmember discrimination [1]. If the endmembers cannot be classified correctly, then the signatures cannot be classified correctly, and object recognition from hyperspectral data will be inaccurate. In practice, the number of endmembers accurately classified often depends linearly on the number of inputs. This can lead to potentially severe classification errors in the presence of noise or densely interleaved signatures. In this paper, we present an comparison of emerging technologies for nonimaging spectral signature classfication based on a highly accurate, efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3,4] and a neural network technology called Morphological Neural Networks (MNNs) [5]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. The open architecture and programmability of TNE's agreement map processing allows a TNE programmer or user to determine classification accuracy, as well as characterize in detail the signatures for which TNE did not obtain classification matches, and why such mis-matches occurred. In this study, we will compare TNE and MNN based endmember classification, using performance metrics such as probability of correct classification (Pd) and rate of false detections (Rfa). As proof of principle, we analyze classification of multiple closely spaced signatures from a NASA database of space material signatures. Additional analysis pertains to computational complexity and noise sensitivity, which are superior to Bayesian techniques based on classical neural networks. [1] Winter, M.E. "Fast autonomous spectral end-member determination in hyperspectral data," in Proceedings of the 13th International Conference On Applied Geologic Remote Sensing, Vancouver, B.C., Canada, pp. 337-44 (1999). [2] N. Keshava, "A survey of spectral unmixing algorithms," Lincoln Laboratory Journal 14:55-78 (2003). [3] Key, G., M.S. SCHMALZ, F.M. Caimi, and G.X. Ritter. "Performance analysis of tabular nearest neighbor encoding algorithm for joint compression and ATR", in Proceedings SPIE 3814:115-126 (1999). [4] Schmalz, M.S. and G. Key. "Algorithms for hyperspectral signature classification in unresolved object detection using tabular nearest neighbor encoding" in Proceedings of the 2007 AMOS Conference, Maui HI (2007). [5] Ritter, G.X., G. Urcid, and M.S. Schmalz. "Autonomous single-pass endmember approximation using lattice auto-associative memories", Neurocomputing (Elsevier), accepted (June 2008).
Understanding Tumor Dormancy as a Means of Secondary Prevention
2015-10-14
CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME OF RESPONSIBLE PERSON USAMRMC a. REPORT Unclassified b. ABSTRACT Unclassified c ...Nearest Person month worked: 10 calendar month Contribution to project: Elvin Wagenblast has left CSHL. His effort will be replaced by Simon Knott ...beginning October 15, 2015 Name: Simon Knott Project Role: Post Doc Nearest Person month worked: 0 Contribution to project: Simon will be
Natural Language Processing Based Instrument for Classification of Free Text Medical Records
2016-01-01
According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260
The minimum distance approach to classification
NASA Technical Reports Server (NTRS)
Wacker, A. G.; Landgrebe, D. A.
1971-01-01
The work to advance the state-of-the-art of miminum distance classification is reportd. This is accomplished through a combination of theoretical and comprehensive experimental investigations based on multispectral scanner data. A survey of the literature for suitable distance measures was conducted and the results of this survey are presented. It is shown that minimum distance classification, using density estimators and Kullback-Leibler numbers as the distance measure, is equivalent to a form of maximum likelihood sample classification. It is also shown that for the parametric case, minimum distance classification is equivalent to nearest neighbor classification in the parameter space.
Emotion recognition from multichannel EEG signals using K-nearest neighbor classification.
Li, Mi; Xu, Hongpei; Liu, Xingwang; Lu, Shengfu
2018-04-27
Many studies have been done on the emotion recognition based on multi-channel electroencephalogram (EEG) signals. This paper explores the influence of the emotion recognition accuracy of EEG signals in different frequency bands and different number of channels. We classified the emotional states in the valence and arousal dimensions using different combinations of EEG channels. Firstly, DEAP default preprocessed data were normalized. Next, EEG signals were divided into four frequency bands using discrete wavelet transform, and entropy and energy were calculated as features of K-nearest neighbor Classifier. The classification accuracies of the 10, 14, 18 and 32 EEG channels based on the Gamma frequency band were 89.54%, 92.28%, 93.72% and 95.70% in the valence dimension and 89.81%, 92.24%, 93.69% and 95.69% in the arousal dimension. As the number of channels increases, the classification accuracy of emotional states also increases, the classification accuracy of the gamma frequency band is greater than that of the beta frequency band followed by the alpha and theta frequency bands. This paper provided better frequency bands and channels reference for emotion recognition based on EEG.
The distance function effect on k-nearest neighbor classification for medical datasets.
Hu, Li-Yu; Huang, Min-Wei; Ke, Shih-Wen; Tsai, Chih-Fong
2016-01-01
K-nearest neighbor (k-NN) classification is conventional non-parametric classifier, which has been used as the baseline classifier in many pattern classification problems. It is based on measuring the distances between the test data and each of the training data to decide the final classification output. Since the Euclidean distance function is the most widely used distance metric in k-NN, no study examines the classification performance of k-NN by different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect the k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data and four different distance functions including Euclidean, cosine, Chi square, and Minkowsky are used during k-NN classification individually. The experimental results show that using the Chi square distance function is the best choice for the three different types of datasets. However, using the cosine and Euclidean (and Minkowsky) distance function perform the worst over the mixed type of datasets. In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, K-NN based on the Chi square distance function performs the best.
Weighted Parzen Windows for Pattern Classification
1994-05-01
Nearest-Neighbor Rule The k-Nearest-Neighbor ( kNN ) technique is nonparametric, assuming nothing about the distribution of the data. Stated succinctly...probabilities P(wj I x) from samples." Raudys and Jain [20:255] advance this interpretation by pointing out that the kNN technique can be viewed as the...34Parzen window classifier with a hyper- rectangular window function." As with the Parzen-window technique, the kNN classifier is more accurate as the
Silva Filho, Telmo M; Souza, Renata M C R; Prudêncio, Ricardo B C
2016-08-01
Some complex data types are capable of modeling data variability and imprecision. These data types are studied in the symbolic data analysis field. One such data type is interval data, which represents ranges of values and is more versatile than classic point data for many domains. This paper proposes a new prototype-based classifier for interval data, trained by a swarm optimization method. Our work has two main contributions: a swarm method which is capable of performing both automatic selection of features and pruning of unused prototypes and a generalized weighted squared Euclidean distance for interval data. By discarding unnecessary features and prototypes, the proposed algorithm deals with typical limitations of prototype-based methods, such as the problem of prototype initialization. The proposed distance is useful for learning classes in interval datasets with different shapes, sizes and structures. When compared to other prototype-based methods, the proposed method achieves lower error rates in both synthetic and real interval datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.
Error minimizing algorithms for nearest eighbor classifiers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Porter, Reid B; Hush, Don; Zimmer, G. Beate
2011-01-03
Stack Filters define a large class of discrete nonlinear filter first introd uced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. Wemore » use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.« less
Mark D. Nelson; Ronald E. McRoberts; Greg C. Liknes; Geoffrey R. Holden
2005-01-01
Landsat Thematic Mapper (TM) satellite imagery and Forest Inventory and Analysis (FIA) plot data were used to construct forest/nonforest maps of Mapping Zone 41, National Land Cover Dataset 2000 (NLCD 2000). Stratification approaches resulting from Maximum Likelihood, Fuzzy Convolution, Logistic Regression, and k-Nearest Neighbors classification/prediction methods were...
NASA Astrophysics Data System (ADS)
Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong
2016-03-01
Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifier [1], has also a distinguished role in the classification for CADx. Therefore, in this paper, based on the property that the LI will achieve a very high accuracy, we design an algorithm to integrate the LI into RF for improved or higher value of AUC (area under the curve of receiver operating characteristics -- ROC). Experiments were performed by the use of a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers of RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech based data in the classification of Parkinson disease (PD) has been shown to provide an effect, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effect, the ability to invoke instance selection has been seldomly examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is used to generate trained samples from the collected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using a more recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
Metric learning for automatic sleep stage classification.
Phan, Huy; Do, Quan; Do, The-Luan; Vu, Duc-Lung
2013-01-01
We introduce in this paper a metric learning approach for automatic sleep stage classification based on single-channel EEG data. We show that learning a global metric from training data instead of using the default Euclidean metric, the k-nearest neighbor classification rule outperforms state-of-the-art methods on Sleep-EDF dataset with various classification settings. The overall accuracy for Awake/Sleep and 4-class classification setting are 98.32% and 94.49% respectively. Furthermore, the superior accuracy is achieved by performing classification on a low-dimensional feature space derived from time and frequency domains and without the need for artifact removal as a preprocessing step.
Umut, İlhan; Çentik, Güven
2016-01-01
The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. Also, it increases the risk of having troubles during recording process and increases the storage volume. In this study, it is intended to detect periodic leg movement (PLM) in sleep with the use of the channels except leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with PLM disorder diagnosis were examined retrospectively. A novel software was developed for the analysis of PSG records. The software utilizes the machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of classified results showed that while K-nearest neighbour classification algorithm had higher average classification rate (91.87%) and lower average classification error value (RMSE = 0.2850), multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). Results showed that PLM can be classified with high accuracy (91.87%) without leg EMG record being present. PMID:27213008
Umut, İlhan; Çentik, Güven
2016-01-01
The number of channels used for polysomnographic recording frequently causes difficulties for patients because of the many cables connected. Also, it increases the risk of having troubles during recording process and increases the storage volume. In this study, it is intended to detect periodic leg movement (PLM) in sleep with the use of the channels except leg electromyography (EMG) by analysing polysomnography (PSG) data with digital signal processing (DSP) and machine learning methods. PSG records of 153 patients of different ages and genders with PLM disorder diagnosis were examined retrospectively. A novel software was developed for the analysis of PSG records. The software utilizes the machine learning algorithms, statistical methods, and DSP methods. In order to classify PLM, popular machine learning methods (multilayer perceptron, K-nearest neighbour, and random forests) and logistic regression were used. Comparison of classified results showed that while K-nearest neighbour classification algorithm had higher average classification rate (91.87%) and lower average classification error value (RMSE = 0.2850), multilayer perceptron algorithm had the lowest average classification rate (83.29%) and the highest average classification error value (RMSE = 0.3705). Results showed that PLM can be classified with high accuracy (91.87%) without leg EMG record being present.
Text Classification for Intelligent Portfolio Management
2002-05-01
years including nearest neighbor classification [15], naive Bayes with EM (Ex- pectation Maximization) [11] [13], Winnow with active learning [10... Active Learning and Expectation Maximization (EM). In particular, active learning is used to actively select documents for labeling, then EM assigns...generalization with active learning . Machine Learning, 15(2):201–221, 1994. [3] I. Dagan and P. Engelson. Committee-based sampling for training
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Typicality effects in artificial categories: is there a hemisphere difference?
Richards, L G; Chiarello, C
1990-07-01
In category classification tasks, typicality effects are usually found: accuracy and reaction time depend upon distance from a prototype. In this study, subjects learned either verbal or nonverbal dot pattern categories, followed by a lateralized classification task. Comparable typicality effects were found in both reaction time and accuracy across visual fields for both verbal and nonverbal categories. Both hemispheres appeared to use a similarity-to-prototype matching strategy in classification. This indicates that merely having a verbal label does not differentiate classification in the two hemispheres.
NASA Astrophysics Data System (ADS)
Jiang, Yicheng; Cheng, Ping; Ou, Yangkui
2001-09-01
A new method for target classification of high-range resolution radar is proposed. It tries to use neural learning to obtain invariant subclass features of training range profiles. A modified Euclidean metric based on the Box-Cox transformation technique is investigated for Nearest Neighbor target classification improvement. The classification experiments using real radar data of three different aircraft have demonstrated that classification error can reduce 8% if this method proposed in this paper is chosen instead of the conventional method. The results of this paper have shown that by choosing an optimized metric, it is indeed possible to reduce the classification error without increasing the number of samples.
Minimum Expected Risk Estimation for Near-neighbor Classification
2006-04-01
We consider the problems of class probability estimation and classification when using near-neighbor classifiers, such as k-nearest neighbors ( kNN ...estimate for weighted kNN classifiers with different prior information, for a broad class of risk functions. Theory and simulations show how significant...the difference is compared to the standard maximum likelihood weighted kNN estimates. Comparisons are made with uniform weights, symmetric weights
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project involves basic...machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ), ensemble...Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F. Horizontal
A Proposed Methodology to Classify Frontier Capital Markets
2011-07-31
out of charity, but because it is the surest route to our common good.” -Inaugural Speech by President Barack Obama, Jan 2009 This project...identification, and machine learning. The algorithm consists of a unique binary classifier mechanism that combines three methods: k-Nearest Neighbors ( kNN ...Support Through kNN Ensemble Classification Techniques E. Capital Market Classification Based on Capital Flows and Trading Architecture F
Image-classification-based global dimming algorithm for LED backlights in LCDs
NASA Astrophysics Data System (ADS)
Qibin, Feng; Huijie, He; Dong, Han; Lei, Zhang; Guoqiang, Lv
2015-07-01
Backlight dimming can help LCDs reduce power consumption and improve CR. With fixed parameters, dimming algorithm cannot achieve satisfied effects for all kinds of images. The paper introduces an image-classification-based global dimming algorithm. The proposed classification method especially for backlight dimming is based on luminance and CR of input images. The parameters for backlight dimming level and pixel compensation are adaptive with image classifications. The simulation results show that the classification based dimming algorithm presents 86.13% power reduction improvement compared with dimming without classification, with almost same display quality. The prototype is developed. There are no perceived distortions when playing videos. The practical average power reduction of the prototype TV is 18.72%, compared with common TV without dimming.
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Finger vein identification using fuzzy-based k-nearest centroid neighbor classifier
NASA Astrophysics Data System (ADS)
Rosdi, Bakhtiar Affendi; Jaafar, Haryati; Ramli, Dzati Athiar
2015-02-01
In this paper, a new approach for personal identification using finger vein image is presented. Finger vein is an emerging type of biometrics that attracts attention of researchers in biometrics area. As compared to other biometric traits such as face, fingerprint and iris, finger vein is more secured and hard to counterfeit since the features are inside the human body. So far, most of the researchers focus on how to extract robust features from the captured vein images. Not much research was conducted on the classification of the extracted features. In this paper, a new classifier called fuzzy-based k-nearest centroid neighbor (FkNCN) is applied to classify the finger vein image. The proposed FkNCN employs a surrounding rule to obtain the k-nearest centroid neighbors based on the spatial distributions of the training images and their distance to the test image. Then, the fuzzy membership function is utilized to assign the test image to the class which is frequently represented by the k-nearest centroid neighbors. Experimental evaluation using our own database which was collected from 492 fingers shows that the proposed FkNCN has better performance than the k-nearest neighbor, k-nearest-centroid neighbor and fuzzy-based-k-nearest neighbor classifiers. This shows that the proposed classifier is able to identify the finger vein image effectively.
An electronic nose for reliable measurement and correct classification of beverages.
Mamat, Mazlina; Samad, Salina Abdul; Hannan, Mahammad A
2011-01-01
This paper reports the design of an electronic nose (E-nose) prototype for reliable measurement and correct classification of beverages. The prototype was developed and fabricated in the laboratory using commercially available metal oxide gas sensors and a temperature sensor. The repeatability, reproducibility and discriminative ability of the developed E-nose prototype were tested on odors emanating from different beverages such as blackcurrant juice, mango juice and orange juice, respectively. Repeated measurements of three beverages showed very high correlation (r > 0.97) between the same beverages to verify the repeatability. The prototype also produced highly correlated patterns (r > 0.97) in the measurement of beverages using different sensor batches to verify its reproducibility. The E-nose prototype also possessed good discriminative ability whereby it was able to produce different patterns for different beverages, different milk heat treatments (ultra high temperature, pasteurization) and fresh and spoiled milks. The discriminative ability of the E-nose was evaluated using Principal Component Analysis and a Multi Layer Perception Neural Network, with both methods showing good classification results.
An Electronic Nose for Reliable Measurement and Correct Classification of Beverages
Mamat, Mazlina; Samad, Salina Abdul; Hannan, Mahammad A.
2011-01-01
This paper reports the design of an electronic nose (E-nose) prototype for reliable measurement and correct classification of beverages. The prototype was developed and fabricated in the laboratory using commercially available metal oxide gas sensors and a temperature sensor. The repeatability, reproducibility and discriminative ability of the developed E-nose prototype were tested on odors emanating from different beverages such as blackcurrant juice, mango juice and orange juice, respectively. Repeated measurements of three beverages showed very high correlation (r > 0.97) between the same beverages to verify the repeatability. The prototype also produced highly correlated patterns (r > 0.97) in the measurement of beverages using different sensor batches to verify its reproducibility. The E-nose prototype also possessed good discriminative ability whereby it was able to produce different patterns for different beverages, different milk heat treatments (ultra high temperature, pasteurization) and fresh and spoiled milks. The discriminative ability of the E-nose was evaluated using Principal Component Analysis and a Multi Layer Perception Neural Network, with both methods showing good classification results. PMID:22163964
A classification scheme for edge-localized modes based on their probability distributions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shabbir, A., E-mail: aqsa.shabbir@ugent.be; Max Planck Institute for Plasma Physics, D-85748 Garching; Hornung, G.
We present here an automated classification scheme which is particularly well suited to scenarios where the parameters have significant uncertainties or are stochastic quantities. To this end, the parameters are modeled with probability distributions in a metric space and classification is conducted using the notion of nearest neighbors. The presented framework is then applied to the classification of type I and type III edge-localized modes (ELMs) from a set of carbon-wall plasmas at JET. This provides a fast, standardized classification of ELM types which is expected to significantly reduce the effort of ELM experts in identifying ELM types. Further, themore » classification scheme is general and can be applied to various other plasma phenomena as well.« less
The nearest neighbor and the bayes error rates.
Loizou, G; Maybank, S J
1987-02-01
The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities Ek,l + 1 ¿ E*(¿) ¿ Ek,l dE*(¿), where d is a function of k, l, and the number of pattern classes, and ¿ is the reject threshold for the Bayes method. An explicit expression for d is given which is optimal in the sense that for some probability distributions Ek,l and dE* (¿) are equal.
The Use of Fuzzy Set Classification for Pattern Recognition of the Polygraph
1993-12-01
actual feature extraction was done, It was decided to use the K-nearest neighbor ( KNN ) the data was preprocessed. The electrocardiogram classifier in...showing heart pulse, and a low frequency not known beforehand, and the KNN classifier does not component showing blood volume. The derivative of...the characteristics of the conventional KNN these six derived signals were detrended and filtered, classification method is that it assigns each
Nearest Neighbor Algorithms for Pattern Classification
NASA Technical Reports Server (NTRS)
Barrios, J. O.
1972-01-01
A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x sub i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required categorize The Bayes risk serves merely as a reference-the limit of excellence beyond which it is not possible to go. The NN rule is bounded below by the Bayes risk and above by twice the Bayes risk.
NASA Astrophysics Data System (ADS)
Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan
2017-03-01
Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.
Initial results from a video-laser rangefinder device
Neil A. Clark
2000-01-01
Three hundred and nine width measurements at various heights to 10 m on a metal light pole were calculated from video images captured with a prototype video-laser rangefinder instrument. Data were captured at distances from 6 to 15 m. The endpoints for the width measurements were manually selected to the nearest pixel from individual video frames.Chi-square...
Prototype learning and dissociable categorization systems in Alzheimer's disease.
Heindel, William C; Festa, Elena K; Ott, Brian R; Landy, Kelly M; Salmon, David P
2013-08-01
Recent neuroimaging studies suggest that prototype learning may be mediated by at least two dissociable memory systems depending on the mode of acquisition, with A/Not-A prototype learning dependent upon a perceptual representation system located within posterior visual cortex and A/B prototype learning dependent upon a declarative memory system associated with medial temporal and frontal regions. The degree to which patients with Alzheimer's disease (AD) can acquire new categorical information may therefore critically depend upon the mode of acquisition. The present study examined A/Not-A and A/B prototype learning in AD patients using procedures that allowed direct comparison of learning across tasks. Despite impaired explicit recall of category features in all tasks, patients showed differential patterns of category acquisition across tasks. First, AD patients demonstrated impaired prototype induction along with intact exemplar classification under incidental A/Not-A conditions, suggesting that the loss of functional connectivity within visual cortical areas disrupted the integration processes supporting prototype induction within the perceptual representation system. Second, AD patients demonstrated intact prototype induction but impaired exemplar classification during A/B learning under observational conditions, suggesting that this form of prototype learning is dependent on a declarative memory system that is disrupted in AD. Third, the surprisingly intact classification of both prototypes and exemplars during A/B learning under trial-and-error feedback conditions suggests that AD patients shifted control from their deficient declarative memory system to a feedback-dependent procedural memory system when training conditions allowed. Taken together, these findings serve to not only increase our understanding of category learning in AD, but to also provide new insights into the ways in which different memory systems interact to support the acquisition of categorical knowledge. Copyright © 2013 Elsevier Ltd. All rights reserved.
Prototype Learning and Dissociable Categorization Systems in Alzheimer’s Disease
Heindel, William C.; Festa, Elena K.; Ott, Brian R.; Landy, Kelly M.; Salmon, David P.
2015-01-01
Recent neuroimaging studies suggest that prototype learning may be mediated by at least two dissociable memory systems depending on the mode of acquisition, with A/Not-A prototype learning dependent upon a perceptual representation system located within posterior visual cortex and A/B prototype learning dependent upon a declarative memory system associated with medial temporal and frontal regions. The degree to which patients with Alzheimer’s disease (AD) can acquire new categorical information may therefore critically depend upon the mode of acquisition. The present study examined A/Not-A and A/B prototype learning in AD patients using procedures that allowed direct comparison of learning across tasks. Despite impaired explicit recall of category features in all tasks, patients showed differential patterns of category acquisition across tasks. First, AD patients demonstrated impaired prototype induction along with intact exemplar classification under incidental A/Not-A conditions, suggesting that the loss of functional connectivity within visual cortical areas disrupted the integration processes supporting prototype induction within the perceptual representation system. Second, AD patients demonstrated intact prototype induction but impaired exemplar classification during A/B learning under observational conditions, suggesting that this form of prototype learning is dependent on a declarative memory system that is disrupted in AD. Third, the surprisingly intact classification of both prototypes and exemplars during A/B learning under trial-and-error feedback conditions suggests that AD patients shifted control from their deficient declarative memory system to a feedback-dependent procedural memory system when training conditions allowed. Taken together, these findings serve to not only increase our understanding of category learning in AD, but to also provide new insights into the ways in which different memory systems interact to support the acquisition of categorical knowledge. PMID:23751172
Semi-supervised morphosyntactic classification of Old Icelandic.
Urban, Kryztof; Tangherlini, Timothy R; Vijūnas, Aurelijus; Broadwell, Peter M
2014-01-01
We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to machine-read corpora and dictionaries, it applies a small set of declension prototypes to map corpus words to dictionary entries. A web-based GUI allows expert users to modify and augment data through an online process. A machine learning module incorporates prototype data, edit-distance metrics, and expert feedback to continuously update part-of-speech and morphosyntactic classification. An advantage of the analyzer is its ability to achieve competitive classification accuracy with minimum training data.
Prototype Expert System for Climate Classification.
ERIC Educational Resources Information Center
Harris, Clay
Many students find climate classification laborious and time-consuming, and through their lack of repetition fail to grasp the details of classification. This paper describes an expert system for climate classification that is being developed at Middle Tennessee State University. Topics include: (1) an introduction to the nature of classification,…
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadly diseases, according to data from WHO by 2015 there are 8.8 million more deaths caused by cancer, and this will increase every year if not resolved earlier. Microarray data has become one of the most popular cancer-identification studies in the field of health, since microarray data can be used to look at levels of gene expression in certain cell samples that serve to analyze thousands of genes simultaneously. By using data mining technique, we can classify the sample of microarray data thus it can be identified with cancer or not. In this paper we will discuss some research using some data mining techniques using microarray data, such as Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, and simulation of Random Forest algorithm with technique of reduction dimension using Relief. The result of this paper show performance measure (accuracy) from classification algorithm (SVM, ANN, Naive Bayes, kNN, C4.5, and Random Forets).The results in this paper show the accuracy of Random Forest algorithm higher than other classification algorithms (Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost generated from each Data Mining Classification Technique based on microarray data.
Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-01-01
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet’s effects on fish skin. PMID:29596375
Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-03-29
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout ( Oncorhynchus mykiss ) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k -Nearest neighbours ( k -NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although the both LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k -NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as the fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, these was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during the cultivation by evaluating diet's effects on fish skin.
A Mobile GPS Application: Mosque Tracking with Prayer Time Synchronization
NASA Astrophysics Data System (ADS)
Hashim, Rathiah; Ikhmatiar, Mohammad Sibghotulloh; Surip, Miswan; Karmin, Masiri; Herawan, Tutut
Global Positioning System (GPS) is a popular technology applied in many areas and embedded in many devices, facilitating end-users to navigate effectively to user's intended destination via the best calculated route. The ability of GPS to track precisely according to coordinates of specific locations can be utilized to assist a Muslim traveler visiting or passing an unfamiliar place to find the nearest mosque in order to perform his prayer. However, not many techniques have been proposed for Mosque tracking. This paper presents the development of GPS technology in tracking the nearest mosque using mobile application software embedded with the prayer time's synchronization system on a mobile application. The prototype GPS system developed has been successfully incorporated with a map and several mosque locations.
NASA Astrophysics Data System (ADS)
Rusdiana, Lili; Marfuah
2017-12-01
K-Nearest Neighbors method is one of methods used for classification which calculate a value to find out the closest in distance. It is used to group a set of data such as students’ graduation status that are got from the amount of course credits taken by them, the grade point average (AVG), and the mini-thesis grade. The study is conducted to know the results of using K-Nearest Neighbors method on the application of determining students’ graduation status, so it can be analyzed from the method used, the data, and the application constructed. The aim of this study is to find out the application results by using K-Nearest Neighbors concept to determine students’ graduation status using the data of STMIK Palangkaraya students. The development of the software used Extreme Programming, since it was appropriate and precise for this study which was to quickly finish the project. The application was created using Microsoft Office Excel 2007 for the training data and Matlab 7 to implement the application. The result of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5%. It could determine the predicate graduation of 94 data used from the initial data before the processing as many as 136 data which the maximal training data was 50data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study. The results of K-Nearest Neighbors method on the application of determining students’ graduation status was 92.5% could determine the predicate graduation which is the maximal training data. The K-Nearest Neighbors method is one of methods used to group a set of data based on the closest value, so that using K-Nearest Neighbors method agreed with this study.
Fast Query-Optimized Kernel-Machine Classification
NASA Technical Reports Server (NTRS)
Mazzoni, Dominic; DeCoste, Dennis
2004-01-01
A recently developed algorithm performs kernel-machine classification via incremental approximate nearest support vectors. The algorithm implements support-vector machines (SVMs) at speeds 10 to 100 times those attainable by use of conventional SVM algorithms. The algorithm offers potential benefits for classification of images, recognition of speech, recognition of handwriting, and diverse other applications in which there are requirements to discern patterns in large sets of data. SVMs constitute a subset of kernel machines (KMs), which have become popular as models for machine learning and, more specifically, for automated classification of input data on the basis of labeled training data. While similar in many ways to k-nearest-neighbors (k-NN) models and artificial neural networks (ANNs), SVMs tend to be more accurate. Using representations that scale only linearly in the numbers of training examples, while exploring nonlinear (kernelized) feature spaces that are exponentially larger than the original input dimensionality, KMs elegantly and practically overcome the classic curse of dimensionality. However, the price that one must pay for the power of KMs is that query-time complexity scales linearly with the number of training examples, making KMs often orders of magnitude more computationally expensive than are ANNs, decision trees, and other popular machine learning alternatives. The present algorithm treats an SVM classifier as a special form of a k-NN. The algorithm is based partly on an empirical observation that one can often achieve the same classification as that of an exact KM by using only small fraction of the nearest support vectors (SVs) of a query. The exact KM output is a weighted sum over the kernel values between the query and the SVs. In this algorithm, the KM output is approximated with a k-NN classifier, the output of which is a weighted sum only over the kernel values involving k selected SVs. Before query time, there are gathered statistics about how misleading the output of the k-NN model can be, relative to the outputs of the exact KM for a representative set of examples, for each possible k from 1 to the total number of SVs. From these statistics, there are derived upper and lower thresholds for each step k. These thresholds identify output levels for which the particular variant of the k-NN model already leans so strongly positively or negatively that a reversal in sign is unlikely, given the weaker SV neighbors still remaining. At query time, the partial output of each query is incrementally updated, stopping as soon as it exceeds the predetermined statistical thresholds of the current step. For an easy query, stopping can occur as early as step k = 1. For more difficult queries, stopping might not occur until nearly all SVs are touched. A key empirical observation is that this approach can tolerate very approximate nearest-neighbor orderings. In experiments, SVs and queries were projected to a subspace comprising the top few principal- component dimensions and neighbor orderings were computed in that subspace. This approach ensured that the overhead of the nearest-neighbor computations was insignificant, relative to that of the exact KM computation.
Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Jamaluddin; Siringoringo, Rimbun
2017-12-01
Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawbackof FKNN is that it is difficult to determine the parameters. These parameters are the number of neighbors (k) and fuzzy strength (m). Both parameters are very sensitive. This makes it difficult to determine the values of ‘m’ and ‘k’, thus making FKNN difficult to control because no theories or guides can deduce how proper ‘m’ and ‘k’ should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best value of ‘k’ and ‘m’. MPSO is focused on the Constriction Factor Method. Constriction Factor Method is an improvement of PSO in order to avoid local circumstances optima. The model proposed in this study was tested on the German Credit Dataset. The test of the data/The data test has been standardized by UCI Machine Learning Repository which is widely applied to classification problems. The application of MPSO to the determination of FKNN parameters is expected to increase the value of classification performance. Based on the experiments that have been done indicating that the model offered in this research results in a better classification performance compared to the Fk-NN model only. The model offered in this study has an accuracy rate of 81%, while. With using Fk-NN model, it has the accuracy of 70%. At the end is done comparison of research model superiority with 2 other classification models;such as Naive Bayes and Decision Tree. This research model has a better performance level, where Naive Bayes has accuracy 75%, and the decision tree model has 70%
Assessment of sexual orientation using the hemodynamic brain response to visual sexual stimuli.
Ponseti, Jorge; Granert, Oliver; Jansen, Olav; Wolff, Stephan; Mehdorn, Hubertus; Bosinski, Hartmut; Siebner, Hartwig
2009-06-01
The assessment of sexual orientation is of importance to the diagnosis and treatment of sex offenders and paraphilic disorders. Phallometry is considered gold standard in objectifying sexual orientation, yet this measurement has been criticized because of its intrusiveness and limited reliability. To evaluate whether the spatial response pattern to sexual stimuli as revealed by a change in blood oxygen level-dependent (BOLD) signal can be used for individual classification of sexual orientation. We used a preexisting functional MRI (fMRI) data set that had been acquired in a nonclinical sample of 12 heterosexual men and 14 homosexual men. During fMRI, participants were briefly exposed to pictures of same-sex and opposite-sex genitals. Data analysis involved four steps: (i) differences in the BOLD response to female and male sexual stimuli were calculated for each subject; (ii) these contrast images were entered into a group analysis to calculate whole-brain difference maps between homosexual and heterosexual participants; (iii) a single expression value was computed for each subject expressing its correspondence to the group result; and (iv) based on these expression values, Fisher's linear discriminant analysis and the kappa-nearest neighbor classification method were used to predict the sexual orientation of each subject. Sensitivity and specificity of the two classification methods in predicting individual sexual orientation. Both classification methods performed well in predicting individual sexual orientation with a mean accuracy of >85% (Fisher's linear discriminant analysis: 92% sensitivity, 85% specificity; kappa-nearest neighbor classification: 88% sensitivity, 92% specificity). Despite the small sample size, the functional response patterns of the brain to sexual stimuli contained sufficient information to predict individual sexual orientation with high accuracy. These results suggest that fMRI-based classification methods hold promise for the diagnosis of paraphilic disorders (e.g., pedophilia).
Li, Haiquan; Dai, Xinbin; Zhao, Xuechun
2008-05-01
Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
NASA Astrophysics Data System (ADS)
Purwanti, Endah; Calista, Evelyn
2017-05-01
Leukemia is a type of cancer which is caused by malignant neoplasms in leukocyte cells. Leukemia disease which can cause death quickly enough for the sufferer is a type of acute lymphocyte leukemia (ALL). In this study, we propose automatic detection of lymphocyte leukemia through classification of lymphocyte cell images obtained from peripheral blood smear single cell. There are two main objectives in this study. The first is to extract featuring cells. The second objective is to classify the lymphocyte cells into two classes, namely normal and abnormal lymphocytes. In conducting this study, we use combination of shape feature and histogram feature, and the classification algorithm is k-nearest Neighbour with k variation is 1, 3, 5, 7, 9, 11, 13, and 15. The best level of accuracy, sensitivity, and specificity in this study are 90%, 90%, and 90%, and they were obtained from combined features of area-perimeter-mean-standard deviation with k=7.
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis
2017-03-01
A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
On the Discriminant Analysis in the 2-Populations Case
NASA Astrophysics Data System (ADS)
Rublík, František
2008-01-01
The empirical Bayes Gaussian rule, which in the normal case yields good values of the probability of total error, may yield high values of the maximum probability error. From this point of view the presented modified version of the classification rule of Broffitt, Randles and Hogg appears to be superior. The modification included in this paper is termed as a WR method, and the choice of its weights is discussed. The mentioned methods are also compared with the K nearest neighbours classification rule.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 49 Transportation 6 2010-10-01 2010-10-01 false Definitions. 523.2 Section 523.2 Transportation..., DEPARTMENT OF TRANSPORTATION VEHICLE CLASSIFICATION § 523.2 Definitions. Approach angle means the smallest... to the nearest tenth of a square foot. For purposes of this definition, track width is the lateral...
System for selecting relevant information for decision support.
Kalina, Jan; Seidl, Libor; Zvára, Karel; Grünfeldová, Hana; Slovák, Dalibor; Zvárová, Jana
2013-01-01
We implemented a prototype of a decision support system called SIR which has a form of a web-based classification service for diagnostic decision support. The system has the ability to select the most relevant variables and to learn a classification rule, which is guaranteed to be suitable also for high-dimensional measurements. The classification system can be useful for clinicians in primary care to support their decision-making tasks with relevant information extracted from any available clinical study. The implemented prototype was tested on a sample of patients in a cardiological study and performs an information extraction from a high-dimensional set containing both clinical and gene expression data.
GPU based cloud system for high-performance arrhythmia detection with parallel k-NN algorithm.
Tae Joon Jun; Hyun Ji Park; Hyuk Yoo; Young-Hak Kim; Daeyoung Kim
2016-08-01
In this paper, we propose an GPU based Cloud system for high-performance arrhythmia detection. Pan-Tompkins algorithm is used for QRS detection and we optimized beat classification algorithm with K-Nearest Neighbor (K-NN). To support high performance beat classification on the system, we parallelized beat classification algorithm with CUDA to execute the algorithm on virtualized GPU devices on the Cloud system. MIT-BIH Arrhythmia database is used for validation of the algorithm. The system achieved about 93.5% of detection rate which is comparable to previous researches while our algorithm shows 2.5 times faster execution time compared to CPU only detection algorithm.
Aided diagnosis methods of breast cancer based on machine learning
NASA Astrophysics Data System (ADS)
Zhao, Yue; Wang, Nian; Cui, Xiaoyu
2017-08-01
In the field of medicine, quickly and accurately determining whether the patient is malignant or benign is the key to treatment. In this paper, K-Nearest Neighbor, Linear Discriminant Analysis, Logistic Regression were applied to predict the classification of thyroid,Her-2,PR,ER,Ki67,metastasis and lymph nodes in breast cancer, in order to recognize the benign and malignant breast tumors and achieve the purpose of aided diagnosis of breast cancer. The results showed that the highest classification accuracy of LDA was 88.56%, while the classification effect of KNN and Logistic Regression were better than that of LDA, the best accuracy reached 96.30%.
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. ?? 2006 Elsevier Ltd. All rights reserved.
ESONET LIDO Demonstration Mission: the Iberian Margin node.
NASA Astrophysics Data System (ADS)
Embriaco, Davide; André, Michel; Zitellini, Nevio; Esonet Lido Demonstration Mission Team
2010-05-01
The Gulf of Cadiz is one of two the test sites chosen for the demonstration of the ESONET - LIDO Demonstration Mission (DM) [1], which will establish a first nucleus of regional network of multidisciplinary sea floor observatories. The Gulf of Cadiz is a highly populated area, characterized by tsunamigenic sources, which caused the devastating earthquake and tsunamis that struck Lisbon in 1755. The seismic activity is concentrated along a belt going from this region to the Azores and the main tsunamigenic tectonic sources are located near the coastline. In the framework of the EU - NEAREST project [2] the GEOSTAR deep ocean bottom multi-parametric observatory was deployed for a one year mission off cape Saint Vincent at about 3200 m depth. GEOSTAR was equipped with a set of oceanographic, seismic and geophysical sensors and with a new tsunami detector prototype. In November 2009 the GEOSTAR abyssal station equipped with the tsunami prototype was redeployed at the same site on behalf of NEAREST and ESONET - LIDO DM. The system is able to communicate from the ocean bottom to the land station via an acoustic and satellite link. The abyssal station is designed both for long term geophysical and oceanographic observation and for tsunami early warning purpose. The tsunami detection is performed by two different algorithms: a new real time dedicated tsunami detection algorithm which analyses the water pressure data, and a seismic algorithm which triggers on strong events. Examples of geophysical and oceanographic data acquired by the abyssal station during the one year mission will be shown. The development of a new acoustic antenna equipped with a stand alone and autonomous acquisition system will allow the recording of marine mammals and the evaluation of environmental noise. References [1] M. André and The ESONET LIDO Demonstration Mission Team, "Listening to the deep-ocean environment: an ESONET initiative for the real-time monitoring of geohazards and marine ambient noise", EGU General Assembly, Vienna 2-7 May 2010 [2] EU - NEAREST Project web site: http://nearest.bo.ismar.cnr.it/
40 CFR 75.53 - Monitoring plan.
Code of Federal Regulations, 2010 CFR
2010-07-01
... this part. (c)-(d) [Reserved] (e) Contents of the monitoring plan. Each monitoring plan shall contain..., rounded to the nearest 100 lb/hr); (J) Identification of all units using a common stack; (K) Activation...: (A) Program(s) for which the EDR is submitted; (B) Unit classification; (C) Reporting frequency; (D...
40 CFR 75.53 - Monitoring plan.
Code of Federal Regulations, 2011 CFR
2011-07-01
... this part. (c)-(d) [Reserved] (e) Contents of the monitoring plan. Each monitoring plan shall contain..., rounded to the nearest 100 lb/hr); (J) Identification of all units using a common stack; (K) Activation...: (A) Program(s) for which the EDR is submitted; (B) Unit classification; (C) Reporting frequency; (D...
Deep learning application: rubbish classification with aid of an android device
NASA Astrophysics Data System (ADS)
Liu, Sijiang; Jiang, Bo; Zhan, Jie
2017-06-01
Deep learning is a very hot topic currently in pattern recognition and artificial intelligence researches. Aiming at the practical problem that people usually don't know correct classifications some rubbish should belong to, based on the powerful image classification ability of the deep learning method, we have designed a prototype system to help users to classify kinds of rubbish. Firstly the CaffeNet Model was adopted for our classification network training on the ImageNet dataset, and the trained network was deployed on a web server. Secondly an android app was developed for users to capture images of unclassified rubbish, upload images to the web server for analyzing backstage and retrieve the feedback, so that users can obtain the classification guide by an android device conveniently. Tests on our prototype system of rubbish classification show that: an image of one single type of rubbish with origin shape can be better used to judge its classification, while an image containing kinds of rubbish or rubbish with changed shape may fail to help users to decide rubbish's classification. However, the system still shows promising auxiliary function for rubbish classification if the network training strategy can be optimized further.
Thanh Noi, Phan; Kappas, Martin
2017-01-01
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km2 within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets. PMID:29271909
Thanh Noi, Phan; Kappas, Martin
2017-12-22
In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.
Smart, Otis; Burrell, Lauren
2014-01-01
Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However for each the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and possible need for patient-specific feature selection or/and classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except one patient. PMID:25580059
Local Subspace Classifier with Transform-Invariance for Image Classification
NASA Astrophysics Data System (ADS)
Hotta, Seiji
A family of linear subspace classifiers called local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers very high sensitivity to image transformations because it uses projection and the Euclidean distances for classification. In this paper, I present a combination of a local subspace classifier (LSC) and a tangent distance (TD) for improving accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors in other type of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with the experiments on handwritten digit and color image classification.
Automated Classification of Phonological Errors in Aphasic Language
Ahuja, Sanjeev B.; Reggia, James A.; Berndt, Rita S.
1984-01-01
Using heuristically-guided state space search, a prototype program has been developed to simulate and classify phonemic errors occurring in the speech of neurologically-impaired patients. Simulations are based on an interchangeable rule/operator set of elementary errors which represent a theory of phonemic processing faults. This work introduces and evaluates a novel approach to error simulation and classification, it provides a prototype simulation tool for neurolinguistic research, and it forms the initial phase of a larger research effort involving computer modelling of neurolinguistic processes.
Jennifer L. Long; Melanie Miller; James P. Menakis; Robert E. Keane
2006-01-01
The Landscape Fire and Resource Management Planning Tools Prototype Project, or LANDFIRE Prototype Project, required a system for classifying vegetation composition, biophysical settings, and vegetation structure to facilitate the mapping of vegetation and wildland fuel characteristics and the simulation of vegetation dynamics using landscape modeling. We developed...
Prototype active scanner for nighttime oil spill mapping and classification
NASA Technical Reports Server (NTRS)
Sandness, G. A.; Ailes, S. B.
1977-01-01
A prototype, active, aerial scanner system was constructed for nighttime water pollution detection and nighttime multispectral imaging of the ground. An arc lamp was used to produce the transmitted light and four detector channels provided a multispectral measurement capability. The feasibility of the design concept was demonstrated by laboratory and flight tests of the prototype system.
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3
Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.; ...
2017-11-09
Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Realization of the axial next-nearest-neighbor Ising model in U 3 Al 2 Ge 3
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fobes, David M.; Lin, Shi-Zeng; Ghimire, Nirmal J.
Inmore » this paper, we report small-angle neutron scattering (SANS) measurements and theoretical modeling of U 3 Al 2 Ge 3 . Analysis of the SANS data reveals a phase transition to sinusoidally modulated magnetic order at T N = 63 K to be second order and a first-order phase transition to ferromagnetic order at T c = 48 K. Within the sinusoidally modulated magnetic phase (T c < T < T N), we uncover a dramatic change, by a factor of 3, in the ordering wave vector as a function of temperature. Finally, these observations all indicate that U 3 Al 2 Ge 3 is a close realization of the three-dimensional axial next-nearest-neighbor Ising model, a prototypical framework for describing commensurate to incommensurate phase transitions in frustrated magnets.« less
Xia, Wenjun; Mita, Yoshio; Shibata, Tadashi
2016-05-01
Aiming at efficient data condensation and improving accuracy, this paper presents a hardware-friendly template reduction (TR) method for the nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using an field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k -means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm, which is completed within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification in a generalized manner, a global categorization scheme has been explored and applied to the algorithm. The global characterization automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only critical boundary vectors and k -means centers are used as the new template set for classification. Experimental results for 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with the traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of using a proof-of-concept FPGA system of 256 64-D vectors to accelerate the proposed method on hardware. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than on a 3.4-GHz PC, while consuming only 1% of the power of that used by the PC.
Transportation Modes Classification Using Sensors on Smartphones.
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-08-19
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user's transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes.
Transportation Modes Classification Using Sensors on Smartphones
Fang, Shih-Hau; Liao, Hao-Hsiang; Fei, Yu-Xiang; Chen, Kai-Hsiang; Huang, Jen-Wei; Lu, Yu-Ding; Tsao, Yu
2016-01-01
This paper investigates the transportation and vehicular modes classification by using big data from smartphone sensors. The three types of sensors used in this paper include the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms including decision trees, K-nearest neighbor, and support vector machine to classify the user’s transportation and vehicular modes. In the experiments, we discussed and compared the performance from different perspectives including the accuracy for both modes, the executive time, and the model size. Results show that the proposed features enhance the accuracy, in which the support vector machine provides the best performance in classification accuracy whereas it consumes the largest prediction time. This paper also investigates the vehicle classification mode and compares the results with that of the transportation modes. PMID:27548182
Consistent latent position estimation and vertex classification for random dot product graphs.
Sussman, Daniel L; Tang, Minh; Priebe, Carey E
2014-01-01
In this work, we show that using the eigen-decomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the $(k)$-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 49 Transportation 2 2014-10-01 2014-10-01 false Separation distances for undeveloped film from... Classification of Material § 175.706 Separation distances for undeveloped film from packages containing Class 7... film. Transport index Minimum separation distance to nearest undeveloped film for various times in...
Code of Federal Regulations, 2011 CFR
2011-10-01
... 49 Transportation 2 2011-10-01 2011-10-01 false Separation distances for undeveloped film from... Classification of Material § 175.706 Separation distances for undeveloped film from packages containing Class 7... film. Transport index Minimum separation distance to nearest undeveloped film for various times in...
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 2 2013-10-01 2013-10-01 false Separation distances for undeveloped film from... Classification of Material § 175.706 Separation distances for undeveloped film from packages containing Class 7... film. Transport index Minimum separation distance to nearest undeveloped film for various times in...
Code of Federal Regulations, 2012 CFR
2012-10-01
... 49 Transportation 2 2012-10-01 2012-10-01 false Separation distances for undeveloped film from... Classification of Material § 175.706 Separation distances for undeveloped film from packages containing Class 7... film. Transport index Minimum separation distance to nearest undeveloped film for various times in...
Code of Federal Regulations, 2010 CFR
2010-10-01
... 49 Transportation 2 2010-10-01 2010-10-01 false Separation distances for undeveloped film from... Classification of Material § 175.706 Separation distances for undeveloped film from packages containing Class 7... film. Transport index Minimum separation distance to nearest undeveloped film for various times in...
Nearest Neighbor Classification Using a Density Sensitive Distance Measurement
2009-09-01
both the proposed density sensitive distance measurement and Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and...proposed density sensitive distance measurement and Euclidean distance are compared on the Wisconsin Diagnostic Breast Cancer dataset and the MNIST...35 1. The Wisconsin Diagnostic Breast Cancer (WDBC) Dataset..........35 2. The
New feature extraction method for classification of agricultural products from x-ray images
NASA Astrophysics Data System (ADS)
Talukder, Ashit; Casasent, David P.; Lee, Ha-Woon; Keagy, Pamela M.; Schatzki, Thomas F.
1999-01-01
Classification of real-time x-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non- invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items. This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discrimination between damaged and clean items. This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work the MRDF is applied to standard features. The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC data.
Video based object representation and classification using multiple covariance matrices.
Zhang, Yurong; Liu, Quan
2017-01-01
Video based object recognition and classification has been widely studied in computer vision and image processing area. One main issue of this task is to develop an effective representation for video. This problem can generally be formulated as image set representation. In this paper, we present a new method called Multiple Covariance Discriminative Learning (MCDL) for image set representation and classification problem. The core idea of MCDL is to represent an image set using multiple covariance matrices with each covariance matrix representing one cluster of images. Firstly, we use the Nonnegative Matrix Factorization (NMF) method to do image clustering within each image set, and then adopt Covariance Discriminative Learning on each cluster (subset) of images. At last, we adopt KLDA and nearest neighborhood classification method for image set classification. Promising experimental results on several datasets show the effectiveness of our MCDL method.
Examining change detection approaches for tropical mangrove monitoring
Myint, Soe W.; Franklin, Janet; Buenemann, Michaela; Kim, Won; Giri, Chandra
2014-01-01
This study evaluated the effectiveness of different band combinations and classifiers (unsupervised, supervised, object-oriented nearest neighbor, and object-oriented decision rule) for quantifying mangrove forest change using multitemporal Landsat data. A discriminant analysis using spectra of different vegetation types determined that bands 2 (0.52 to 0.6 μm), 5 (1.55 to 1.75 μm), and 7 (2.08 to 2.35 μm) were the most effective bands for differentiating mangrove forests from surrounding land cover types. A ranking of thirty-six change maps, produced by comparing the classification accuracy of twelve change detection approaches, was used. The object-based Nearest Neighbor classifier produced the highest mean overall accuracy (84 percent) regardless of band combinations. The automated decision rule-based approach (mean overall accuracy of 88 percent) as well as a composite of bands 2, 5, and 7 used with the unsupervised classifier and the same composite or all band difference with the object-oriented Nearest Neighbor classifier were the most effective approaches.
Sentinel-2 Level 2A Prototype Processor: Architecture, Algorithms And First Results
NASA Astrophysics Data System (ADS)
Muller-Wilm, Uwe; Louis, Jerome; Richter, Rudolf; Gascon, Ferran; Niezette, Marc
2013-12-01
Sen2Core is a prototype processor for Sentinel-2 Level 2A product processing and formatting. The processor is developed for and with ESA and performs the tasks of Atmospheric Correction and Scene Classification of Level 1C input data. Level 2A outputs are: Bottom-Of- Atmosphere (BOA) corrected reflectance images, Aerosol Optical Thickness-, Water Vapour-, Scene Classification maps and Quality indicators, including cloud and snow probabilities. The Level 2A Product Formatting performed by the processor follows the specification of the Level 1C User Product.
An optimized video system for augmented reality in endodontics: a feasibility study.
Bruellmann, D D; Tjaden, H; Schwanecke, U; Barth, P
2013-03-01
We propose an augmented reality system for the reliable detection of root canals in video sequences based on a k-nearest neighbor color classification and introduce a simple geometric criterion for teeth. The new software was implemented using C++, Qt, and the image processing library OpenCV. Teeth are detected in video images to restrict the segmentation of the root canal orifices by using a k-nearest neighbor algorithm. The location of the root canal orifices were determined using Euclidean distance-based image segmentation. A set of 126 human teeth with known and verified locations of the root canal orifices was used for evaluation. The software detects root canals orifices for automatic classification of the teeth in video images and stores location and size of the found structures. Overall 287 of 305 root canals were correctly detected. The overall sensitivity was about 94 %. Classification accuracy for molars ranged from 65.0 to 81.2 % and from 85.7 to 96.7 % for premolars. The realized software shows that observations made in anatomical studies can be exploited to automate real-time detection of root canal orifices and tooth classification with a software system. Automatic storage of location, size, and orientation of the found structures with this software can be used for future anatomical studies. Thus, statistical tables with canal locations will be derived, which can improve anatomical knowledge of the teeth to alleviate root canal detection in the future. For this purpose the software is freely available at: http://www.dental-imaging.zahnmedizin.uni-mainz.de/.
Thermography based diagnosis of ruptured anterior cruciate ligament (ACL) in canines
NASA Astrophysics Data System (ADS)
Lama, Norsang; Umbaugh, Scott E.; Mishra, Deependra; Dahal, Rohini; Marino, Dominic J.; Sackman, Joseph
2016-09-01
Anterior cruciate ligament (ACL) rupture in canines is a common orthopedic injury in veterinary medicine. Veterinarians use both imaging and non-imaging methods to diagnose the disease. Common imaging methods such as radiography, computed tomography (CT scan) and magnetic resonance imaging (MRI) have some disadvantages: expensive setup, high dose of radiation, and time-consuming. In this paper, we present an alternative diagnostic method based on feature extraction and pattern classification (FEPC) to diagnose abnormal patterns in ACL thermograms. The proposed method was experimented with a total of 30 thermograms for each camera view (anterior, lateral and posterior) including 14 disease and 16 non-disease cases provided from Long Island Veterinary Specialists. The normal and abnormal patterns in thermograms are analyzed in two steps: feature extraction and pattern classification. Texture features based on gray level co-occurrence matrices (GLCM), histogram features and spectral features are extracted from the color normalized thermograms and the computed feature vectors are applied to Nearest Neighbor (NN) classifier, K-Nearest Neighbor (KNN) classifier and Support Vector Machine (SVM) classifier with leave-one-out validation method. The algorithm gives the best classification success rate of 86.67% with a sensitivity of 85.71% and a specificity of 87.5% in ACL rupture detection using NN classifier for the lateral view and Norm-RGB-Lum color normalization method. Our results show that the proposed method has the potential to detect ACL rupture in canines.
Landscape scale mapping of forest inventory data by nearest neighbor classification
Andrew Lister
2009-01-01
One of the goals of the Forest Service, U.S. Department of Agriculture's Forest Inventory and Analysis (FIA) program is large-area mapping. FIA scientists have tried many methods in the past, including geostatistical methods, linear modeling, nonlinear modeling, and simple choropleth and dot maps. Mapping methods that require individual model-based maps to be...
Analysis of A Drug Target-based Classification System using Molecular Descriptors.
Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin
2016-01-01
Drug-target interaction is an important topic in drug discovery and drug repositioning. KEGG database offers a drug annotation and classification using a target-based classification system. In this study, we gave an investigation on five target-based classes: (I) G protein-coupled receptors; (II) Nuclear receptors; (III) Ion channels; (IV) Enzymes; (V) Pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. Meanwhile, an optimal prediction model based on nearest neighbor algorithm was constructed, which got the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.
Modeling ready biodegradability of fragrance materials.
Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola
2015-06-01
In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
Improving the accuracy of k-nearest neighbor using local mean based and distance weight
NASA Astrophysics Data System (ADS)
Syaliman, K. U.; Nababan, E. B.; Sitompul, O. S.
2018-03-01
In k-nearest neighbor (kNN), the determination of classes for new data is normally performed by a simple majority vote system, which may ignore the similarities among data, as well as allowing the occurrence of a double majority class that can lead to misclassification. In this research, we propose an approach to resolve the majority vote issues by calculating the distance weight using a combination of local mean based k-nearest neighbor (LMKNN) and distance weight k-nearest neighbor (DWKNN). The accuracy of results is compared to the accuracy acquired from the original k-NN method using several datasets from the UCI Machine Learning repository, Kaggle and Keel, such as ionosphare, iris, voice genre, lower back pain, and thyroid. In addition, the proposed method is also tested using real data from a public senior high school in city of Tualang, Indonesia. Results shows that the combination of LMKNN and DWKNN was able to increase the classification accuracy of kNN, whereby the average accuracy on test data is 2.45% with the highest increase in accuracy of 3.71% occurring on the lower back pain symptoms dataset. For the real data, the increase in accuracy is obtained as high as 5.16%.
Determination of authenticity of brand perfume using electronic nose prototypes
NASA Astrophysics Data System (ADS)
Gebicki, Jacek; Szulczynski, Bartosz; Kaminski, Marian
2015-12-01
The paper presents the practical application of an electronic nose technique for fast and efficient discrimination between authentic and fake perfume samples. Two self-built electronic nose prototypes equipped with a set of semiconductor sensors were employed for that purpose. Additionally 10 volunteers took part in the sensory analysis. The following perfumes and their fake counterparts were analysed: Dior—Fahrenheit, Eisenberg—J’ose, YSL—La nuit de L’homme, 7 Loewe and Spice Bomb. The investigations were carried out using the headspace of the aqueous solutions. Data analysis utilized multidimensional techniques: principle component analysis (PCA), linear discrimination analysis (LDA), k-nearest neighbour (k-NN). The results obtained confirmed the legitimacy of the electronic nose technique as an alternative to the sensory analysis as far as the determination of authenticity of perfume is concerned.
Assessment of various supervised learning algorithms using different performance metrics
NASA Astrophysics Data System (ADS)
Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.
2017-11-01
Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.
NASA Astrophysics Data System (ADS)
Lin, Y.; Chen, X.
2016-12-01
Land cover classification systems used in remote sensing image data have been developed to meet the needs for depicting land covers in scientific investigations and policy decisions. However, accuracy assessments of a spate of data sets demonstrate that compared with the real physiognomy, each of the thematic map of specific land cover classification system contains some unavoidable flaws and unintended deviation. This work proposes a web-based land cover classification system, an integrated prototype, based on an ontology model of various classification systems, each of which is assigned the same weight in the final determination of land cover type. Ontology, a formal explication of specific concepts and relations, is employed in this prototype to build up the connections among different systems to resolve the naming conflicts. The process is initialized by measuring semantic similarity between terminologies in the systems and the search key to produce certain set of satisfied classifications, and carries on through searching the predefined relations in concepts of all classification systems to generate classification maps with user-specified land cover type highlighted, based on probability calculated by votes from data sets with different classification system adopted. The present system is verified and validated by comparing the classification results with those most common systems. Due to full consideration and meaningful expression of each classification system using ontology and the convenience that the web brings with itself, this system, as a preliminary model, proposes a flexible and extensible architecture for classification system integration and data fusion, thereby providing a strong foundation for the future work.
Proximal humeral fracture classification systems revisited.
Majed, Addie; Macleod, Iain; Bull, Anthony M J; Zyto, Karol; Resch, Herbert; Hertel, Ralph; Reilly, Peter; Emery, Roger J H
2011-10-01
This study evaluated several classification systems and expert surgeons' anatomic understanding of these complex injuries based on a consecutive series of patients. We hypothesized that current proximal humeral fracture classification systems, regardless of imaging methods, are not sufficiently reliable to aid clinical management of these injuries. Complex fractures in 96 consecutive patients were investigated by generation of rapid sequence prototyping models from computed tomography Digital Imaging and Communications in Medicine (DICOM) imaging data. Four independent senior observers were asked to classify each model using 4 classification systems: Neer, AO, Codman-Hertel, and a prototype classification system by Resch. Interobserver and intraobserver κ coefficient values were calculated for the overall classification system and for selected classification items. The κ coefficient values for the interobserver reliability were 0.33 for Neer, 0.11 for AO, 0.44 for Codman-Hertel, and 0.15 for Resch. Interobserver reliability κ coefficient values were 0.32 for the number of fragments and 0.30 for the anatomic segment involved using the Neer system, 0.30 for the AO type (A, B, C), and 0.53, 0.48, and 0.08 for the Resch impaction/distraction, varus/valgus and flexion/extension subgroups, respectively. Three-part fractures showed low reliability for the Neer and AO systems. Currently available evidence suggests fracture classifications in use have poor intra- and inter-observer reliability despite the modality of imaging used thus making treating these injuries difficult as weak as affecting scientific research as well. This study was undertaken to evaluate the reliability of several systems using rapid sequence prototype models. Overall interobserver κ values represented slight to moderate agreement. The most reliable interobserver scores were found with the Codman-Hertel classification, followed by elements of Resch's trial system. The AO system had the lowest values. The higher interobserver reliability values for the Codman-Hertel system showed that is the only comprehensive fracture description studied, whereas the novel classification by Resch showed clear definition in respect to varus/valgus and impaction/distraction angulation. Copyright © 2011 Journal of Shoulder and Elbow Surgery Board of Trustees. All rights reserved.
Personality Subtypes in Adolescents with Eating Disorders: Validation of a Classification Approach
ERIC Educational Resources Information Center
Thompson-Brenner, Heather; Eddy, Kamryn T.; Satir, Dana A.; Boisseau, Christina L.; Westen, Drew
2008-01-01
Background: Research has identified three personality subtypes in adults with eating disorders (EDs): a high-functioning, an undercontrolled, and an overcontrolled group. The current study investigated whether similar personality prototypes exist in adolescents with EDs, and whether these personality prototypes show relationships to external…
Spectrum, symmetries, and dynamics of Heisenberg spin-1/2 chains
NASA Astrophysics Data System (ADS)
Joel, Kira; Kollmar, Davida; Santos, Lea
2013-03-01
Quantum spin chains are prototype quantum many-body systems. They are employed in the description of various complex physical phenomena. Here we provide an introduction to the subject by focusing on the time evolution of Heisenberg spin-1/2 chains with couplings between nearest-neighbor sites only. We study how the anisotropy parameter and the symmetries of the model affect its time evolution. Our predictions are based on the analysis of the eigenvalues and eigenstates of the system and then confirmed with actual numerical results.
Can We Speculate Running Application With Server Power Consumption Trace?
Li, Yuanlong; Hu, Han; Wen, Yonggang; Zhang, Jun
2018-05-01
In this paper, we propose to detect the running applications in a server by classifying the observed power consumption series for the purpose of data center energy consumption monitoring and analysis. Time series classification problem has been extensively studied with various distance measurements developed; also recently the deep learning-based sequence models have been proved to be promising. In this paper, we propose a novel distance measurement and build a time series classification algorithm hybridizing nearest neighbor and long short term memory (LSTM) neural network. More specifically, first we propose a new distance measurement termed as local time warping (LTW), which utilizes a user-specified index set for local warping, and is designed to be noncommutative and nondynamic programming. Second, we hybridize the 1-nearest neighbor (1NN)-LTW and LSTM together. In particular, we combine the prediction probability vector of 1NN-LTW and LSTM to determine the label of the test cases. Finally, using the power consumption data from a real data center, we show that the proposed LTW can improve the classification accuracy of dynamic time warping (DTW) from about 84% to 90%. Our experimental results prove that the proposed LTW is competitive on our data set compared with existed DTW variants and its noncommutative feature is indeed beneficial. We also test a linear version of LTW and find out that it can perform similar to state-of-the-art DTW-based method while it runs as fast as the linear runtime lower bound methods like LB_Keogh for our problem. With the hybrid algorithm, for the power series classification task we achieve an accuracy up to about 93%. Our research can inspire more studies on time series distance measurement and the hybrid of the deep learning models with other traditional models.
Jiménez-Carvelo, Ana M; Pérez-Castaño, Estefanía; González-Casado, Antonio; Cuadros-Rodríguez, Luis
2017-04-15
A new method for differentiation of olive oil (independently of the quality category) from other vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, palm, linseed, sesame and soybean) has been developed. The analytical procedure for chromatographic fingerprinting of the methyl-transesterified fraction of each vegetable oil, using normal-phase liquid chromatography, is described and the chemometric strategies applied and discussed. Some chemometric methods, such as k-nearest neighbours (kNN), partial least squared-discriminant analysis (PLS-DA), support vector machine classification analysis (SVM-C), and soft independent modelling of class analogies (SIMCA), were applied to build classification models. Performance of the classification was evaluated and ranked using several classification quality metrics. The discriminant analysis, based on the use of one input-class, (plus a dummy class) was applied for the first time in this study. Copyright © 2016 Elsevier Ltd. All rights reserved.
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
Zhang, Y N
2017-01-01
Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed.
2017-01-01
Parkinson's disease (PD) is primarily diagnosed by clinical examinations, such as walking test, handwriting test, and MRI diagnostic. In this paper, we propose a machine learning based PD telediagnosis method for smartphone. Classification of PD using speech records is a challenging task owing to the fact that the classification accuracy is still lower than doctor-level. Here we demonstrate automatic classification of PD using time frequency features, stacked autoencoders (SAE), and K nearest neighbor (KNN) classifier. KNN classifier can produce promising classification results from useful representations which were learned by SAE. Empirical results show that the proposed method achieves better performance with all tested cases across classification tasks, demonstrating machine learning capable of classifying PD with a level of competence comparable to doctor. It concludes that a smartphone can therefore potentially provide low-cost PD diagnostic care. This paper also gives an implementation on browser/server system and reports the running time cost. Both advantages and disadvantages of the proposed telediagnosis system are discussed. PMID:29075547
Computational approaches for the classification of seed storage proteins.
Radhika, V; Rao, V Sree Hari
2015-07-01
Seed storage proteins comprise a major part of the protein content of the seed and have an important role on the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on nearest neighbor approach for classification of seed storage proteins and compared its performance with decision tree J48, multilayer perceptron neural (MLP) network and support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.
Stephens, David; Diesing, Markus
2014-01-01
Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well, highlighting the need for some means of feature selection.
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
Nonlinear features for product inspection
NASA Astrophysics Data System (ADS)
Talukder, Ashit; Casasent, David P.
1999-03-01
Classification of real-time X-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non-invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items (pistachio nuts). This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work, the MRDF is applied to standard features (rather than iconic data). The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC (receiver operating characteristic) data.
Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve
2012-01-01
Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
Motion data classification on the basis of dynamic time warping with a cloud point distance measure
NASA Astrophysics Data System (ADS)
Switonski, Adam; Josinski, Henryk; Zghidi, Hafedh; Wojciechowski, Konrad
2016-06-01
The paper deals with the problem of classification of model free motion data. The nearest neighbors classifier which is based on comparison performed by Dynamic Time Warping transform with cloud point distance measure is proposed. The classification utilizes both specific gait features reflected by a movements of subsequent skeleton joints and anthropometric data. To validate proposed approach human gait identification challenge problem is taken into consideration. The motion capture database containing data of 30 different humans collected in Human Motion Laboratory of Polish-Japanese Academy of Information Technology is used. The achieved results are satisfactory, the obtained accuracy of human recognition exceeds 90%. What is more, the applied cloud point distance measure does not depend on calibration process of motion capture system which results in reliable validation.
Delavarian, Mona; Towhidkhah, Farzad; Gharibzadeh, Shahriar; Dibajnia, Parvin
2011-07-12
Automatic classification of different behavioral disorders with many similarities (e.g. in symptoms) by using an automated approach will help psychiatrists to concentrate on correct disorder and its treatment as soon as possible, to avoid wasting time on diagnosis, and to increase the accuracy of diagnosis. In this study, we tried to differentiate and classify (diagnose) 306 children with many similar symptoms and different behavioral disorders such as ADHD, depression, anxiety, comorbid depression and anxiety and conduct disorder with high accuracy. Classification was based on the symptoms and their severity. With examining 16 different available classifiers, by using "Prtools", we have proposed nearest mean classifier as the most accurate classifier with 96.92% accuracy in this research. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zagouras, Athanassios; Argiriou, Athanassios A.; Flocas, Helena A.; Economou, George; Fotopoulos, Spiros
2012-11-01
Classification of weather maps at various isobaric levels as a methodological tool is used in several problems related to meteorology, climatology, atmospheric pollution and to other fields for many years. Initially the classification was performed manually. The criteria used by the person performing the classification are features of isobars or isopleths of geopotential height, depending on the type of maps to be classified. Although manual classifications integrate the perceptual experience and other unquantifiable qualities of the meteorology specialists involved, these are typically subjective and time consuming. Furthermore, during the last years different approaches of automated methods for atmospheric circulation classification have been proposed, which present automated and so-called objective classifications. In this paper a new method of atmospheric circulation classification of isobaric maps is presented. The method is based on graph theory. It starts with an intelligent prototype selection using an over-partitioning mode of fuzzy c-means (FCM) algorithm, proceeds to a graph formulation for the entire dataset and produces the clusters based on the contemporary dominant sets clustering method. Graph theory is a novel mathematical approach, allowing a more efficient representation of spatially correlated data, compared to the classical Euclidian space representation approaches, used in conventional classification methods. The method has been applied to the classification of 850 hPa atmospheric circulation over the Eastern Mediterranean. The evaluation of the automated methods is performed by statistical indexes; results indicate that the classification is adequately comparable with other state-of-the-art automated map classification methods, for a variable number of clusters.
U.S. Coast Guard Fleet Mix Planning: A Decision Support System Prototype
1991-03-01
91-16785 Al ’ 1 1 1 Unclassified SECURITY CLASSIFICATION OF ThIS PAGE REPORT DOCUMENTATION PAGE I L REPORTSECURITY CLASSIFICATION lb. RESTRICTIVE...MARKINGS Unclassified 2a. SECURITY CLASSIFICATION AUTHORITY 3. DISTRIBUTION/ AVAILABITY OF REPORT Approved for public release; distribution is inlimited...2b. DECIASSIFICATION/DOWNGRADING SCHEDULE 4. PERFORMING ORGANIZATION REPORT NUMBER(S) 5. MONITORING ORGANIZATION REPORT NUMBER(S) 6a. NAME OF
Multi-dimensionality and variability in folk classification of stingless bees (Apidae: Meliponini).
Zamudio, Fernando; Hilgert, Norma I
2015-05-23
Not long ago Eugene Hunn suggested using a combination of cognitive, linguistic, ecological and evolutionary theories in order to account for the dynamic character of ethnoecology in the study of folk classification systems. In this way he intended to question certain homogeneity in folk classifications models and deepen in the analysis and interpretation of variability in folk classifications. This paper studies how a rural culturally mixed population of the Atlantic Forest of Misiones (Argentina) classified honey-producing stingless bees according to the linguistic, cognitive and ecological dimensions of folk classification. We also analyze the socio-ecological meaning of binomialization in naming and the meaning of general local variability in the appointment of stingless bees. We used three different approaches: the classical approach developed by Brent Berlin which relies heavily on linguistic criteria, the approach developed by Eleonor Rosch which relies on psychological (cognitive) principles of categorization and finally we have captured the ecological dimension of folk classification in local narratives. For the second approximation, we developed ways of measuring the degree of prototypicality based on a total of 107 comparisons of the type "X is similar to Y" identified in personal narratives. Various logical and grouping strategies coexist and were identified as: graded of lateral linkage, hierarchical and functional. Similarity judgments among folk taxa resulted in an implicit logic of classification graded according to taxa's prototypicality. While there is a high agreement on naming stingless bees with monomial names, a considerable number of underrepresented binomial names and lack of names were observed. Two possible explanations about reported local naming variability are presented. We support the multidimensionality of folk classification systems. This confirms the specificity of local classification systems but also reflects the use of grouping strategies and mechanisms commonly observed in other cultural groups, such as the use of similarity judgments between more or less prototypical organisms. Also we support the idea that alternative naming results from a process of fragmentation of knowledge or incomplete transmission of knowledge. These processes lean on the facts that culturally based knowledge, on the one hand, and biologic knowledge of nature on the other, can be acquired through different learning pathways.
Classification Model for Damage Localization in a Plate Structure
NASA Astrophysics Data System (ADS)
Janeliukstis, R.; Ruchevskis, S.; Chate, A.
2018-01-01
The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-elements program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of magnitude 5 and 10% of plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by finetuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to a classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual application positions of point load.
NASA Astrophysics Data System (ADS)
Zhang, Min; Zhou, Xiangrong; Goshima, Satoshi; Chen, Huayue; Muramatsu, Chisako; Hara, Takeshi; Yokoyama, Ryojiro; Kanematsu, Masayuki; Fujita, Hiroshi
2012-03-01
We aim at using a new texton based texture classification method in the classification of pulmonary emphysema in computed tomography (CT) images of the lungs. Different from conventional computer-aided diagnosis (CAD) pulmonary emphysema classification methods, in this paper, firstly, the dictionary of texton is learned via applying sparse representation(SR) to image patches in the training dataset. Then the SR coefficients of the test images over the dictionary are used to construct the histograms for texture presentations. Finally, classification is performed by using a nearest neighbor classifier with a histogram dissimilarity measure as distance. The proposed approach is tested on 3840 annotated regions of interest consisting of normal tissue and mild, moderate and severe pulmonary emphysema of three subtypes. The performance of the proposed system, with an accuracy of about 88%, is comparably higher than state of the art method based on the basic rotation invariant local binary pattern histograms and the texture classification method based on texton learning by k-means, which performs almost the best among other approaches in the literature.
Voice based gender classification using machine learning
NASA Astrophysics Data System (ADS)
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
2017-11-01
Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
ERIC Educational Resources Information Center
Chin-Parker, Seth; Ross, Brian H.
2004-01-01
Category knowledge allows for both the determination of category membership and an understanding of what the members of a category are like. Diagnostic information is used to determine category membership; prototypical information reflects the most likely features given category membership. Two experiments examined 2 means of category learning,…
Probability machines: consistent probability estimation using nonparametric learning machines.
Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A
2012-01-01
Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
Portable Language-Independent Adaptive Translation from OCR. Phase 1
2009-04-01
including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Exploitation of RF-DNA for Device Classification and Verification Using GRLVQI Processing
2012-12-01
5 FLD Fisher’s Linear Discriminant . . . . . . . . . . . . . . . . . . . 6 kNN K-Nearest Neighbor...Neighbor ( kNN ), Support Vector Machine (SVM), and simple cross-correlation techniques [40, 57, 82, 88, 94, 95]. The RF-DNA fingerprinting research in...Expansion and the Dis- crete Gabor Transform on a Non-Separable Lattice”. 2000 IEEE Int’l Conf on Acoustics, Speech , and Signal Processing (ICASSP00
Integration of the Execution Support System for the Computer-Aided Prototyping System (CAPS)
1990-09-01
SUPPORT SYSTEM FOR THE COMPUTER -AIDED PROTOTYPING SYSTEM (CAPS) by Frank V. Palazzo September 1990 Thesis Advisor: Luq± Approved for public release...ZATON REPOR ,,.VBE (, 6a NAME OF PERPORMING ORGAN ZAT7ON 6b OFF:CE SYVBOL 7a NAME OF MONITORINC O0-CA’Za- ON Computer Science Department (if applicable...Include Security Classification) Integration of the Execution Support System for the Computer -Aided Prototyping System (C S) 12 PERSONAL AUTHOR(S) Frank V
Information mining in remote sensing imagery
NASA Astrophysics Data System (ADS)
Li, Jiang
The volume of remotely sensed imagery continues to grow at an enormous rate due to the advances in sensor technology, and our capability for collecting and storing images has greatly outpaced our ability to analyze and retrieve information from the images. This motivates us to develop image information mining techniques, which is very much an interdisciplinary endeavor drawing upon expertise in image processing, databases, information retrieval, machine learning, and software design. This dissertation proposes and implements an extensive remote sensing image information mining (ReSIM) system prototype for mining useful information implicitly stored in remote sensing imagery. The system consists of three modules: image processing subsystem, database subsystem, and visualization and graphical user interface (GUI) subsystem. Land cover and land use (LCLU) information corresponding to spectral characteristics is identified by supervised classification based on support vector machines (SVM) with automatic model selection, while textural features that characterize spatial information are extracted using Gabor wavelet coefficients. Within LCLU categories, textural features are clustered using an optimized k-means clustering approach to acquire search efficient space. The clusters are stored in an object-oriented database (OODB) with associated images indexed in an image database (IDB). A k-nearest neighbor search is performed using a query-by-example (QBE) approach. Furthermore, an automatic parametric contour tracing algorithm and an O(n) time piecewise linear polygonal approximation (PLPA) algorithm are developed for shape information mining of interesting objects within the image. A fuzzy object-oriented database based on the fuzzy object-oriented data (FOOD) model is developed to handle the fuzziness and uncertainty. Three specific applications are presented: integrated land cover and texture pattern mining, shape information mining for change detection of lakes, and fuzzy normalized difference vegetation index (NDVI) pattern mining. The study results show the effectiveness of the proposed system prototype and the potentials for other applications in remote sensing.
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
ISE-based sensor array system for classification of foodstuffs
NASA Astrophysics Data System (ADS)
Ciosek, Patrycja; Sobanski, Tomasz; Augustyniak, Ewa; Wróblewski, Wojciech
2006-01-01
A system composed of an array of polymeric membrane ion-selective electrodes and a pattern recognition block—a so-called 'electronic tongue'—was used for the classification of liquid samples: milk, fruit juice and tonic. The task of this system was to automatically recognize a brand of the product. To analyze the measurement set-up responses various non-parametric classifiers such as k-nearest neighbours, a feedforward neural network and a probabilistic neural network were used. In order to enhance the classification ability of the system, standard model solutions of salts were measured (in order to take into account any variation in time of the working parameters of the sensors). This system was capable of recognizing the brand of the products with accuracy ranging from 68% to 100% (in the case of the best classifier).
Carbon p Electron Ferromagnetism in Silicon Carbide
Wang, Yutian; Liu, Yu; Wang, Gang; Anwand, Wolfgang; Jenkins, Catherine A.; Arenholz, Elke; Munnik, Frans; Gordan, Ovidiu D.; Salvan, Georgeta; Zahn, Dietrich R. T.; Chen, Xiaolong; Gemming, Sibylle; Helm, Manfred; Zhou, Shengqiang
2015-01-01
Ferromagnetism can occur in wide-band gap semiconductors as well as in carbon-based materials when specific defects are introduced. It is thus desirable to establish a direct relation between the defects and the resulting ferromagnetism. Here, we contribute to revealing the origin of defect-induced ferromagnetism using SiC as a prototypical example. We show that the long-range ferromagnetic coupling can be attributed to the p electrons of the nearest-neighbor carbon atoms around the VSiVC divacancies. Thus, the ferromagnetism is traced down to its microscopic electronic origin. PMID:25758040
Carbon p electron ferromagnetism in silicon carbide
Wang, Yutian; Liu, Yu; Wang, Gang; ...
2015-03-11
Ferromagnetism can occur in wide-band gap semiconductors as well as in carbon-based materials when specific defects are introduced. It is thus desirable to establish a direct relation between the defects and the resulting ferromagnetism. Here, we contribute to revealing the origin of defect-induced ferromagnetism using SiC as a prototypical example. We show that the long-range ferromagnetic coupling can be attributed to the p electrons of the nearest-neighbor carbon atoms around the V SiV C divacancies. Thus, the ferromagnetism is traced down to its microscopic electronic origin.
Meat and Fish Freshness Inspection System Based on Odor Sensing
Hasan, Najam ul; Ejaz, Naveed; Ejaz, Waleed; Kim, Hyung Seok
2012-01-01
We propose a method for building a simple electronic nose based on commercially available sensors used to sniff in the market and identify spoiled/contaminated meat stocked for sale in butcher shops. Using a metal oxide semiconductor-based electronic nose, we measured the smell signature from two of the most common meat foods (beef and fish) stored at room temperature. Food samples were divided into two groups: fresh beef with decayed fish and fresh fish with decayed beef. The prime objective was to identify the decayed item using the developed electronic nose. Additionally, we tested the electronic nose using three pattern classification algorithms (artificial neural network, support vector machine and k-nearest neighbor), and compared them based on accuracy, sensitivity, and specificity. The results demonstrate that the k-nearest neighbor algorithm has the highest accuracy. PMID:23202222
NASA Astrophysics Data System (ADS)
Sukawattanavijit, Chanika; Srestasathiern, Panu
2017-10-01
Land Use and Land Cover (LULC) information are significant to observe and evaluate environmental change. LULC classification applying remotely sensed data is a technique popularly employed on a global and local dimension particularly, in urban areas which have diverse land cover types. These are essential components of the urban terrain and ecosystem. In the present, object-based image analysis (OBIA) is becoming widely popular for land cover classification using the high-resolution image. COSMO-SkyMed SAR data was fused with THAICHOTE (namely, THEOS: Thailand Earth Observation Satellite) optical data for land cover classification using object-based. This paper indicates a comparison between object-based and pixel-based approaches in image fusion. The per-pixel method, support vector machines (SVM) was implemented to the fused image based on Principal Component Analysis (PCA). For the objectbased classification was applied to the fused images to separate land cover classes by using nearest neighbor (NN) classifier. Finally, the accuracy assessment was employed by comparing with the classification of land cover mapping generated from fused image dataset and THAICHOTE image. The object-based data fused COSMO-SkyMed with THAICHOTE images demonstrated the best classification accuracies, well over 85%. As the results, an object-based data fusion provides higher land cover classification accuracy than per-pixel data fusion.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071
NASA Astrophysics Data System (ADS)
Song, Xiaoning; Feng, Zhen-Hua; Hu, Guosheng; Yang, Xibei; Yang, Jingyu; Qi, Yunsong
2015-09-01
This paper proposes a progressive sparse representation-based classification algorithm using local discrete cosine transform (DCT) evaluation to perform face recognition. Specifically, the sum of the contributions of all training samples of each subject is first taken as the contribution of this subject, then the redundant subject with the smallest contribution to the test sample is iteratively eliminated. Second, the progressive method aims at representing the test sample as a linear combination of all the remaining training samples, by which the representation capability of each training sample is exploited to determine the optimal "nearest neighbors" for the test sample. Third, the transformed DCT evaluation is constructed to measure the similarity between the test sample and each local training sample using cosine distance metrics in the DCT domain. The final goal of the proposed method is to determine an optimal weighted sum of nearest neighbors that are obtained under the local correlative degree evaluation, which is approximately equal to the test sample, and we can use this weighted linear combination to perform robust classification. Experimental results conducted on the ORL database of faces (created by the Olivetti Research Laboratory in Cambridge), the FERET face database (managed by the Defense Advanced Research Projects Agency and the National Institute of Standards and Technology), AR face database (created by Aleix Martinez and Robert Benavente in the Computer Vision Center at U.A.B), and USPS handwritten digit database (gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo) demonstrate the effectiveness of the proposed method.
Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro
2015-03-01
Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, [Formula: see text]-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of [Formula: see text]-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was [Formula: see text]-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
Rajagopal, Rekha; Ranganathan, Vidhyapriya
2018-06-05
Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
Acosta-Mesa, Héctor Gabriel; Cruz-Ramírez, Nicandro; Hernández-Jiménez, Rodolfo
2017-01-01
Efforts have been being made to improve the diagnostic performance of colposcopy, trying to help better diagnose cervical cancer, particularly in developing countries. However, improvements in a number of areas are still necessary, such as the time it takes to process the full digital image of the cervix, the performance of the computing systems used to identify different kinds of tissues, and biopsy sampling. In this paper, we explore three different, well-known automatic classification methods (k-Nearest Neighbors, Naïve Bayes, and C4.5), in addition to different data models that take full advantage of this information and improve the diagnostic performance of colposcopy based on acetowhite temporal patterns. Based on the ROC and PRC area scores, the k-Nearest Neighbors and discrete PLA representation performed better than other methods. The values of sensitivity, specificity, and accuracy reached using this method were 60% (95% CI 50–70), 79% (95% CI 71–86), and 70% (95% CI 60–80), respectively. The acetowhitening phenomenon is not exclusive to high-grade lesions, and we have found acetowhite temporal patterns of epithelial changes that are not precancerous lesions but that are similar to positive ones. These findings need to be considered when developing more robust computing systems in the future. PMID:28744318
Kireeva, Natalia V; Ovchinnikova, Svetlana I; Kuznetsov, Sergey L; Kazennov, Andrey M; Tsivadze, Aslan Yu
2014-02-01
This study concerns large margin nearest neighbors classifier and its multi-metric extension as the efficient approaches for metric learning which aimed to learn an appropriate distance/similarity function for considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. The paper describes application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in drug discovery process, in silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, a distance-based metric learning procedures have been applied for in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and predictive performance of developed models has been analyzed, the learned metric was used in support vector machines. The metric learning results have been illustrated using linear and non-linear data visualization techniques in order to indicate how the change of metrics affected nearest neighbors relations and descriptor space.
Ostrand, William D.; Gotthardt, Tracey A.; Howlin, Shay; Robards, Martin D.
2005-01-01
We modeled habitat selection by Pacific sand lance (Ammodytes hexapterus) by examining their distribution in relation to water depth, distance to shore, bottom slope, bottom type, distance from sand bottom, and shoreline type. Through both logistic regression and classification tree models, we compared the characteristics of 29 known sand lance locations to 58 randomly selected sites. The best models indicated a strong selection of shallow water by sand lance, with weaker association between sand lance distribution and beach shorelines, sand bottoms, distance to shore, bottom slope, and distance to the nearest sand bottom. We applied an information-theoretic approach to the interpretation of the logistic regression analysis and determined importance values of 0.99, 0.54, 0.52, 0.44, 0.39, and 0.25 for depth, beach shorelines, sand bottom, distance to shore, gradual bottom slope, and distance to the nearest sand bottom, respectively. The classification tree model indicated that sand lance selected shallow-water habitats and remained near sand bottoms when located in habitats with depths between 40 and 60 m. All sand lance locations were at depths <60 m and 93% occurred at depths <40 m. Probable reasons for the modeled relationships between the distribution of sand lance and the independent variables are discussed.
NASA Astrophysics Data System (ADS)
Kireeva, Natalia V.; Ovchinnikova, Svetlana I.; Kuznetsov, Sergey L.; Kazennov, Andrey M.; Tsivadze, Aslan Yu.
2014-02-01
This study concerns large margin nearest neighbors classifier and its multi-metric extension as the efficient approaches for metric learning which aimed to learn an appropriate distance/similarity function for considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks. The paper describes application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in drug discovery process, in silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, a distance-based metric learning procedures have been applied for in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and predictive performance of developed models has been analyzed, the learned metric was used in support vector machines. The metric learning results have been illustrated using linear and non-linear data visualization techniques in order to indicate how the change of metrics affected nearest neighbors relations and descriptor space.
Kumar, Mukesh; Rath, Nitish Kumar; Rath, Santanu Kumar
2016-04-01
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generates an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed. The precise identification of genes of interest that are responsible for causing cancer are imperative in microarray data analysis. Most existing schemes employ a two-phase process such as feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is also employed to classify microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis is done on these MapReduce-based models using microarray datasets of various dimensions. From the obtained results, it is observed that these models consume much less execution time than conventional models in processing big data. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Qingjie; Xin, Jingmin; Wu, Jiayi; Zheng, Nanning
2017-03-01
Microaneurysms are the earliest clinic signs of diabetic retinopathy, and many algorithms were developed for the automatic classification of these specific pathology. However, the imbalanced class distribution of dataset usually causes the classification accuracy of true microaneurysms be low. Therefore, by combining the borderline synthetic minority over-sampling technique (BSMOTE) with the data cleaning techniques such as Tomek links and Wilson's edited nearest neighbor rule (ENN) to resample the imbalanced dataset, we propose two new support vector machine (SVM) classification algorithms for the microaneurysms. The proposed BSMOTE-Tomek and BSMOTE-ENN algorithms consist of: 1) the adaptive synthesis of the minority samples in the neighborhood of the borderline, and 2) the remove of redundant training samples for improving the efficiency of data utilization. Moreover, the modified SVM classifier with probabilistic outputs is used to divide the microaneurysm candidates into two groups: true microaneurysms and false microaneurysms. The experiments with a public microaneurysms database shows that the proposed algorithms have better classification performance including the receiver operating characteristic (ROC) curve and the free-response receiver operating characteristic (FROC) curve.
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Canizo, Brenda V; Escudero, Leticia B; Pérez, María B; Pellerano, Roberto G; Wuilloud, Rodolfo G
2018-03-01
The feasibility of the application of chemometric techniques associated with multi-element analysis for the classification of grape seeds according to their provenance vineyard soil was investigated. Grape seed samples from different localities of Mendoza province (Argentina) were evaluated. Inductively coupled plasma mass spectrometry (ICP-MS) was used for the determination of twenty-nine elements (Ag, As, Ce, Co, Cs, Cu, Eu, Fe, Ga, Gd, La, Lu, Mn, Mo, Nb, Nd, Ni, Pr, Rb, Sm, Te, Ti, Tl, Tm, U, V, Y, Zn and Zr). Once the analytical data were collected, supervised pattern recognition techniques such as linear discriminant analysis (LDA), partial least square discriminant analysis (PLS-DA), k-nearest neighbors (k-NN), support vector machine (SVM) and Random Forest (RF) were applied to construct classification/discrimination rules. The results indicated that nonlinear methods, RF and SVM, perform best with up to 98% and 93% accuracy rate, respectively, and therefore are excellent tools for classification of grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Drivelos, Spiros A; Danezis, Georgios P; Haroutounian, Serkos A; Georgiou, Constantinos A
2016-12-15
This study examines the trace and rare earth elemental (REE) fingerprint variations of PDO (Protected Designation of Origin) "Fava Santorinis" over three consecutive harvesting years (2011-2013). Classification of samples in harvesting years was studied by performing discriminant analysis (DA), k nearest neighbours (κ-NN), partial least squares (PLS) analysis and probabilistic neural networks (PNN) using rare earth elements and trace metals determined using ICP-MS. DA performed better than κ-NN, producing 100% discrimination using trace elements and 79% using REEs. PLS was found to be superior to PNN, achieving 99% and 90% classification for trace and REEs, respectively, while PNN achieved 96% and 71% classification for trace and REEs, respectively. The information obtained using REEs did not enhance classification, indicating that REEs vary minimally per harvesting year, providing robust geographical origin discrimination. The results show that seasonal patterns can occur in the elemental composition of "Fava Santorinis", probably reflecting seasonality of climate. Copyright © 2016 Elsevier Ltd. All rights reserved.
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...
2015-10-09
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Lagrangian methods of cosmic web classification
NASA Astrophysics Data System (ADS)
Fisher, J. D.; Faltenbacher, A.; Johnson, M. S. T.
2016-05-01
The cosmic web defines the large-scale distribution of matter we see in the Universe today. Classifying the cosmic web into voids, sheets, filaments and nodes allows one to explore structure formation and the role environmental factors have on halo and galaxy properties. While existing studies of cosmic web classification concentrate on grid-based methods, this work explores a Lagrangian approach where the V-web algorithm proposed by Hoffman et al. is implemented with techniques borrowed from smoothed particle hydrodynamics. The Lagrangian approach allows one to classify individual objects (e.g. particles or haloes) based on properties of their nearest neighbours in an adaptive manner. It can be applied directly to a halo sample which dramatically reduces computational cost and potentially allows an application of this classification scheme to observed galaxy samples. Finally, the Lagrangian nature admits a straightforward inclusion of the Hubble flow negating the necessity of a visually defined threshold value which is commonly employed by grid-based classification methods.
Comparison of two target classification techniques
NASA Astrophysics Data System (ADS)
Chen, J. S.; Walton, E. K.
1986-01-01
Radar target classification techniques based on backscatter measurements in the resonance region (1.0-20.0 MHz) are discussed. Attention is given to two novel methods currently being tested at the radar range of Ohio State University. The methods include: (1) the nearest neighbor (NN) algorithm for determining the radar cross section (RCS) magnitude and range corrected phase at various operating frequencies; and (2) an inverse Fourier transformation of the complex multifrequency radar returns of the time domain, followed by cross correlation analysis. Comparisons are made of the performance of the two techniques as a function of signal-to-error noise ratio for different types of processing. The results of the comparison are discussed in detail.
Quantum Correlation Properties in Composite Parity-Conserved Matrix Product States
NASA Astrophysics Data System (ADS)
Zhu, Jing-Min
2016-09-01
We give a new thought for constructing long-range quantum correlation in quantum many-body systems. Our proposed composite parity-conserved matrix product state has long-range quantum correlation only for two spin blocks where their spin-block length larger than 1 compared to any subsystem only having short-range quantum correlation, and we investigate quantum correlation properties of two spin blocks varying with environment parameter and spacing spin number. We also find that the geometry quantum discords of two nearest-neighbor spin blocks and two next-nearest-neighbor spin blocks become smaller and for other conditions the geometry quantum discord becomes larger than that in any subcomponent, i.e., the increase or the production of the long-range quantum correlation is at the cost of reducing the short-range quantum correlation compared to the corresponding classical correlation and total correlation having no any characteristic of regulation. For nearest-neighbor and next-nearest-neighbor all the correlations take their maximal values at the same points, while for other conditions no whether for spacing same spin number or for different spacing spin numbers all the correlations taking their maximal values are respectively at different points which are very close. We believe that our work is helpful to comprehensively and deeply understand the organization and structure of quantum correlation especially for long-range quantum correlation of quantum many-body systems; and further helpful for the classification, the depiction and the measure of quantum correlation of quantum many-body systems.
A semi-supervised classification algorithm using the TAD-derived background as training data
NASA Astrophysics Data System (ADS)
Fan, Lei; Ambeau, Brittany; Messinger, David W.
2013-05-01
In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.
Data mining of text as a tool in authorship attribution
NASA Astrophysics Data System (ADS)
Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu
2001-03-01
It is common that text documents are characterized and classified by keywords that the authors use to give them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or a part of an extracted, interesting text. This prototype is matched with the document database of the monitored document flow. The new methodology is capable of extracting the meaning of the document in a certain degree. Our claim is that the new methodology is also capable of authenticating the authorship. To verify this claim two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test three authors were selected. The selected authors were William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Every text was one by one used as a prototype. The two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors are examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, for Shakespeare and for Poe all cases were successful. For Shaw one text was confused with Poe. In the second test the Reuters-21578 financial news were identified by the author relatively well. The resolution is that our text mining methodology seems to be capable of authorship attribution.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.
Emotional State Classification in Virtual Reality Using Wearable Electroencephalography
NASA Astrophysics Data System (ADS)
Suhaimi, N. S.; Teo, J.; Mountstephens, J.
2018-03-01
This paper presents the classification of emotions on EEG signals. One of the key issues in this research is the lack of mental classification using VR as the medium to stimulate emotion. The approach towards this research is by using K-nearest neighbor (KNN) and Support Vector Machine (SVM). Firstly, each of the participant will be required to wear the EEG headset and recording their brainwaves when they are immersed inside the VR. The data points are then marked if they showed any physical signs of emotion or by observing the brainwave pattern. Secondly, the data will then be tested and trained with KNN and SVM algorithms. The accuracy achieved from both methods were approximately 82% throughout the brainwave spectrum (α, β, γ, δ, θ). These methods showed promising results and will be further enhanced using other machine learning approaches in VR stimulus.
New nonlinear features for inspection, robotics, and face recognition
NASA Astrophysics Data System (ADS)
Casasent, David P.; Talukder, Ashit
1999-10-01
Classification of real-time X-ray images of randomly oriented touching pistachio nuts is discussed. The ultimate objective is the development of a system for automated non- invasive detection of defective product items on a conveyor belt. We discuss the extraction of new features that allow better discrimination between damaged and clean items (pistachio nuts). This feature extraction and classification stage is the new aspect of this paper; our new maximum representation and discriminating feature (MRDF) extraction method computes nonlinear features that are used as inputs to a new modified k nearest neighbor classifier. In this work, the MRDF is applied to standard features (rather than iconic data). The MRDF is robust to various probability distributions of the input class and is shown to provide good classification and new ROC (receiver operating characteristic) data. Other applications of these new feature spaces in robotics and face recognition are also noted.
E-Nose Vapor Identification Based on Dempster-Shafer Fusion of Multiple Classifiers
NASA Technical Reports Server (NTRS)
Li, Winston; Leung, Henry; Kwan, Chiman; Linnell, Bruce R.
2005-01-01
Electronic nose (e-nose) vapor identification is an efficient approach to monitor air contaminants in space stations and shuttles in order to ensure the health and safety of astronauts. Data preprocessing (measurement denoising and feature extraction) and pattern classification are important components of an e-nose system. In this paper, a wavelet-based denoising method is applied to filter the noisy sensor measurements. Transient-state features are then extracted from the denoised sensor measurements, and are used to train multiple classifiers such as multi-layer perceptions (MLP), support vector machines (SVM), k nearest neighbor (KNN), and Parzen classifier. The Dempster-Shafer (DS) technique is used at the end to fuse the results of the multiple classifiers to get the final classification. Experimental analysis based on real vapor data shows that the wavelet denoising method can remove both random noise and outliers successfully, and the classification rate can be improved by using classifier fusion.
Vrooman, Henri A; Cocosco, Chris A; van der Lijn, Fedde; Stokking, Rik; Ikram, M Arfan; Vernooij, Meike W; Breteler, Monique M B; Niessen, Wiro J
2007-08-01
Conventional k-Nearest-Neighbor (kNN) classification, which has been successfully applied to classify brain tissue in MR data, requires training on manually labeled subjects. This manual labeling is a laborious and time-consuming procedure. In this work, a new fully automated brain tissue classification procedure is presented, in which kNN training is automated. This is achieved by non-rigidly registering the MR data with a tissue probability atlas to automatically select training samples, followed by a post-processing step to keep the most reliable samples. The accuracy of the new method was compared to rigid registration-based training and to conventional kNN-based segmentation using training on manually labeled subjects for segmenting gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) in 12 data sets. Furthermore, for all classification methods, the performance was assessed when varying the free parameters. Finally, the robustness of the fully automated procedure was evaluated on 59 subjects. The automated training method using non-rigid registration with a tissue probability atlas was significantly more accurate than rigid registration. For both automated training using non-rigid registration and for the manually trained kNN classifier, the difference with the manual labeling by observers was not significantly larger than inter-observer variability for all tissue types. From the robustness study, it was clear that, given an appropriate brain atlas and optimal parameters, our new fully automated, non-rigid registration-based method gives accurate and robust segmentation results. A similarity index was used for comparison with manually trained kNN. The similarity indices were 0.93, 0.92 and 0.92, for CSF, GM and WM, respectively. It can be concluded that our fully automated method using non-rigid registration may replace manual segmentation, and thus that automated brain tissue segmentation without laborious manual training is feasible.
Mannil, Manoj; von Spiczak, Jochen; Manka, Robert; Alkadhi, Hatem
2018-06-01
The aim of this study was to test whether texture analysis and machine learning enable the detection of myocardial infarction (MI) on non-contrast-enhanced low radiation dose cardiac computed tomography (CCT) images. In this institutional review board-approved retrospective study, we included non-contrast-enhanced electrocardiography-gated low radiation dose CCT image data (effective dose, 0.5 mSv) acquired for the purpose of calcium scoring of 27 patients with acute MI (9 female patients; mean age, 60 ± 12 years), 30 patients with chronic MI (8 female patients; mean age, 68 ± 13 years), and in 30 subjects (9 female patients; mean age, 44 ± 6 years) without cardiac abnormality, hereafter termed controls. Texture analysis of the left ventricle was performed using free-hand regions of interest, and texture features were classified twice (Model I: controls versus acute MI versus chronic MI; Model II: controls versus acute and chronic MI). For both classifications, 6 commonly used machine learning classifiers were used: decision tree C4.5 (J48), k-nearest neighbors, locally weighted learning, RandomForest, sequential minimal optimization, and an artificial neural network employing deep learning. In addition, 2 blinded, independent readers visually assessed noncontrast CCT images for the presence or absence of MI. In Model I, best classification results were obtained using the k-nearest neighbors classifier (sensitivity, 69%; specificity, 85%; false-positive rate, 0.15). In Model II, the best classification results were found with the locally weighted learning classification (sensitivity, 86%; specificity, 81%; false-positive rate, 0.19) with an area under the curve from receiver operating characteristics analysis of 0.78. In comparison, both readers were not able to identify MI in any of the noncontrast, low radiation dose CCT images. This study indicates the ability of texture analysis and machine learning in detecting MI on noncontrast low radiation dose CCT images being not visible for the radiologists' eye.
Application of the 1:2,000,000-scale data base: A National Atlas sectional prototype
Dixon, Donna M.
1985-01-01
A study of the potential to produce a National Atlas sectional prototype from the 1:2,000,000-scale data base was concluded recently by the National Mapping Division, U. S. Geological Survey. This paper discusses the specific digital cartographic production procedures involved in the preparation of the prototype map, as well as the theoretical and practical cartographic framework for the study. Such items as data organization, data classification, digital techniques, data conversions, and modification of traditional design specifications for an automated environment are discussed. The bulk of the cartographic work for the production of the prototype was carried out in raster format on the Scitex Response-250 mapping system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2006-10-25
The purpose of the eXtended MetaData Registry (XMDR) prototype is to demonstrate the feasibility and utility of constructing an extended metadata registry, i.e., one which encompasses richer classification support, facilities for including terminologies, and better support for formal specification of semantics. The prototype registry will also serve as a reference implementation for the revised versions of ISO 11179, Parts 2 and 3 to help guide production implementations.
Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco
2012-10-01
Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.
Implementation of Nearest Neighbor using HSV to Identify Skin Disease
NASA Astrophysics Data System (ADS)
Gerhana, Y. A.; Zulfikar, W. B.; Ramdani, A. H.; Ramdhani, M. A.
2018-01-01
Today, Android is one of the most widely used operating system in the world. Most of android device has a camera that could capture an image, this feature could be optimized to identify skin disease. The disease is one of health problem caused by bacterium, fungi, and virus. The symptoms of skin disease usually visible. In this work, the symptoms that captured as image contains HSV in every pixel of the image. HSV can extracted and then calculate to earn euclidean value. The value compared using nearest neighbor algorithm to discover closer value between image testing and image training to get highest value that decide class label or type of skin disease. The testing result show that 166 of 200 or about 80% is accurate. There are some reasons that influence the result of classification model like number of image training and quality of android device’s camera.
Moga, Tudor Voicu; Popescu, Alina; Sporea, Ioan; Danila, Mirela; David, Ciprian; Gui, Vasile; Iacob, Nicoleta; Miclaus, Gratian; Sirli, Roxana
2017-08-23
Contrast enhanced ultrasound (CEUS) improved the characterization of focal liver lesions (FLLs), but is an operatordependent method. The goal of this paper was to test a computer assisted diagnosis (CAD) prototype and to see its benefit in assisting a beginner in the evaluation of FLLs. Our cohort included 97 good quality CEUS videos[34% hepatocellular carcinomas (HCC), 12.3% hypervascular metastases (HiperM), 11.3% hypovascular metastases (HipoM), 24.7% hemangiomas (HMG), 17.5% focal nodular hyperplasia (FNH)] that were used to develop a CAD prototype based on an algorithm that tested a binary decision based classifier. Two young medical doctors (1 year CEUS experience), two experts and the CAD prototype, reevaluated 50 FLLs CEUS videos (diagnosis of benign vs. malignant) first blinded to clinical data, in order to evaluate the diagnostic gap beginner vs. expert. The CAD classifier managed a 75.2% overall (benign vs. malignant) correct classification rate. The overall classification rates for the evaluators, before and after clinical data were: first beginner-78%; 94%; second beginner-82%; 96%; first expert-94%; 100%; second expert-96%; 98%. For both beginners, the malignant vs. benign diagnosis significantly improved after knowing the clinical data (p=0.005; p=0,008). The expert was better than the beginner (p=0.04) and better than the CAD (p=0.001). CAD in addition to the beginner can reach the expert diagnosis. The most frequent lesions misdiagnosed at CEUS were FNH and HCC. The CAD prototype is a good comparing tool for a beginner operator that can be developed to assist the diagnosis. In order to increase the classification rate, the CAD system for FLL in CEUS must integrate the clinical data.
Interactive classification and content-based retrieval of tissue images
NASA Astrophysics Data System (ADS)
Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof
2002-11-01
We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.
Mediterranean Land Use and Land Cover Classification Assessment Using High Spatial Resolution Data
NASA Astrophysics Data System (ADS)
Elhag, Mohamed; Boteva, Silvena
2016-10-01
Landscape fragmentation is noticeably practiced in Mediterranean regions and imposes substantial complications in several satellite image classification methods. To some extent, high spatial resolution data were able to overcome such complications. For better classification performances in Land Use Land Cover (LULC) mapping, the current research adopts different classification methods comparison for LULC mapping using Sentinel-2 satellite as a source of high spatial resolution. Both of pixel-based and an object-based classification algorithms were assessed; the pixel-based approach employs Maximum Likelihood (ML), Artificial Neural Network (ANN) algorithms, Support Vector Machine (SVM), and, the object-based classification uses the Nearest Neighbour (NN) classifier. Stratified Masking Process (SMP) that integrates a ranking process within the classes based on spectral fluctuation of the sum of the training and testing sites was implemented. An analysis of the overall and individual accuracy of the classification results of all four methods reveals that the SVM classifier was the most efficient overall by distinguishing most of the classes with the highest accuracy. NN succeeded to deal with artificial surface classes in general while agriculture area classes, and forest and semi-natural area classes were segregated successfully with SVM. Furthermore, a comparative analysis indicates that the conventional classification method yielded better accuracy results than the SMP method overall with both classifiers used, ML and SVM.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
Yang, Jiaojiao; Guo, Qian; Li, Wenjie; Wang, Suhong; Zou, Ling
2016-04-01
This paper aims to assist the individual clinical diagnosis of children with attention-deficit/hyperactivity disorder using electroencephalogram signal detection method.Firstly,in our experiments,we obtained and studied the electroencephalogram signals from fourteen attention-deficit/hyperactivity disorder children and sixteen typically developing children during the classic interference control task of Simon-spatial Stroop,and we completed electroencephalogram data preprocessing including filtering,segmentation,removal of artifacts and so on.Secondly,we selected the subset electroencephalogram electrodes using principal component analysis(PCA)method,and we collected the common channels of the optimal electrodes which occurrence rates were more than 90%in each kind of stimulation.We then extracted the latency(200~450ms)mean amplitude features of the common electrodes.Finally,we used the k-nearest neighbor(KNN)classifier based on Euclidean distance and the support vector machine(SVM)classifier based on radial basis kernel function to classify.From the experiment,at the same kind of interference control task,the attention-deficit/hyperactivity disorder children showed lower correct response rates and longer reaction time.The N2 emerged in prefrontal cortex while P2 presented in the inferior parietal area when all kinds of stimuli demonstrated.Meanwhile,the children with attention-deficit/hyperactivity disorder exhibited markedly reduced N2 and P2amplitude compared to typically developing children.KNN resulted in better classification accuracy than SVM classifier,and the best classification rate was 89.29%in StI task.The results showed that the electroencephalogram signals were different in the brain regions of prefrontal cortex and inferior parietal cortex between attention-deficit/hyperactivity disorder and typically developing children during the interference control task,which provided a scientific basis for the clinical diagnosis of attention-deficit/hyperactivity disorder individuals.
NASA Astrophysics Data System (ADS)
Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying
2018-06-01
In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
Inspection of wear particles in oils by using a fuzzy classifier
NASA Astrophysics Data System (ADS)
Hamalainen, Jari J.; Enwald, Petri
1994-11-01
The reliability of stand-alone machines and larger production units can be improved by automated condition monitoring. Analysis of wear particles in lubricating or hydraulic oils helps diagnosing the wear states of machine parts. This paper presents a computer vision system for automated classification of wear particles. Digitized images from experiments with a bearing test bench, a hydraulic system with an industrial company, and oil samples from different industrial sources were used for algorithm development and testing. The wear particles were divided into four classes indicating different wear mechanisms: cutting wear, fatigue wear, adhesive wear, and abrasive wear. The results showed that the fuzzy K-nearest neighbor classifier utilized gave the same distribution of wear particles as the classification by a human expert.
Clustering Of Left Ventricular Wall Motion Patterns
NASA Astrophysics Data System (ADS)
Bjelogrlic, Z.; Jakopin, J.; Gyergyek, L.
1982-11-01
A method for detection of wall regions with similar motion was presented. A model based on local direction information was used to measure the left ventricular wall motion from cineangiographic sequence. Three time functions were used to define segmental motion patterns: distance of a ventricular contour segment from the mean contour, the velocity of a segment and its acceleration. Motion patterns were clustered by the UPGMA algorithm and by an algorithm based on K-nearest neighboor classification rule.
Automatic Recognition of Breathing Route During Sleep Using Snoring Sounds
NASA Astrophysics Data System (ADS)
Mikami, Tsuyoshi; Kojima, Yohichiro
This letter classifies snoring sounds into three breathing routes (oral, nasal, and oronasal) with discriminant analysis of the power spectra and k-nearest neighbor method. It is necessary to recognize breathing route during snoring, because oral snoring is a typical symptom of sleep apnea but we cannot know our own breathing and snoring condition during sleep. As a result, about 98.8% classification rate is obtained by using leave-one-out test for performance evaluation.
NASA Astrophysics Data System (ADS)
Taha, Z.; Razman, M. A. M.; Ghani, A. S. Abdul; Majeed, A. P. P. Abdul; Musa, R. M.; Adnan, F. A.; Sallehudin, M. F.; Mukai, Y.
2018-04-01
Fish Hunger behaviour is essential in determining the fish feeding routine, particularly for fish farmers. The inability to provide accurate feeding routines (under-feeding or over-feeding) may lead the death of the fish and consequently inhibits the quantity of the fish produced. Moreover, the excessive food that is not consumed by the fish will be dissolved in the water and accordingly reduce the water quality through the reduction of oxygen quantity. This problem also leads the death of the fish or even spur fish diseases. In the present study, a correlation of Barramundi fish-school behaviour with hunger condition through the hybrid data integration of image processing technique is established. The behaviour is clustered with respect to the position of the school size as well as the school density of the fish before feeding, during feeding and after feeding. The clustered fish behaviour is then classified through k-Nearest Neighbour (k-NN) learning algorithm. Three different variations of the algorithm namely cosine, cubic and weighted are assessed on its ability to classify the aforementioned fish hunger behaviour. It was found from the study that the weighted k-NN variation provides the best classification with an accuracy of 86.5%. Therefore, it could be concluded that the proposed integration technique may assist fish farmers in ascertaining fish feeding routine.
Streamflow variability and classification using false nearest neighbor method
NASA Astrophysics Data System (ADS)
Vignesh, R.; Jothiprakash, V.; Sivakumar, B.
2015-12-01
Understanding regional streamflow dynamics and patterns continues to be a challenging problem. The present study introduces the false nearest neighbor (FNN) algorithm, a nonlinear dynamic-based method, to examine the spatial variability of streamflow over a region. The FNN method is a dimensionality-based approach, where the dimension of the time series represents its variability. The method uses phase space reconstruction and nearest neighbor concepts, and identifies false neighbors in the reconstructed phase space. The FNN method is applied to monthly streamflow data monitored over a period of 53 years (1950-2002) in an extensive network of 639 stations in the contiguous United States (US). Since selection of delay time in phase space reconstruction may influence the FNN outcomes, analysis is carried out for five different delay time values: monthly, seasonal, and annual separation of data as well as delay time values obtained using autocorrelation function (ACF) and average mutual information (AMI) methods. The FNN dimensions for the 639 streamflow series are generally identified to range from 4 to 12 (with very few exceptional cases), indicating a wide range of variability in the dynamics of streamflow across the contiguous US. However, the FNN dimensions for a majority of the streamflow series are found to be low (less than or equal to 6), suggesting low level of complexity in streamflow dynamics in most of the individual stations and over many sub-regions. The FNN dimension estimates also reveal that streamflow dynamics in the western parts of the US (including far west, northwestern, and southwestern parts) generally exhibit much greater variability compared to that in the eastern parts of the US (including far east, northeastern, and southeastern parts), although there are also differences among 'pockets' within these regions. These results are useful for identification of appropriate model complexity at individual stations, patterns across regions and sub-regions, interpolation and extrapolation of data, and catchment classification. An attempt is also made to relate the FNN dimensions with catchment characteristics and streamflow statistical properties.
Tabu search and binary particle swarm optimization for feature selection using microarray data.
Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong
2009-12-01
Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In the classification of cancer type research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied and compared to the 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
3D texture analysis for classification of second harmonic generation images of human ovarian cancer
NASA Astrophysics Data System (ADS)
Wen, Bruce; Campbell, Kirby R.; Tilbury, Karissa; Nadiarnykh, Oleg; Brewer, Molly A.; Patankar, Manish; Singh, Vikas; Eliceiri, Kevin. W.; Campagnola, Paul J.
2016-10-01
Remodeling of the collagen architecture in the extracellular matrix (ECM) has been implicated in ovarian cancer. To quantify these alterations we implemented a form of 3D texture analysis to delineate the fibrillar morphology observed in 3D Second Harmonic Generation (SHG) microscopy image data of normal (1) and high risk (2) ovarian stroma, benign ovarian tumors (3), low grade (4) and high grade (5) serous tumors, and endometrioid tumors (6). We developed a tailored set of 3D filters which extract textural features in the 3D image sets to build (or learn) statistical models of each tissue class. By applying k-nearest neighbor classification using these learned models, we achieved 83-91% accuracies for the six classes. The 3D method outperformed the analogous 2D classification on the same tissues, where we suggest this is due the increased information content. This classification based on ECM structural changes will complement conventional classification based on genetic profiles and can serve as an additional biomarker. Moreover, the texture analysis algorithm is quite general, as it does not rely on single morphological metrics such as fiber alignment, length, and width but their combined convolution with a customizable basis set.
Mohammed, Ameer; Zamani, Majid; Bayford, Richard; Demosthenous, Andreas
2017-12-01
In Parkinson's disease (PD), on-demand deep brain stimulation is required so that stimulation is regulated to reduce side effects resulting from continuous stimulation and PD exacerbation due to untimely stimulation. Also, the progressive nature of PD necessitates the use of dynamic detection schemes that can track the nonlinearities in PD. This paper proposes the use of dynamic feature extraction and dynamic pattern classification to achieve dynamic PD detection taking into account the demand for high accuracy, low computation, and real-time detection. The dynamic feature extraction and dynamic pattern classification are selected by evaluating a subset of feature extraction, dimensionality reduction, and classification algorithms that have been used in brain-machine interfaces. A novel dimensionality reduction technique, the maximum ratio method (MRM) is proposed, which provides the most efficient performance. In terms of accuracy and complexity for hardware implementation, a combination having discrete wavelet transform for feature extraction, MRM for dimensionality reduction, and dynamic k-nearest neighbor for classification was chosen as the most efficient. It achieves a classification accuracy of 99.29%, an F1-score of 97.90%, and a choice probability of 99.86%.
NASA Astrophysics Data System (ADS)
Omenzetter, Piotr; de Lautour, Oliver R.
2010-04-01
Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3- storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.
Deep feature extraction and combination for synthetic aperture radar target classification
NASA Astrophysics Data System (ADS)
Amrani, Moussa; Jiang, Feng
2017-10-01
Feature extraction has always been a difficult problem in the classification performance of synthetic aperture radar automatic target recognition (SAR-ATR). It is very important to select discriminative features to train a classifier, which is a prerequisite. Inspired by the great success of convolutional neural network (CNN), we address the problem of SAR target classification by proposing a feature extraction method, which takes advantage of exploiting the extracted deep features from CNNs on SAR images to introduce more powerful discriminative features and robust representation ability for them. First, the pretrained VGG-S net is fine-tuned on moving and stationary target acquisition and recognition (MSTAR) public release database. Second, after a simple preprocessing is performed, the fine-tuned network is used as a fixed feature extractor to extract deep features from the processed SAR images. Third, the extracted deep features are fused by using a traditional concatenation and a discriminant correlation analysis algorithm. Finally, for target classification, K-nearest neighbors algorithm based on LogDet divergence-based metric learning triplet constraints is adopted as a baseline classifier. Experiments on MSTAR are conducted, and the classification accuracy results demonstrate that the proposed method outperforms the state-of-the-art methods.
Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine
NASA Astrophysics Data System (ADS)
Bilous, Andrii; Myroniuk, Viktor; Holiaka, Dmytrii; Bilous, Svitlana; See, Linda; Schepaschenko, Dmitry
2017-10-01
Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that inclusion of elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean or Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. The MSN with the four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha-1 and for live biomass of about 2 t ha-1 over the study area.
Wen, Tingxi; Zhang, Zhongnan
2017-01-01
Abstract In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose
Rahman, Mohammad Mizanur; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-01-01
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k-nearest neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms. PMID:28895910
A False Alarm Reduction Method for a Gas Sensor Based Electronic Nose.
Rahman, Mohammad Mizanur; Charoenlarpnopparut, Chalie; Suksompong, Prapun; Toochinda, Pisanu; Taparugssanagorn, Attaphongse
2017-09-12
Electronic noses (E-Noses) are becoming popular for food and fruit quality assessment due to their robustness and repeated usability without fatigue, unlike human experts. An E-Nose equipped with classification algorithms and having open ended classification boundaries such as the k -nearest neighbor ( k -NN), support vector machine (SVM), and multilayer perceptron neural network (MLPNN), are found to suffer from false classification errors of irrelevant odor data. To reduce false classification and misclassification errors, and to improve correct rejection performance; algorithms with a hyperspheric boundary, such as a radial basis function neural network (RBFNN) and generalized regression neural network (GRNN) with a Gaussian activation function in the hidden layer should be used. The simulation results presented in this paper show that GRNN has more correct classification efficiency and false alarm reduction capability compared to RBFNN. As the design of a GRNN and RBFNN is complex and expensive due to large numbers of neuron requirements, a simple hyperspheric classification method based on minimum, maximum, and mean (MMM) values of each class of the training dataset was presented. The MMM algorithm was simple and found to be fast and efficient in correctly classifying data of training classes, and correctly rejecting data of extraneous odors, and thereby reduced false alarms.
Wen, Tingxi; Zhang, Zhongnan
2017-05-01
In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
A neural network approach to cloud classification
NASA Technical Reports Server (NTRS)
Lee, Jonathan; Weger, Ronald C.; Sengupta, Sailes K.; Welch, Ronald M.
1990-01-01
It is shown that, using high-spatial-resolution data, very high cloud classification accuracies can be obtained with a neural network approach. A texture-based neural network classifier using only single-channel visible Landsat MSS imagery achieves an overall cloud identification accuracy of 93 percent. Cirrus can be distinguished from boundary layer cloudiness with an accuracy of 96 percent, without the use of an infrared channel. Stratocumulus is retrieved with an accuracy of 92 percent, cumulus at 90 percent. The use of the neural network does not improve cirrus classification accuracy. Rather, its main effect is in the improved separation between stratocumulus and cumulus cloudiness. While most cloud classification algorithms rely on linear parametric schemes, the present study is based on a nonlinear, nonparametric four-layer neural network approach. A three-layer neural network architecture, the nonparametric K-nearest neighbor approach, and the linear stepwise discriminant analysis procedure are compared. A significant finding is that significantly higher accuracies are attained with the nonparametric approaches using only 20 percent of the database as training data, compared to 67 percent of the database in the linear approach.
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé
2017-04-01
The problem is considered of classification of local atmospheric meteorological events in the coastal area such as sea breezes, fogs and storms. The in-situ meteorological data as wind speed and direction, temperature, humidity and turbulence are used as predictors. Local atmospheric events of 2013-2014 were analysed manually to train classification algorithms in the coastal area of English Channel in Dunkirk (France). Then, ultrasonic anemometer data and LIDAR wind profiler data were used as predictors. A few algorithms were applied to determine meteorological events by local data such as a decision tree, the nearest neighbour classifier, a support vector machine. The comparison of classification algorithms was carried out, the most important predictors for each event type were determined. It was shown that in more than 80 percent of the cases machine learning algorithms detect the meteorological class correctly. We expect that this methodology could be applied also to classify events by climatological in-situ data or by modelling data. It allows estimating frequencies of each event in perspective of climate change.
NASA Astrophysics Data System (ADS)
Uzbaş, Betül; Arslan, Ahmet
2018-04-01
Gender is an important step for human computer interactive processes and identification. Human face image is one of the important sources to determine gender. In the present study, gender classification is performed automatically from facial images. In order to classify gender, we propose a combination of features that have been extracted face, eye and lip regions by using a hybrid method of Local Binary Pattern and Gray-Level Co-Occurrence Matrix. The features have been extracted from automatically obtained face, eye and lip regions. All of the extracted features have been combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor methods) for gender classification. The Nottingham Scan face database that consists of the frontal face images of 100 people (50 male and 50 female) is used for this purpose. As the result of the experimental studies, the highest success rate has been achieved as 98% by using Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.
Comparisons and Selections of Features and Classifiers for Short Text Classification
NASA Astrophysics Data System (ADS)
Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi
2017-10-01
Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.
Attention Recognition in EEG-Based Affective Learning Research Using CFS+KNN Algorithm.
Hu, Bin; Li, Xiaowei; Sun, Shuting; Ratcliffe, Martyn
2018-01-01
The research detailed in this paper focuses on the processing of Electroencephalography (EEG) data to identify attention during the learning process. The identification of affect using our procedures is integrated into a simulated distance learning system that provides feedback to the user with respect to attention and concentration. The authors propose a classification procedure that combines correlation-based feature selection (CFS) and a k-nearest-neighbor (KNN) data mining algorithm. To evaluate the CFS+KNN algorithm, it was test against CFS+C4.5 algorithm and other classification algorithms. The classification performance was measured 10 times with different 3-fold cross validation data. The data was derived from 10 subjects while they were attempting to learn material in a simulated distance learning environment. A self-assessment model of self-report was used with a single valence to evaluate attention on 3 levels (high, neutral, low). It was found that CFS+KNN had a much better performance, giving the highest correct classification rate (CCR) of % for the valence dimension divided into three classes.
Gunavathi, Chellamuthu; Premalatha, Kandasamy
2014-01-01
Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.
Kharbach, Mourad; Kamal, Rabie; Mansouri, Mohammed Alaoui; Marmouzi, Ilias; Viaene, Johan; Cherrah, Yahia; Alaoui, Katim; Vercammen, Joeri; Bouklouze, Abdelaziz; Vander Heyden, Yvan
2018-10-15
This study investigated the effectiveness of SIFT-MS versus chemical profiling, both coupled to multivariate data analysis, to classify 95 Extra Virgin Argan Oils (EVAO), originating from five Moroccan Argan forest locations. The full scan option of SIFT-MS, is suitable to indicate the geographic origin of EVAO based on the fingerprints obtained using the three chemical ionization precursors (H 3 O + , NO + and O 2 + ). The chemical profiling (including acidity, peroxide value, spectrophotometric indices, fatty acids, tocopherols- and sterols composition) was also used for classification. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), K-nearest neighbors (KNN), and support vector machines (SVM), were compared. The SIFT-MS data were therefore fed to variable-selection methods to find potential biomarkers for classification. The classification models based either on chemical profiling or SIFT-MS data were able to classify the samples with high accuracy. SIFT-MS was found to be advantageous for rapid geographic classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
Speaker gender identification based on majority vote classifiers
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2017-03-01
Speaker gender identification is considered among the most important tools in several multimedia applications namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification systems performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching optimum tuning of one classifier parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models including decision tree, discriminant analysis, nave Bayes, support vector machine and k-nearest neighbor was experimented. The three best perming classifiers among the five ones will contribute by majority voting between their scores. Experimentations were performed on three different datasets spoken in three languages: English, German and Arabic in order to validate language independency of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
Performance of resonant radar target identification algorithms using intra-class weighting functions
NASA Astrophysics Data System (ADS)
Mustafa, A.
The use of calibrated resonant-region radar cross section (RCS) measurements of targets for the classification of large aircraft is discussed. Errors in the RCS estimate of full scale aircraft flying over an ocean, introduced by the ionospheric variability and the sea conditions were studied. The Weighted Target Representative (WTR) classification algorithm was developed, implemented, tested and compared with the nearest neighbor (NN) algorithm. The WTR-algorithm has a low sensitivity to the uncertainty in the aspect angle of the unknown target returns. In addition, this algorithm was based on the development of a new catalog of representative data which reduces the storage requirements and increases the computational efficiency of the classification system compared to the NN-algorithm. Experiments were designed to study and evaluate the characteristics of the WTR- and the NN-algorithms, investigate the classifiability of targets and study the relative behavior of the number of misclassifications as a function of the target backscatter features. The classification results and statistics were shown in the form of performance curves, performance tables and confusion tables.
Classification of mathematics deficiency using shape and scale analysis of 3D brain structures
NASA Astrophysics Data System (ADS)
Kurtek, Sebastian; Klassen, Eric; Gore, John C.; Ding, Zhaohua; Srivastava, Anuj
2011-03-01
We investigate the use of a recent technique for shape analysis of brain substructures in identifying learning disabilities in third-grade children. This Riemannian technique provides a quantification of differences in shapes of parameterized surfaces, using a distance that is invariant to rigid motions and re-parameterizations. Additionally, it provides an optimal registration across surfaces for improved matching and comparisons. We utilize an efficient gradient based method to obtain the optimal re-parameterizations of surfaces. In this study we consider 20 different substructures in the human brain and correlate the differences in their shapes with abnormalities manifested in deficiency of mathematical skills in 106 subjects. The selection of these structures is motivated in part by the past links between their shapes and cognitive skills, albeit in broader contexts. We have studied the use of both individual substructures and multiple structures jointly for disease classification. Using a leave-one-out nearest neighbor classifier, we obtained a 62.3% classification rate based on the shape of the left hippocampus. The use of multiple structures resulted in an improved classification rate of 71.4%.
An unbalanced spectra classification method based on entropy
NASA Astrophysics Data System (ADS)
Liu, Zhong-bao; Zhao, Wen-juan
2017-05-01
How to solve the problem of distinguishing the minority spectra from the majority of the spectra is quite important in astronomy. In view of this, an unbalanced spectra classification method based on entropy (USCM) is proposed in this paper to deal with the unbalanced spectra classification problem. USCM greatly improves the performances of the traditional classifiers on distinguishing the minority spectra as it takes the data distribution into consideration in the process of classification. However, its time complexity is exponential with the training size, and therefore, it can only deal with the problem of small- and medium-scale classification. How to solve the large-scale classification problem is quite important to USCM. It can be easily obtained by mathematical computation that the dual form of USCM is equivalent to the minimum enclosing ball (MEB), and core vector machine (CVM) is introduced, USCM based on CVM is proposed to deal with the large-scale classification problem. Several comparative experiments on the 4 subclasses of K-type spectra, 3 subclasses of F-type spectra and 3 subclasses of G-type spectra from Sloan Digital Sky Survey (SDSS) verify USCM and USCM based on CVM perform better than kNN (k nearest neighbor) and SVM (support vector machine) in dealing with the problem of rare spectra mining respectively on the small- and medium-scale datasets and the large-scale datasets.
Machine Learning Methods for Production Cases Analysis
NASA Astrophysics Data System (ADS)
Mokrova, Nataliya V.; Mokrov, Alexander M.; Safonova, Alexandra V.; Vishnyakov, Igor V.
2018-03-01
Approach to analysis of events occurring during the production process were proposed. Described machine learning system is able to solve classification tasks related to production control and hazard identification at an early stage. Descriptors of the internal production network data were used for training and testing of applied models. k-Nearest Neighbors and Random forest methods were used to illustrate and analyze proposed solution. The quality of the developed classifiers was estimated using standard statistical metrics, such as precision, recall and accuracy.
Consensus Classification Using Non-Optimized Classifiers.
Brownfield, Brett; Lemos, Tony; Kalivas, John H
2018-04-03
Classifying samples into categories is a common problem in analytical chemistry and other fields. Classification is usually based on only one method, but numerous classifiers are available with some being complex, such as neural networks, and others are simple, such as k nearest neighbors. Regardless, most classification schemes require optimization of one or more tuning parameters for best classification accuracy, sensitivity, and specificity. A process not requiring exact selection of tuning parameter values would be useful. To improve classification, several ensemble approaches have been used in past work to combine classification results from multiple optimized single classifiers. The collection of classifications for a particular sample are then combined by a fusion process such as majority vote to form the final classification. Presented in this Article is a method to classify a sample by combining multiple classification methods without specifically classifying the sample by each method, that is, the classification methods are not optimized. The approach is demonstrated on three analytical data sets. The first is a beer authentication set with samples measured on five instruments, allowing fusion of multiple instruments by three ways. The second data set is composed of textile samples from three classes based on Raman spectra. This data set is used to demonstrate the ability to classify simultaneously with different data preprocessing strategies, thereby reducing the need to determine the ideal preprocessing method, a common prerequisite for accurate classification. The third data set contains three wine cultivars for three classes measured at 13 unique chemical and physical variables. In all cases, fusion of nonoptimized classifiers improves classification. Also presented are atypical uses of Procrustes analysis and extended inverted signal correction (EISC) for distinguishing sample similarities to respective classes.
A multiple-point spatially weighted k-NN method for object-based classification
NASA Astrophysics Data System (ADS)
Tang, Yunwei; Jing, Linhai; Li, Hui; Atkinson, Peter M.
2016-10-01
Object-based classification, commonly referred to as object-based image analysis (OBIA), is now commonly regarded as able to produce more appealing classification maps, often of greater accuracy, than pixel-based classification and its application is now widespread. Therefore, improvement of OBIA using spatial techniques is of great interest. In this paper, multiple-point statistics (MPS) is proposed for object-based classification enhancement in the form of a new multiple-point k-nearest neighbour (k-NN) classification method (MPk-NN). The proposed method first utilises a training image derived from a pre-classified map to characterise the spatial correlation between multiple points of land cover classes. The MPS borrows spatial structures from other parts of the training image, and then incorporates this spatial information, in the form of multiple-point probabilities, into the k-NN classifier. Two satellite sensor images with a fine spatial resolution were selected to evaluate the new method. One is an IKONOS image of the Beijing urban area and the other is a WorldView-2 image of the Wolong mountainous area, in China. The images were object-based classified using the MPk-NN method and several alternatives, including the k-NN, the geostatistically weighted k-NN, the Bayesian method, the decision tree classifier (DTC), and the support vector machine classifier (SVM). It was demonstrated that the new spatial weighting based on MPS can achieve greater classification accuracy relative to the alternatives and it is, thus, recommended as appropriate for object-based classification.
Physical Human Activity Recognition Using Wearable Sensors.
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-12-11
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.
Photometric Supernova Classification with Machine Learning
NASA Astrophysics Data System (ADS)
Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.
2016-08-01
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
Physical Human Activity Recognition Using Wearable Sensors
Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine
2015-01-01
This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject. PMID:26690450
Brain tissue segmentation in 4D CT using voxel classification
NASA Astrophysics Data System (ADS)
van den Boom, R.; Oei, M. T. H.; Lafebre, S.; Oostveen, L. J.; Meijer, F. J. A.; Steens, S. C. A.; Prokop, M.; van Ginneken, B.; Manniesing, R.
2012-02-01
A method is proposed to segment anatomical regions of the brain from 4D computer tomography (CT) patient data. The method consists of a three step voxel classification scheme, each step focusing on structures that are increasingly difficult to segment. The first step classifies air and bone, the second step classifies vessels and the third step classifies white matter, gray matter and cerebrospinal fluid. As features the time averaged intensity value and the temporal intensity change value were used. In each step, a k-Nearest-Neighbor classifier was used to classify the voxels. Training data was obtained by placing regions of interest in reconstructed 3D image data. The method has been applied to ten 4D CT cerebral patient data. A leave-one-out experiment showed consistent and accurate segmentation results.
A Global Covariance Descriptor for Nuclear Atypia Scoring in Breast Histopathology Images.
Khan, Adnan Mujahid; Sirinukunwattana, Korsuk; Rajpoot, Nasir
2015-09-01
Nuclear atypia scoring is a diagnostic measure commonly used to assess tumor grade of various cancers, including breast cancer. It provides a quantitative measure of deviation in visual appearance of cell nuclei from those in normal epithelial cells. In this paper, we present a novel image-level descriptor for nuclear atypia scoring in breast cancer histopathology images. The method is based on the region covariance descriptor that has recently become a popular method in various computer vision applications. The descriptor in its original form is not suitable for classification of histopathology images as cancerous histopathology images tend to possess diversely heterogeneous regions in a single field of view. Our proposed image-level descriptor, which we term as the geodesic mean of region covariance descriptors, possesses all the attractive properties of covariance descriptors lending itself to tractable geodesic-distance-based k-nearest neighbor classification using efficient kernels. The experimental results suggest that the proposed image descriptor yields high classification accuracy compared to a variety of widely used image-level descriptors.
NASA Astrophysics Data System (ADS)
Georganos, Stefanos; Grippa, Tais; Vanhuysse, Sabine; Lennert, Moritz; Shimoni, Michal; Wolff, Eléonore
2017-10-01
This study evaluates the impact of three Feature Selection (FS) algorithms in an Object Based Image Analysis (OBIA) framework for Very-High-Resolution (VHR) Land Use-Land Cover (LULC) classification. The three selected FS algorithms, Correlation Based Selection (CFS), Mean Decrease in Accuracy (MDA) and Random Forest (RF) based Recursive Feature Elimination (RFE), were tested on Support Vector Machine (SVM), K-Nearest Neighbor, and Random Forest (RF) classifiers. The results demonstrate that the accuracy of SVM and KNN classifiers are the most sensitive to FS. The RF appeared to be more robust to high dimensionality, although a significant increase in accuracy was found by using the RFE method. In terms of classification accuracy, SVM performed the best using FS, followed by RF and KNN. Finally, only a small number of features is needed to achieve the highest performance using each classifier. This study emphasizes the benefits of rigorous FS for maximizing performance, as well as for minimizing model complexity and interpretation.
NASA Astrophysics Data System (ADS)
He, Y.; He, Y.
2018-04-01
Urban shanty towns are communities that has contiguous old and dilapidated houses with more than 2000 square meters built-up area or more than 50 households. This study makes attempts to extract shanty towns in Nanning City using the product of Census and TripleSat satellite images. With 0.8-meter high-resolution remote sensing images, five texture characteristics (energy, contrast, maximum probability, and inverse difference moment) of shanty towns are trained and analyzed through GLCM. In this study, samples of shanty town are well classified with 98.2 % producer accuracy of unsupervised classification and 73.2 % supervised classification correctness. Low-rise and mid-rise residential blocks in Nanning City are classified into 4 different types by using k-means clustering and nearest neighbour classification respectively. This study initially establish texture feature descriptions of different types of residential areas, especially low-rise and mid-rise buildings, which would help city administrator evaluate residential blocks and reconstruction shanty towns.
Object Classification in Semi Structured Enviroment Using Forward-Looking Sonar
dos Santos, Matheus; Ribeiro, Pedro Otávio; Núñez, Pedro; Botelho, Silvia
2017-01-01
The submarine exploration using robots has been increasing in recent years. The automation of tasks such as monitoring, inspection, and underwater maintenance requires the understanding of the robot’s environment. The object recognition in the scene is becoming a critical issue for these systems. On this work, an underwater object classification pipeline applied in acoustic images acquired by Forward-Looking Sonar (FLS) are studied. The object segmentation combines thresholding, connected pixels searching and peak of intensity analyzing techniques. The object descriptor extract intensity and geometric features of the detected objects. A comparison between the Support Vector Machine, K-Nearest Neighbors, and Random Trees classifiers are presented. An open-source tool was developed to annotate and classify the objects and evaluate their classification performance. The proposed method efficiently segments and classifies the structures in the scene using a real dataset acquired by an underwater vehicle in a harbor area. Experimental results demonstrate the robustness and accuracy of the method described in this paper. PMID:28961163
Classification of speech dysfluencies using LPC based parameterization techniques.
Hariharan, M; Chee, Lim Sin; Ai, Ooi Chia; Yaacob, Sazali
2012-06-01
The goal of this paper is to discuss and compare three feature extraction methods: Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Weighted Linear Prediction Cepstral Coefficients (WLPCC) for recognizing the stuttered events. Speech samples from the University College London Archive of Stuttered Speech (UCLASS) were used for our analysis. The stuttered events were identified through manual segmentation and were used for feature extraction. Two simple classifiers namely, k-nearest neighbour (kNN) and Linear Discriminant Analysis (LDA) were employed for speech dysfluencies classification. Conventional validation method was used for testing the reliability of the classifier results. The study on the effect of different frame length, percentage of overlapping, value of ã in a first order pre-emphasizer and different order p were discussed. The speech dysfluencies classification accuracy was found to be improved by applying statistical normalization before feature extraction. The experimental investigation elucidated LPC, LPCC and WLPCC features can be used for identifying the stuttered events and WLPCC features slightly outperforms LPCC features and LPC features.
An Intelligent Weather Station
Mestre, Gonçalo; Ruano, Antonio; Duarte, Helder; Silva, Sergio; Khosravani, Hamid; Pesteh, Shabnam; Ferreira, Pedro M.; Horta, Ricardo
2015-01-01
Accurate measurements of global solar radiation, atmospheric temperature and relative humidity, as well as the availability of the predictions of their evolution over time, are important for different areas of applications, such as agriculture, renewable energy and energy management, or thermal comfort in buildings. For this reason, an intelligent, light-weight, self-powered and portable sensor was developed, using a nearest-neighbors (NEN) algorithm and artificial neural network (ANN) models as the time-series predictor mechanisms. The hardware and software design of the implemented prototype are described, as well as the forecasting performance related to the three atmospheric variables, using both approaches, over a prediction horizon of 48-steps-ahead. PMID:26690433
An Intelligent Weather Station.
Mestre, Gonçalo; Ruano, Antonio; Duarte, Helder; Silva, Sergio; Khosravani, Hamid; Pesteh, Shabnam; Ferreira, Pedro M; Horta, Ricardo
2015-12-10
Accurate measurements of global solar radiation, atmospheric temperature and relative humidity, as well as the availability of the predictions of their evolution over time, are important for different areas of applications, such as agriculture, renewable energy and energy management, or thermal comfort in buildings. For this reason, an intelligent, light-weight, self-powered and portable sensor was developed, using a nearest-neighbors (NEN) algorithm and artificial neural network (ANN) models as the time-series predictor mechanisms. The hardware and software design of the implemented prototype are described, as well as the forecasting performance related to the three atmospheric variables, using both approaches, over a prediction horizon of 48-steps-ahead.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds.
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159
Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds
Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael J.; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd
2017-01-01
Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
NASA Astrophysics Data System (ADS)
Gulliver, John; de Hoogh, Kees; Fecht, Daniela; Vienneau, Danielle; Briggs, David
2011-12-01
The development of geographical information system techniques has opened up a wide array of methods for air pollution exposure assessment. The extent to which these provide reliable estimates of air pollution concentrations is nevertheless not clearly established. Nor is it clear which methods or metrics should be preferred in epidemiological studies. This paper compares the performance of ten different methods and metrics in terms of their ability to predict mean annual PM 10 concentrations across 52 monitoring sites in London, UK. Metrics analysed include indicators (distance to nearest road, traffic volume on nearest road, heavy duty vehicle (HDV) volume on nearest road, road density within 150 m, traffic volume within 150 m and HDV volume within 150 m) and four modelling approaches: based on the nearest monitoring site, kriging, dispersion modelling and land use regression (LUR). Measures were computed in a GIS, and resulting metrics calibrated and validated against monitoring data using a form of grouped jack-knife analysis. The results show that PM 10 concentrations across London show little spatial variation. As a consequence, most methods can predict the average without serious bias. Few of the approaches, however, show good correlations with monitored PM 10 concentrations, and most predict no better than a simple classification based on site type. Only land use regression reaches acceptable levels of correlation ( R2 = 0.47), though this can be improved by also including information on site type. This might therefore be taken as a recommended approach in many studies, though care is needed in developing meaningful land use regression models, and like any method they need to be validated against local data before their application as part of epidemiological studies.
Li, Zhaohua; Wang, Yuduo; Quan, Wenxiang; Wu, Tongning; Lv, Bin
2015-02-15
Based on near-infrared spectroscopy (NIRS), recent converging evidence has been observed that patients with schizophrenia exhibit abnormal functional activities in the prefrontal cortex during a verbal fluency task (VFT). Therefore, some studies have attempted to employ NIRS measurements to differentiate schizophrenia patients from healthy controls with different classification methods. However, no systematic evaluation was conducted to compare their respective classification performances on the same study population. In this study, we evaluated the classification performance of four classification methods (including linear discriminant analysis, k-nearest neighbors, Gaussian process classifier, and support vector machines) on an NIRS-aided schizophrenia diagnosis. We recruited a large sample of 120 schizophrenia patients and 120 healthy controls and measured the hemoglobin response in the prefrontal cortex during the VFT using a multichannel NIRS system. Features for classification were extracted from three types of NIRS data in each channel. We subsequently performed a principal component analysis (PCA) for feature selection prior to comparison of the different classification methods. We achieved a maximum accuracy of 85.83% and an overall mean accuracy of 83.37% using a PCA-based feature selection on oxygenated hemoglobin signals and support vector machine classifier. This is the first comprehensive evaluation of different classification methods for the diagnosis of schizophrenia based on different types of NIRS signals. Our results suggested that, using the appropriate classification method, NIRS has the potential capacity to be an effective objective biomarker for the diagnosis of schizophrenia. Copyright © 2014 Elsevier B.V. All rights reserved.
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
Wang, Jian-Gang; Sung, Eric; Yau, Wei-Yun
2011-07-01
Facial age classification is an approach to classify face images into one of several predefined age groups. One of the difficulties in applying learning techniques to the age classification problem is the large amount of labeled training data required. Acquiring such training data is very costly in terms of age progress, privacy, human time, and effort. Although unlabeled face images can be obtained easily, it would be expensive to manually label them on a large scale and getting the ground truth. The frugal selection of the unlabeled data for labeling to quickly reach high classification performance with minimal labeling efforts is a challenging problem. In this paper, we present an active learning approach based on an online incremental bilateral two-dimension linear discriminant analysis (IB2DLDA) which initially learns from a small pool of labeled data and then iteratively selects the most informative samples from the unlabeled set to increasingly improve the classifier. Specifically, we propose a novel data selection criterion called the furthest nearest-neighbor (FNN) that generalizes the margin-based uncertainty to the multiclass case and which is easy to compute, so that the proposed active learning algorithm can handle a large number of classes and large data sizes efficiently. Empirical experiments on FG-NET and Morph databases together with a large unlabeled data set for age categorization problems show that the proposed approach can achieve results comparable or even outperform a conventionally trained active classifier that requires much more labeling effort. Our IB2DLDA-FNN algorithm can achieve similar results much faster than random selection and with fewer samples for age categorization. It also can achieve comparable results with active SVM but is much faster than active SVM in terms of training because kernel methods are not needed. The results on the face recognition database and palmprint/palm vein database showed that our approach can handle problems with large number of classes. Our contributions in this paper are twofold. First, we proposed the IB2DLDA-FNN, the FNN being our novel idea, as a generic on-line or active learning paradigm. Second, we showed that it can be another viable tool for active learning of facial age range classification.
NASA Astrophysics Data System (ADS)
Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen
2011-12-01
Co-occurrence matrix has been applied successfully for echographic images characterization because it contains information about spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of an US image of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extract with co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (or fatty) liver. These two sets of echographic images of the liver build a database that includes only histological confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects help to compute four textural indices and as well as control dataset. We chose to study these diseases because the steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize irregularity of tissues. The goal is to extract the information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool to classify features textures by means of grouping in a training set using healthy liver, on the one hand, and in a holdout set using the features textures of steatosis liver, on the other hand. The results could be used to quantify the texture information and will allow a clear detection between health and steatosis liver.
AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.
Sun, Lei; Wang, Jun; Wei, Jinmao
2017-03-14
The Receiver Operator Characteristic (ROC) curve is well-known in evaluating classification performance in biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can get the minimal balanced error rate with a small amount of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
Random ensemble learning for EEG classification.
Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid
2018-01-01
Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.
Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane
2016-12-01
In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used an EWMA to identify features that correspond with a particular type of fall allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. A benefit of using a subset of the original data to design classification models minimizes training time and simplifies models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection databases from the university of Rzeszow's. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results using the EWMA-based SVM classifier method with those achieved using three commonly used machine learning classifiers, neural network, K-nearest neighbor and naïve Bayes, proved our model superior.
Multi-class SVM model for fMRI-based classification and grading of liver fibrosis
NASA Astrophysics Data System (ADS)
Freiman, M.; Sela, Y.; Edrei, Y.; Pappo, O.; Joskowicz, L.; Abramovitch, R.
2010-03-01
We present a novel non-invasive automatic method for the classification and grading of liver fibrosis from fMRI maps based on hepatic hemodynamic changes. This method automatically creates a model for liver fibrosis grading based on training datasets. Our supervised learning method evaluates hepatic hemodynamics from an anatomical MRI image and three T2*-W fMRI signal intensity time-course scans acquired during the breathing of air, air-carbon dioxide, and carbogen. It constructs a statistical model of liver fibrosis from these fMRI scans using a binary-based one-against-all multi class Support Vector Machine (SVM) classifier. We evaluated the resulting classification model with the leave-one out technique and compared it to both full multi-class SVM and K-Nearest Neighbor (KNN) classifications. Our experimental study analyzed 57 slice sets from 13 mice, and yielded a 98.2% separation accuracy between healthy and low grade fibrotic subjects, and an overall accuracy of 84.2% for fibrosis grading. These results are better than the existing image-based methods which can only discriminate between healthy and high grade fibrosis subjects. With appropriate extensions, our method may be used for non-invasive classification and progression monitoring of liver fibrosis in human patients instead of more invasive approaches, such as biopsy or contrast-enhanced imaging.
NASA Astrophysics Data System (ADS)
Navratil, Peter; Wilps, Hans
2013-01-01
Three different object-based image classification techniques are applied to high-resolution satellite data for the mapping of the habitats of Asian migratory locust (Locusta migratoria migratoria) in the southern Aral Sea basin, Uzbekistan. A set of panchromatic and multispectral Système Pour l'Observation de la Terre-5 satellite images was spectrally enhanced by normalized difference vegetation index and tasseled cap transformation and segmented into image objects, which were then classified by three different classification approaches: a rule-based hierarchical fuzzy threshold (HFT) classification method was compared to a supervised nearest neighbor classifier and classification tree analysis by the quick, unbiased, efficient statistical trees algorithm. Special emphasis was laid on the discrimination of locust feeding and breeding habitats due to the significance of this discrimination for practical locust control. Field data on vegetation and land cover, collected at the time of satellite image acquisition, was used to evaluate classification accuracy. The results show that a robust HFT classifier outperformed the two automated procedures by 13% overall accuracy. The classification method allowed a reliable discrimination of locust feeding and breeding habitats, which is of significant importance for the application of the resulting data for an economically and environmentally sound control of locust pests because exact spatial knowledge on the habitat types allows a more effective surveying and use of pesticides.
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei
2014-10-01
Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative for current high-resolution pixelated PET detectors if the issues of high performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate array (FPGA). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefitting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor searching for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The spatial resolutions of full-width-at-half-maximum (FWHM) of both methods averaged over the center axis of the detector were obtained as 1.87 ±0.17 mm and 1.92 ±0.09 mm, respectively. The test results show that the SOM-NN scheme has an equivalent positioning performance with the smoothed k-NN method, but the amount of computation is only about one-tenth of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on FPGA. It has the potential to realize real-time position estimation on an FPGA with a high-event processing throughput.
NASA Astrophysics Data System (ADS)
Qur’ania, A.; Sarinah, I.
2018-03-01
People often wrong in knowing the type of jasmine by just looking at the white color of the jasmine, while not all white flowers including jasmine and not all jasmine flowers have white. There is a jasmine that is yellow and there is a jasmine that is white and purple.The aim of this research is to identify Jasmine flower (Jasminum sp.) based on the shape of the flower image-based using Sobel edge detection and k-Nearest Neighbor. Edge detection is used to detect the type of flower from the flower shape. Edge detection aims to improve the appearance of the border of a digital image. While k-Nearest Neighbor method is used to classify the classification of test objects into classes that have neighbouring properties closest to the object of training. The data used in this study are three types of jasmine namely jasmine white (Jasminum sambac), jasmine gambir (Jasminum pubescens), and jasmine japan (Pseuderanthemum reticulatum). Testing of jasmine flower image resized 50 × 50 pixels, 100 × 100 pixels, 150 × 150 pixels yields an accuracy of 84%. Tests on distance values of the k-NN method with spacing 5, 10 and 15 resulted in different accuracy rates for 5 and 10 closest distances yielding the same accuracy rate of 84%, for the 15 shortest distance resulted in a small accuracy of 65.2%.
NASA Astrophysics Data System (ADS)
Adeniyi, D. A.; Wei, Z.; Yang, Y.
2017-10-01
Recommendation problem has been extensively studied by researchers in the field of data mining, database and information retrieval. This study presents the design and realisation of an automated, personalised news recommendations system based on Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, reduces runtime and speeds up execution process through the use of critical value of χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through the use of a developed in-house Java program on an experimental website called OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods which are traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.
2013-11-27
SECURITY CLASSIFICATION OF: CUBRC has developed an in-line, multi-analyte isolation technology that utilizes solid phase extraction chemistries to purify...goals. Specifically, CUBRC will design and manufacture a prototype cartridge(s) and test the prototype cartridge for its ability to isolate each...display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS. CUBRC , Inc. P. O. Box 400 Buffalo, NY 14225 -1955
Glatz Prototype Seat Impact Testing
2013-07-03
airbag restraint, H-60A/L, crashworthiness, crashworthy, helicopter, rotorcraft, occupant restraint 16. SECURITY CLASSIFICATION OF: 17...data for Cell C, incorporating the H-60 Comp data, the new Glatz prototype data, and data from the airbag restraint program with a modified H-60A/L seat...1711 GLATZ VDT6290 Glatz 24.37 40.52 YES 922 42.40 1732 AIRBAG VDT6287 H-60A/L w/crotch strap mod 21.39* 40.56 YES 1267 21.27 1373 *Issue with
Maguari Virus Associated with Human Disease
Groseth, Allison; Vine, Veronica; Weisend, Carla; Guevara, Carolina; Watts, Douglas; Russell, Brandy; Tesh, Robert B.
2017-01-01
Despite the lack of evidence for symptomatic human infection with Maguari virus (MAGV), its close relation to Cache Valley virus (CVV), which does infect humans, remains a concern. We sequenced the complete genome of a MAGV-like isolate (OBS6657) obtained from a febrile patient in Pucallpa, Ucayali, Peru, in 1998. To facilitate its classification, we generated additional full-length sequences for the MAGV prototype strain, 3 additional MAGV-like isolates, and the closely related CVV (7 strains), Tlacotalpan (1 strain), Playas (3 strains), and Fort Sherman (1 strain) viruses. The OBS6657 isolate is similar to the MAGV prototype, whereas 2 of the other MAGV-like isolates are located on a distinct branch and most likely warrant classification as a separate virus species and 1 is, in fact, a misclassified CVV strain. Our findings provide clear evidence that MAGV can cause human disease. PMID:28726602
NASA Astrophysics Data System (ADS)
Salman, S. S.; Abbas, W. A.
2018-05-01
The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.
Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics
NASA Technical Reports Server (NTRS)
Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.
2009-01-01
Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.
Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms
Zhang, Zhiwen; Duan, Feng; Zhou, Xin; Meng, Zixuan
2017-01-01
Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance. PMID:28874909
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-06-19
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models tomore » curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.« less
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.
Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho
2014-10-01
Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they just consider the problem of where feature to class is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, human miRNA is considered to be ranked low in traditional feature selection methods and are removed most of the time. In view of the limitation of the miRNA number, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that the m:n feature subset made a positive impression of low-ranking miRNAs in cancer classification.
A face and palmprint recognition approach based on discriminant DCT feature extraction.
Jing, Xiao-Yuan; Zhang, David
2004-12-01
In the field of image processing and recognition, discrete cosine transform (DCT) and linear discrimination are two widely used techniques. Based on them, we present a new face and palmprint recognition approach in this paper. It first uses a two-dimensional separability judgment to select the DCT frequency bands with favorable linear separability. Then from the selected bands, it extracts the linear discriminative features by an improved Fisherface method and performs the classification by the nearest neighbor classifier. We detailedly analyze theoretical advantages of our approach in feature extraction. The experiments on face databases and palmprint database demonstrate that compared to the state-of-the-art linear discrimination methods, our approach obtains better classification performance. It can significantly improve the recognition rates for face and palmprint data and effectively reduce the dimension of feature space.
Automatic tissue characterization from ultrasound imagery
NASA Astrophysics Data System (ADS)
Kadah, Yasser M.; Farag, Aly A.; Youssef, Abou-Bakr M.; Badawi, Ahmed M.
1993-08-01
In this work, feature extraction algorithms are proposed to extract the tissue characterization parameters from liver images. Then the resulting parameter set is further processed to obtain the minimum number of parameters representing the most discriminating pattern space for classification. This preprocessing step was applied to over 120 pathology-investigated cases to obtain the learning data for designing the classifier. The extracted features are divided into independent training and test sets and are used to construct both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms for implementing various classification techniques are presented and tested on the data. The best performance was obtained using a single layer tensor model functional link network. Also, the voting k-nearest neighbor classifier provided comparably good diagnostic rates.
Automatic morphological classification of galaxy images
Shamir, Lior
2009-01-01
We describe an image analysis supervised learning algorithm that can automatically classify galaxy images. The algorithm is first trained using a manually classified images of elliptical, spiral, and edge-on galaxies. A large set of image features is extracted from each image, and the most informative features are selected using Fisher scores. Test images can then be classified using a simple Weighted Nearest Neighbor rule such that the Fisher scores are used as the feature weights. Experimental results show that galaxy images from Galaxy Zoo can be classified automatically to spiral, elliptical and edge-on galaxies with accuracy of ~90% compared to classifications carried out by the author. Full compilable source code of the algorithm is available for free download, and its general-purpose nature makes it suitable for other uses that involve automatic image analysis of celestial objects. PMID:20161594
Recognition memory is modulated by visual similarity.
Yago, Elena; Ishai, Alumit
2006-06-01
We used event-related fMRI to test whether recognition memory depends on visual similarity between familiar prototypes and novel exemplars. Subjects memorized portraits, landscapes, and abstract compositions by six painters with a unique style, and later performed a memory recognition task. The prototypes were presented with new exemplars that were either visually similar or dissimilar. Behaviorally, novel, dissimilar items were detected faster and more accurately. We found activation in a distributed cortical network that included face- and object-selective regions in the visual cortex, where familiar prototypes evoked stronger responses than new exemplars; attention-related regions in parietal cortex, where responses elicited by new exemplars were reduced with decreased similarity to the prototypes; and the hippocampus and memory-related regions in parietal and prefrontal cortices, where stronger responses were evoked by the dissimilar exemplars. Our findings suggest that recognition memory is mediated by classification of novel exemplars as a match or a mismatch, based on their visual similarity to familiar prototypes.
Controlling basins of attraction in a neural network-based telemetry monitor
NASA Technical Reports Server (NTRS)
Bell, Benjamin; Eilbert, James L.
1988-01-01
The size of the basins of attraction around fixed points in recurrent neural nets (NNs) can be modified by a training process. Controlling these attractive regions by presenting training data with various amount of noise added to the prototype signal vectors is discussed. Application of this technique to signal processing results in a classification system whose sensitivity can be controlled. This new technique is applied to the classification of temporal sequences in telemetry data.
NASA Technical Reports Server (NTRS)
Hepner, George F.; Logan, Thomas; Ritter, Niles; Bryant, Nevin
1990-01-01
Recent research has shown an artificial neural network (ANN) to be capable of pattern recognition and the classification of image data. This paper examines the potential for the application of neural network computing to satellite image processing. A second objective is to provide a preliminary comparison and ANN classification. An artificial neural network can be trained to do land-cover classification of satellite imagery using selected sites representative of each class in a manner similar to conventional supervised classification. One of the major problems associated with recognition and classifications of pattern from remotely sensed data is the time and cost of developing a set of training sites. This reseach compares the use of an ANN back propagation classification procedure with a conventional supervised maximum likelihood classification procedure using a minimal training set. When using a minimal training set, the neural network is able to provide a land-cover classification superior to the classification derived from the conventional classification procedure. This research is the foundation for developing application parameters for further prototyping of software and hardware implementations for artificial neural networks in satellite image and geographic information processing.
Prototype diagnosis of psychiatric syndromes
WESTEN, DREW
2012-01-01
The method of diagnosing patients used since the early 1980s in psychiatry, which involves evaluating each of several hundred symptoms for their presence or absence and then applying idiosyncratic rules for combining them for each of several hundred disorders, has led to great advances in research over the last 30 years. However, its problems have become increasingly apparent, particularly for clinical practice. An alternative approach, designed to maximize clinical utility, is prototype matching. Instead of counting symptoms of a disorder and determining whether they cross an arbitrary cutoff, the task of the diagnostician is to gauge the extent to which a patient’s clinical presentation matches a paragraph-length description of the disorder using a simple 5-point scale, from 1 (“little or no match”) to 5 (“very good match”). The result is both a dimensional diagnosis that captures the extent to which the patient “has” the disorder and a categorical diagnosis, with ratings of 4 and 5 corresponding to presence of the disorder and a rating of 3 indicating “subthreshold” or “clinically significant features”. The disorders and criteria woven into the prototypes can be identified empirically, so that the prototypes are both scientifically grounded and clinically useful. Prototype diagnosis has a number of advantages: it better captures the way humans naturally classify novel and complex stimuli; is clinically helpful, reliable, and easy to use in everyday practice; facilitates both dimensional and categorical diagnosis and dramatically reduces the number of categories required for classification; allows for clinically richer, empirically derived, and culturally relevant classification; reduces the gap between research criteria and clinical knowledge, by allowing clinicians in training to learn a small set of standardized prototypes and to develop richer mental representations of the disorders over time through clinical experience; and can help resolve the thorny issue of the relation between psychiatric diagnosis and functional impairment. PMID:22294998
Remote Sensing and Wetland Ecology: a South African Case Study.
De Roeck, Els R; Verhoest, Niko E C; Miya, Mtemi H; Lievens, Hans; Batelaan, Okke; Thomas, Abraham; Brendonck, Luc
2008-05-26
Remote sensing offers a cost efficient means for identifying and monitoring wetlands over a large area and at different moments in time. In this study, we aim at providing ecologically relevant information on characteristics of temporary and permanent isolated open water wetlands, obtained by standard techniques and relatively cheap imagery. The number, surface area, nearest distance, and dynamics of isolated temporary and permanent wetlands were determined for the Western Cape, South Africa. Open water bodies (wetlands) were mapped from seven Landsat images (acquired during 1987 - 2002) using supervised maximum likelihood classification. The number of wetlands fluctuated over time. Most wetlands were detected in the winter of 2000 and 2002, probably related to road constructions. Imagery acquired in summer contained fewer wetlands than in winter. Most wetlands identified from Landsat images were smaller than one hectare. The average distance to the nearest wetland was larger in summer. In comparison to temporary wetlands, fewer, but larger permanent wetlands were detected. In addition, classification of non-vegetated wetlands on an Envisat ASAR radar image (acquired in June 2005) was evaluated. The number of detected small wetlands was lower for radar imagery than optical imagery (acquired in June 2002), probably because of deterioration of the spatial information content due the extensive pre-processing requirements of the radar image. Both optical and radar classifications allow to assess wetland characteristics that potentially influence plant and animal metacommunity structure. Envisat imagery, however, was less suitable than Landsat imagery for the extraction of detailed ecological information, as only large wetlands can be detected. This study has indicated that ecologically relevant data can be generated for the larger wetlands through relatively cheap imagery and standard techniques, despite the relatively low resolution of Landsat and Envisat imagery. For the characterisation of very small wetlands, high spatial resolution optical or radar images are needed. This study exemplifies the benefits of integrating remote sensing and ecology and hence stimulates interdisciplinary research of isolated wetlands.
Automated reuseable components system study results
NASA Technical Reports Server (NTRS)
Gilroy, Kathy
1989-01-01
The Automated Reusable Components System (ARCS) was developed under a Phase 1 Small Business Innovative Research (SBIR) contract for the U.S. Army CECOM. The objectives of the ARCS program were: (1) to investigate issues associated with automated reuse of software components, identify alternative approaches, and select promising technologies, and (2) to develop tools that support component classification and retrieval. The approach followed was to research emerging techniques and experimental applications associated with reusable software libraries, to investigate the more mature information retrieval technologies for applicability, and to investigate the applicability of specialized technologies to improve the effectiveness of a reusable component library. Various classification schemes and retrieval techniques were identified and evaluated for potential application in an automated library system for reusable components. Strategies for library organization and management, component submittal and storage, and component search and retrieval were developed. A prototype ARCS was built to demonstrate the feasibility of automating the reuse process. The prototype was created using a subset of the classification and retrieval techniques that were investigated. The demonstration system was exercised and evaluated using reusable Ada components selected from the public domain. A requirements specification for a production-quality ARCS was also developed.
Reading the lesson: eliciting requirements for a mammography training application
NASA Astrophysics Data System (ADS)
Hartswood, M.; Blot, L.; Taylor, P.; Anderson, S.; Procter, R.; Wilkinson, L.; Smart, L.
2009-02-01
Demonstrations of a prototype training tool were used to elicit requirements for an intelligent training system for screening mammography. The prototype allowed senior radiologists (mentors) to select cases from a distributed database of images to meet the specific training requirements of junior colleagues (trainees) and then provided automated feedback in response to trainees' attempts at interpretation. The tool was demonstrated to radiologists and radiographers working in the breast screening service at four evaluation sessions. Participants highlighted ease of selecting cases that can deliver specific learning objectives as important for delivering effective training. To usefully structure a large data set of training images we undertook a classification exercise of mentor authored free text 'learning points' attached to training case obtained from two screening centres (n=333, n=129 respectively). We were able to adduce a hierarchy of abstract categories representing classes of lesson that groups of cases were intended to convey (e.g. Temporal change, Misleading juxtapositions, Position of lesion, Typical/Atypical presentation, and so on). In this paper we present the method used to devise this classification, the classification scheme itself, initial user-feedback, and our plans to incorporated it into a software tool to aid case selection.
NASA Astrophysics Data System (ADS)
Peresan, Antonella; Gentili, Stefania
2017-04-01
Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic catalog into background seismicity and individual sequences of earthquake clusters, also in areas characterized by moderate seismic activity, where the standard declustering techniques may turn out rather gross approximations. With these results acquired, the main statistical features of seismic clusters are explored, including complex interdependence of related events, with the aim to characterize the space-time patterns of earthquakes occurrence in North-Eastern Italy and capture their basic differences with Central Italy sequences.
Characterization of Used Nuclear Fuel with Multivariate Analysis for Process Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dayman, Kenneth J.; Coble, Jamie B.; Orton, Christopher R.
2014-01-01
The Multi-Isotope Process (MIP) Monitor combines gamma spectroscopy and multivariate analysis to detect anomalies in various process streams in a nuclear fuel reprocessing system. Measured spectra are compared to models of nominal behavior at each measurement location to detect unexpected changes in system behavior. In order to improve the accuracy and specificity of process monitoring, fuel characterization may be used to more accurately train subsequent models in a full analysis scheme. This paper presents initial development of a reactor-type classifier that is used to select a reactor-specific partial least squares model to predict fuel burnup. Nuclide activities for prototypic usedmore » fuel samples were generated in ORIGEN-ARP and used to investigate techniques to characterize used nuclear fuel in terms of reactor type (pressurized or boiling water reactor) and burnup. A variety of reactor type classification algorithms, including k-nearest neighbors, linear and quadratic discriminant analyses, and support vector machines, were evaluated to differentiate used fuel from pressurized and boiling water reactors. Then, reactor type-specific partial least squares models were developed to predict the burnup of the fuel. Using these reactor type-specific models instead of a model trained for all light water reactors improved the accuracy of burnup predictions. The developed classification and prediction models were combined and applied to a large dataset that included eight fuel assembly designs, two of which were not used in training the models, and spanned the range of the initial 235U enrichment, cooling time, and burnup values expected of future commercial used fuel for reprocessing. Error rates were consistent across the range of considered enrichment, cooling time, and burnup values. Average absolute relative errors in burnup predictions for validation data both within and outside the training space were 0.0574% and 0.0597%, respectively. The errors seen in this work are artificially low, because the models were trained, optimized, and tested on simulated, noise-free data. However, these results indicate that the developed models may generalize well to new data and that the proposed approach constitutes a viable first step in developing a fuel characterization algorithm based on gamma spectra.« less
Pace, Giuseppina; Ferri, Violetta; Grave, Christian; Elbing, Mark; von Hänisch, Carsten; Zharnikov, Michael; Mayor, Marcel; Rampi, Maria Anita; Samorì, Paolo
2007-06-12
Photochromic systems can convert light energy into mechanical energy, thus they can be used as building blocks for the fabrication of prototypes of molecular devices that are based on the photomechanical effect. Hitherto a controlled photochromic switch on surfaces has been achieved either on isolated chromophores or within assemblies of randomly arranged molecules. Here we show by scanning tunneling microscopy imaging the photochemical switching of a new terminally thiolated azobiphenyl rigid rod molecule. Interestingly, the switching of entire molecular 2D crystalline domains is observed, which is ruled by the interactions between nearest neighbors. This observation of azobenzene-based systems displaying collective switching might be of interest for applications in high-density data storage.
Hamiltonian thermostats fail to promote heat flow
NASA Astrophysics Data System (ADS)
Hoover, Wm. G.; Hoover, Carol G.
2013-12-01
Hamiltonian mechanics can be used to constrain temperature simultaneously with energy. We illustrate the interesting situations that develop when two different temperatures are imposed within a composite Hamiltonian system. The model systems we treat are ϕ4 chains, with quartic tethers and quadratic nearest-neighbor Hooke's-law interactions. This model is known to satisfy Fourier's law. Our prototypical problem sandwiches a Newtonian subsystem between hot and cold Hamiltonian reservoir regions. We have characterized four different Hamiltonian reservoir types. There is no tendency for any of these two-temperature Hamiltonian simulations to transfer heat from the hot to the cold degrees of freedom. Evidently steady heat flow simulations require energy sources and sinks, and are therefore incompatible with Hamiltonian mechanics.
Hamiltonian dynamics of thermostated systems: two-temperature heat-conducting phi4 chains.
Hoover, Wm G; Hoover, Carol G
2007-04-28
We consider and compare four Hamiltonian formulations of thermostated mechanics, three of them kinetic, and the other one configurational. Though all four approaches "work" at equilibrium, their application to many-body nonequilibrium simulations can fail to provide a proper flow of heat. All the Hamiltonian formulations considered here are applied to the same prototypical two-temperature "phi4" model of a heat-conducting chain. This model incorporates nearest-neighbor Hooke's-Law interactions plus a quartic tethering potential. Physically correct results, obtained with the isokinetic Gaussian and Nose-Hoover thermostats, are compared with two other Hamiltonian results. The latter results, based on constrained Hamiltonian thermostats, fail to model correctly the flow of heat.
A binary linear programming formulation of the graph edit distance.
Justice, Derek; Hero, Alfred
2006-08-01
A binary linear programming formulation of the graph edit distance for unweighted, undirected graphs with vertex attributes is derived and applied to a graph recognition problem. A general formulation for editing graphs is used to derive a graph edit distance that is proven to be a metric, provided the cost function for individual edit operations is a metric. Then, a binary linear program is developed for computing this graph edit distance, and polynomial time methods for determining upper and lower bounds on the solution of the binary program are derived by applying solution methods for standard linear programming and the assignment problem. A recognition problem of comparing a sample input graph to a database of known prototype graphs in the context of a chemical information system is presented as an application of the new method. The costs associated with various edit operations are chosen by using a minimum normalized variance criterion applied to pairwise distances between nearest neighbors in the database of prototypes. The new metric is shown to perform quite well in comparison to existing metrics when applied to a database of chemical graphs.
NASA Technical Reports Server (NTRS)
Erickson, J. D.; Nalepka, R. F.
1976-01-01
PROCAMS (Prototype Classification and Mensuration System) has been designed for the classification and mensuration of agricultural crops (specifically small grains including wheat, rye, oats, and barley) through the use of data provided by Landsat. The system includes signature extension as a major feature and incorporates multitemporal as well as early season unitemporal approaches for using multiple training sites. Also addressed are partial cloud cover and cloud shadows, bad data points and lines, as well as changing sun angle and atmospheric state variations.
Cost analysis of life support systems
NASA Technical Reports Server (NTRS)
Yakut, M. M.
1973-01-01
A methodology was developed to predict realistic relative cost of Life Support Systems (LSS) and to define areas of major cost impacts in the development cycle. Emphasis was given to tailoring the cost data for usage by program planners and designers. The equipment classifications used based on the degree of refinement were as follows: (1) Working model; (2) low-fidelity prototype; (3) high-fidelity prototype; and (4) flight-qualified system. The major advanced LSS evaluated included the following: (1) Carbon dioxide removal; (2) oxygen recovery systems; (3) water recovery systems; (4) atmosphere analysis system.
Recent development on computer aided tissue engineering--a review.
Sun, Wei; Lal, Pallavi
2002-02-01
The utilization of computer-aided technologies in tissue engineering has evolved in the development of a new field of computer-aided tissue engineering (CATE). This article reviews recent development and application of enabling computer technology, imaging technology, computer-aided design and computer-aided manufacturing (CAD and CAM), and rapid prototyping (RP) technology in tissue engineering, particularly, in computer-aided tissue anatomical modeling, three-dimensional (3-D) anatomy visualization and 3-D reconstruction, CAD-based anatomical modeling, computer-aided tissue classification, computer-aided tissue implantation and prototype modeling assisted surgical planning and reconstruction.
NASA Astrophysics Data System (ADS)
Jiang, Li; Shi, Tielin; Xuan, Jianping
2012-05-01
Generally, the vibration signals of fault bearings are non-stationary and highly nonlinear under complicated operating conditions. Thus, it's a big challenge to extract optimal features for improving classification and simultaneously decreasing feature dimension. Kernel Marginal Fisher analysis (KMFA) is a novel supervised manifold learning algorithm for feature extraction and dimensionality reduction. In order to avoid the small sample size problem in KMFA, we propose regularized KMFA (RKMFA). A simple and efficient intelligent fault diagnosis method based on RKMFA is put forward and applied to fault recognition of rolling bearings. So as to directly excavate nonlinear features from the original high-dimensional vibration signals, RKMFA constructs two graphs describing the intra-class compactness and the inter-class separability, by combining traditional manifold learning algorithm with fisher criteria. Therefore, the optimal low-dimensional features are obtained for better classification and finally fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories of bearings. The experimental results demonstrate that the proposed approach improves the fault classification performance and outperforms the other conventional approaches.
Myint, S.W.; Yuan, M.; Cerveny, R.S.; Giri, C.P.
2008-01-01
Remote sensing techniques have been shown effective for large-scale damage surveys after a hazardous event in both near real-time or post-event analyses. The paper aims to compare accuracy of common imaging processing techniques to detect tornado damage tracks from Landsat TM data. We employed the direct change detection approach using two sets of images acquired before and after the tornado event to produce a principal component composite images and a set of image difference bands. Techniques in the comparison include supervised classification, unsupervised classification, and objectoriented classification approach with a nearest neighbor classifier. Accuracy assessment is based on Kappa coefficient calculated from error matrices which cross tabulate correctly identified cells on the TM image and commission and omission errors in the result. Overall, the Object-oriented Approach exhibits the highest degree of accuracy in tornado damage detection. PCA and Image Differencing methods show comparable outcomes. While selected PCs can improve detection accuracy 5 to 10%, the Object-oriented Approach performs significantly better with 15-20% higher accuracy than the other two techniques. ?? 2008 by MDPI.
Myint, Soe W.; Yuan, May; Cerveny, Randall S.; Giri, Chandra P.
2008-01-01
Remote sensing techniques have been shown effective for large-scale damage surveys after a hazardous event in both near real-time or post-event analyses. The paper aims to compare accuracy of common imaging processing techniques to detect tornado damage tracks from Landsat TM data. We employed the direct change detection approach using two sets of images acquired before and after the tornado event to produce a principal component composite images and a set of image difference bands. Techniques in the comparison include supervised classification, unsupervised classification, and object-oriented classification approach with a nearest neighbor classifier. Accuracy assessment is based on Kappa coefficient calculated from error matrices which cross tabulate correctly identified cells on the TM image and commission and omission errors in the result. Overall, the Object-oriented Approach exhibits the highest degree of accuracy in tornado damage detection. PCA and Image Differencing methods show comparable outcomes. While selected PCs can improve detection accuracy 5 to 10%, the Object-oriented Approach performs significantly better with 15-20% higher accuracy than the other two techniques. PMID:27879757
Automatic Cataract Hardness Classification Ex Vivo by Ultrasound Techniques.
Caixinha, Miguel; Santos, Mário; Santos, Jaime
2016-04-01
To demonstrate the feasibility of a new methodology for cataract hardness characterization and automatic classification using ultrasound techniques, different cataract degrees were induced in 210 porcine lenses. A 25-MHz ultrasound transducer was used to obtain acoustical parameters (velocity and attenuation) and backscattering signals. B-Scan and parametric Nakagami images were constructed. Ninety-seven parameters were extracted and subjected to a Principal Component Analysis. Bayes, K-Nearest-Neighbours, Fisher Linear Discriminant and Support Vector Machine (SVM) classifiers were used to automatically classify the different cataract severities. Statistically significant increases with cataract formation were found for velocity, attenuation, mean brightness intensity of the B-Scan images and mean Nakagami m parameter (p < 0.01). The four classifiers showed a good performance for healthy versus cataractous lenses (F-measure ≥ 92.68%), while for initial versus severe cataracts the SVM classifier showed the higher performance (90.62%). The results showed that ultrasound techniques can be used for non-invasive cataract hardness characterization and automatic classification. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Developing a radiomics framework for classifying non-small cell lung carcinoma subtypes
NASA Astrophysics Data System (ADS)
Yu, Dongdong; Zang, Yali; Dong, Di; Zhou, Mu; Gevaert, Olivier; Fang, Mengjie; Shi, Jingyun; Tian, Jie
2017-03-01
Patient-targeted treatment of non-small cell lung carcinoma (NSCLC) has been well documented according to the histologic subtypes over the past decade. In parallel, recent development of quantitative image biomarkers has recently been highlighted as important diagnostic tools to facilitate histological subtype classification. In this study, we present a radiomics analysis that classifies the adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). We extract 52-dimensional, CT-based features (7 statistical features and 45 image texture features) to represent each nodule. We evaluate our approach on a clinical dataset including 324 ADCs and 110 SqCCs patients with CT image scans. Classification of these features is performed with four different machine-learning classifiers including Support Vector Machines with Radial Basis Function kernel (RBF-SVM), Random forest (RF), K-nearest neighbor (KNN), and RUSBoost algorithms. To improve the classifiers' performance, optimal feature subset is selected from the original feature set by using an iterative forward inclusion and backward eliminating algorithm. Extensive experimental results demonstrate that radiomics features achieve encouraging classification results on both complete feature set (AUC=0.89) and optimal feature subset (AUC=0.91).
Al-Qazzaz, Noor Kamal; Ali, Sawal; Ahmad, Siti Anom; Escudero, Javier
2017-07-01
The aim of the present study was to discriminate the electroencephalogram (EEG) of 5 patients with vascular dementia (VaD), 15 patients with stroke-related mild cognitive impairment (MCI), and 15 control normal subjects during a working memory (WM) task. We used independent component analysis (ICA) and wavelet transform (WT) as a hybrid preprocessing approach for EEG artifact removal. Three different features were extracted from the cleaned EEG signals: spectral entropy (SpecEn), permutation entropy (PerEn) and Tsallis entropy (TsEn). Two classification schemes were applied - support vector machine (SVM) and k-nearest neighbors (kNN) - with fuzzy neighborhood preserving analysis with QR-decomposition (FNPAQR) as a dimensionality reduction technique. The FNPAQR dimensionality reduction technique increased the SVM classification accuracy from 82.22% to 90.37% and from 82.6% to 86.67% for kNN. These results suggest that FNPAQR consistently improves the discrimination of VaD, MCI patients and control normal subjects and it could be a useful feature selection to help the identification of patients with VaD and MCI.
Shermeyer, Jacob S.; Haack, Barry N.
2015-01-01
Two forestry-change detection methods are described, compared, and contrasted for estimating deforestation and growth in threatened forests in southern Peru from 2000 to 2010. The methods used in this study rely on freely available data, including atmospherically corrected Landsat 5 Thematic Mapper and Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation continuous fields (VCF). The two methods include a conventional supervised signature extraction method and a unique self-calibrating method called MODIS VCF guided forest/nonforest (FNF) masking. The process chain for each of these methods includes a threshold classification of MODIS VCF, training data or signature extraction, signature evaluation, k-nearest neighbor classification, analyst-guided reclassification, and postclassification image differencing to generate forest change maps. Comparisons of all methods were based on an accuracy assessment using 500 validation pixels. Results of this accuracy assessment indicate that FNF masking had a 5% higher overall accuracy and was superior to conventional supervised classification when estimating forest change. Both methods succeeded in classifying persistently forested and nonforested areas, and both had limitations when classifying forest change.
Dissimilarity representations in lung parenchyma classification
NASA Astrophysics Data System (ADS)
Sørensen, Lauge; de Bruijne, Marleen
2009-02-01
A good problem representation is important for a pattern recognition system to be successful. The traditional approach to statistical pattern recognition is feature representation. More specifically, objects are represented by a number of features in a feature vector space, and classifiers are built in this representation. This is also the general trend in lung parenchyma classification in computed tomography (CT) images, where the features often are measures on feature histograms. Instead, we propose to build normal density based classifiers in dissimilarity representations for lung parenchyma classification. This allows for the classifiers to work on dissimilarities between objects, which might be a more natural way of representing lung parenchyma. In this context, dissimilarity is defined between CT regions of interest (ROI)s. ROIs are represented by their CT attenuation histogram and ROI dissimilarity is defined as a histogram dissimilarity measure between the attenuation histograms. In this setting, the full histograms are utilized according to the chosen histogram dissimilarity measure. We apply this idea to classification of different emphysema patterns as well as normal, healthy tissue. Two dissimilarity representation approaches as well as different histogram dissimilarity measures are considered. The approaches are evaluated on a set of 168 CT ROIs using normal density based classifiers all showing good performance. Compared to using histogram dissimilarity directly as distance in a emph{k} nearest neighbor classifier, which achieves a classification accuracy of 92.9%, the best dissimilarity representation based classifier is significantly better with a classification accuracy of 97.0% (text{emph{p" border="0" class="imgtopleft"> = 0.046).
NASA Astrophysics Data System (ADS)
Li, Manchun; Ma, Lei; Blaschke, Thomas; Cheng, Liang; Tiede, Dirk
2016-07-01
Geographic Object-Based Image Analysis (GEOBIA) is becoming more prevalent in remote sensing classification, especially for high-resolution imagery. Many supervised classification approaches are applied to objects rather than pixels, and several studies have been conducted to evaluate the performance of such supervised classification techniques in GEOBIA. However, these studies did not systematically investigate all relevant factors affecting the classification (segmentation scale, training set size, feature selection and mixed objects). In this study, statistical methods and visual inspection were used to compare these factors systematically in two agricultural case studies in China. The results indicate that Random Forest (RF) and Support Vector Machines (SVM) are highly suitable for GEOBIA classifications in agricultural areas and confirm the expected general tendency, namely that the overall accuracies decline with increasing segmentation scale. All other investigated methods except for RF and SVM are more prone to obtain a lower accuracy due to the broken objects at fine scales. In contrast to some previous studies, the RF classifiers yielded the best results and the k-nearest neighbor classifier were the worst results, in most cases. Likewise, the RF and Decision Tree classifiers are the most robust with or without feature selection. The results of training sample analyses indicated that the RF and adaboost. M1 possess a superior generalization capability, except when dealing with small training sample sizes. Furthermore, the classification accuracies were directly related to the homogeneity/heterogeneity of the segmented objects for all classifiers. Finally, it was suggested that RF should be considered in most cases for agricultural mapping.
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
Sertel, O.; Kong, J.; Shimada, H.; Catalyurek, U.V.; Saltz, J.H.; Gurcan, M.N.
2009-01-01
We are developing a computer-aided prognosis system for neuroblastoma (NB), a cancer of the nervous system and one of the most malignant tumors affecting children. Histopathological examination is an important stage for further treatment planning in routine clinical diagnosis of NB. According to the International Neuroblastoma Pathology Classification (the Shimada system), NB patients are classified into favorable and unfavorable histology based on the tissue morphology. In this study, we propose an image analysis system that operates on digitized H&E stained whole-slide NB tissue samples and classifies each slide as either stroma-rich or stroma-poor based on the degree of Schwannian stromal development. Our statistical framework performs the classification based on texture features extracted using co-occurrence statistics and local binary patterns. Due to the high resolution of digitized whole-slide images, we propose a multi-resolution approach that mimics the evaluation of a pathologist such that the image analysis starts from the lowest resolution and switches to higher resolutions when necessary. We employ an offine feature selection step, which determines the most discriminative features at each resolution level during the training step. A modified k-nearest neighbor classifier is used to determine the confidence level of the classification to make the decision at a particular resolution level. The proposed approach was independently tested on 43 whole-slide samples and provided an overall classification accuracy of 88.4%. PMID:20161324
Sahan, Seral; Polat, Kemal; Kodaz, Halife; Güneş, Salih
2007-03-01
The use of machine learning tools in medical diagnosis is increasing gradually. This is mainly because the effectiveness of classification and recognition systems has improved in a great deal to help medical experts in diagnosing diseases. Such a disease is breast cancer, which is a very common type of cancer among woman. As the incidence of this disease has increased significantly in the recent years, machine learning applications to this problem have also took a great attention as well as medical consideration. This study aims at diagnosing breast cancer with a new hybrid machine learning method. By hybridizing a fuzzy-artificial immune system with k-nearest neighbour algorithm, a method was obtained to solve this diagnosis problem via classifying Wisconsin Breast Cancer Dataset (WBCD). This data set is a very commonly used data set in the literature relating the use of classification systems for breast cancer diagnosis and it was used in this study to compare the classification performance of our proposed method with regard to other studies. We obtained a classification accuracy of 99.14%, which is the highest one reached so far. The classification accuracy was obtained via 10-fold cross validation. This result is for WBCD but it states that this method can be used confidently for other breast cancer diagnosis problems, too.
Topological side-chain classification of beta-turns: ideal motifs for peptidomimetic development.
Tran, Tran Trung; McKie, Jim; Meutermans, Wim D F; Bourne, Gregory T; Andrews, Peter R; Smythe, Mark L
2005-08-01
Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns have shown broad binding capacity to multiple different receptors, for example benzodiazepines. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi2, psi2, phi3 and psi3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb and Type IV which represents unclassified beta-turns). Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions, as defined by C(alpha)-C(beta) vectors, of these turns have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C(alpha)-C(beta) vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
Rapid prototyping of an EEG-based brain-computer interface (BCI).
Guger, C; Schlögl, A; Neuper, C; Walterspacher, D; Strein, T; Pfurtscheller, G
2001-03-01
The electroencephalogram (EEG) is modified by motor imagery and can be used by patients with severe motor impairments (e.g., late stage of amyotrophic lateral sclerosis) to communicate with their environment. Such a direct connection between the brain and the computer is known as an EEG-based brain-computer interface (BCI). This paper describes a new type of BCI system that uses rapid prototyping to enable a fast transition of various types of parameter estimation and classification algorithms to real-time implementation and testing. Rapid prototyping is possible by using Matlab, Simulink, and the Real-Time Workshop. It is shown how to automate real-time experiments and perform the interplay between on-line experiments and offline analysis. The system is able to process multiple EEG channels on-line and operates under Windows 95 in real-time on a standard PC without an additional digital signal processor (DSP) board. The BCI can be controlled over the Internet, LAN or modem. This BCI was tested on 3 subjects whose task it was to imagine either left or right hand movement. A classification accuracy between 70% and 95% could be achieved with two EEG channels after some sessions with feedback using an adaptive autoregressive (AAR) model and linear discriminant analysis (LDA).
NASA Astrophysics Data System (ADS)
Wiegert, R. F.
2009-05-01
A man-portable Magnetic Scalar Triangulation and Ranging ("MagSTAR") technology for Detection, Localization and Classification (DLC) of unexploded ordnance (UXO) has been developed by Naval Surface Warfare Center Panama City Division (NSWC PCD) with support from the Strategic Environmental Research and Development Program (SERDP). Proof of principle of the MagSTAR concept and its unique advantages for real-time, high-mobility magnetic sensing applications have been demonstrated by field tests of a prototype man-portable MagSTAR sensor. The prototype comprises: a) An array of fluxgate magnetometers configured as a multi-tensor gradiometer, b) A GPS-synchronized signal processing system. c) Unique STAR algorithms for point-by-point, standoff DLC of magnetic targets. This paper outlines details of: i) MagSTAR theory, ii) Design and construction of the prototype sensor, iii) Signal processing algorithms recently developed to improve the technology's target-discrimination accuracy, iv) Results of field tests of the portable gradiometer system against magnetic dipole targets. The results demonstrate that the MagSTAR technology is capable of very accurate, high-speed localization of magnetic targets at standoff distances of several meters. These advantages could readily be transitioned to a wide range of defense, security and sensing applications to provide faster and more effective DLC of UXO and buried mines.
Integrating multisource land use and land cover data
Wright, Bruce E.; Tait, Mike; Lins, K.F.; Crawford, J.S.; Benjamin, S.P.; Brown, Jesslyn F.
1995-01-01
As part of the U.S. Geological Survey's (USGS) land use and land cover (LULC) program, the USGS in cooperation with the Environmental Systems Research Institute (ESRI) is collecting and integrating LULC data for a standard USGS 1:100,000-scale product. The LULC data collection techniques include interpreting spectrally clustered Landsat Thematic Mapper (TM) images; interpreting 1-meter resolution digital panchromatic orthophoto images; and, for comparison, aggregating locally available large-scale digital data of urban areas. The area selected is the Vancouver, WA-OR quadrangle, which has a mix of urban, rural agriculture, and forest land. Anticipated products include an integrated LULC prototype data set in a standard classification scheme referenced to the USGS digital line graph (DLG) data of the area and prototype software to develop digital LULC data sets.This project will evaluate a draft standard LULC classification system developed by the USGS for use with various source material and collection techniques. Federal, State, and local governments, and private sector groups will have an opportunity to evaluate the resulting prototype software and data sets and to provide recommendations. It is anticipated that this joint research endeavor will increase future collaboration among interested organizations, public and private, for LULC data collection using common standards and tools.
A personality classification system for eating disorders: a longitudinal study.
Thompson-Brenner, Heather; Eddy, Kamryn T; Franko, Debra L; Dorer, David J; Vashchenko, Maryna; Kass, Andrea E; Herzog, David B
2008-01-01
Studies of eating disorders (EDs) suggest that empirically derived personality subtypes may explain heterogeneity in ED samples that is not captured by the current diagnostic system. Longitudinal outcomes for personality subtypes have not been examined. In this study, personality pathology was assessed by clinical interview in 213 individuals with anorexia nervosa and bulimia nervosa at baseline. Interview data on EDs, comorbid diagnoses, global functioning, and treatment utilization were collected at baseline and at 6-month follow-up intervals over a median of 9 years. Q-factor analysis of the participants based on personality items produced a 5-prototype system, including high-functioning, behaviorally dysregulated, emotionally dysregulated, avoidant-insecure, and obsessional-sensitive types. Dimensional prototype scores were associated with baseline functioning and longitudinal outcome. Avoidant-Insecure scores showed consistent associations with poor functioning and outcome, including failure to show ED improvement, poor global functioning after 5 years, and high treatment utilization after 5 years. Behavioral dysregulation was associated with poor baseline functioning but did not show strong associations with ED or global outcome when histories of major depression and substance use disorder were covaried. Emotional dysregulation and obsessional-sensitivity were not associated with negative outcomes. High-functioning prototype scores were consistently associated with positive outcomes. Longitudinal results support the importance of personality subtypes to ED classification.
Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang
2017-01-01
An estimate on the reliability of prediction in the applications of electronic nose is essential, which has not been paid enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. Nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
Biclustering Learning of Trading Rules.
Huang, Qinghua; Wang, Ting; Tao, Dacheng; Li, Xuelong
2015-10-01
Technical analysis with numerous indicators and patterns has been regarded as important evidence for making trading decisions in financial markets. However, it is extremely difficult for investors to find useful trading rules based on numerous technical indicators. This paper innovatively proposes the use of biclustering mining to discover effective technical trading patterns that contain a combination of indicators from historical financial data series. This is the first attempt to use biclustering algorithm on trading data. The mined patterns are regarded as trading rules and can be classified as three trading actions (i.e., the buy, the sell, and no-action signals) with respect to the maximum support. A modified K nearest neighborhood ( K -NN) method is applied to classification of trading days in the testing period. The proposed method [called biclustering algorithm and the K nearest neighbor (BIC- K -NN)] was implemented on four historical datasets and the average performance was compared with the conventional buy-and-hold strategy and three previously reported intelligent trading systems. Experimental results demonstrate that the proposed trading system outperforms its counterparts and will be useful for investment in various financial markets.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Na, X D; Zang, S Y; Wu, C S; Li, W L
2015-11-01
Knowledge of the spatial extent of forested wetlands is essential to many studies including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches to the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the objected-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy and the difference of their kappa coefficients are statistically significant (p<0.01). There were noticeably omissions for forested and herbaceous wetlands based on the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) of Kappa coefficient between results performed based on RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of interested land cover types, while the object classifications based on the KNN algorithm showed noticeably commissions for forested wetlands and omissions for agriculture land. This research proves that the object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate the forested wetlands from the other land cover types in forestry area.
Kotsianos, D; Rock, C; Wirth, S; Linsenmaier, U; Brandl, R; Fischer, T; Euler, E; Mutschler, W; Pfeifer, K J; Reiser, M
2002-01-01
To analyze a prototype mobile C-arm 3D image amplifier in the detection and classification of experimental tibial condylar fractures with multiplanar reconstructions (MPR). Human knee specimens (n = 22) with tibial condylar fractures were examined with a prototype C-arm (ISO-C-3D, Siemens AG), plain films (CR) and spiral CT (CT). The motorized C-arm provides fluoroscopic images during a 190 degrees orbital rotation computing a 119 mm data cube. From these 3D data sets MP reconstructions were obtained. All images were evaluated by four independent readers for the detection and assessment of fracture lines. All fractures were classified according to the Müller AO classification. To confirm the results, the specimens were finally surgically dissected. 97 % of the tibial condylar fractures were easily seen and correctly classified according to the Müller AO classification on MP reconstruction of the ISO-C-3D. There is no significant difference between ISO-C and CT in detection and correct classification of fractures, but ISO-CD-3D is significant by better than CR. The evaluation of fractures with the ISO-C is better than with plain films alone and comparable to CT scans. The three-dimensional reconstruction of the ISO-C can provide important information which cannot be obtained from plain films. The ISO-C-3D may be useful in planning operative reconstructions and evaluating surgical results in orthopaedic surgery of the limbs.
Han, Ruizhen; He, Yong; Liu, Fei
2012-01-01
This paper presents a feasibility study on a real-time in field pest classification system design based on Blackfin DSP and 3G wireless communication technology. This prototype system is composed of remote on-line classification platform (ROCP), which uses a digital signal processor (DSP) as a core CPU, and a host control platform (HCP). The ROCP is in charge of acquiring the pest image, extracting image features and detecting the class of pest using an Artificial Neural Network (ANN) classifier. It sends the image data, which is encoded using JPEG 2000 in DSP, to the HCP through the 3G network at the same time for further identification. The image transmission and communication are accomplished using 3G technology. Our system transmits the data via a commercial base station. The system can work properly based on the effective coverage of base stations, no matter the distance from the ROCP to the HCP. In the HCP, the image data is decoded and the pest image displayed in real-time for further identification. Authentication and performance tests of the prototype system were conducted. The authentication test showed that the image data were transmitted correctly. Based on the performance test results on six classes of pests, the average accuracy is 82%. Considering the different live pests’ pose and different field lighting conditions, the result is satisfactory. The proposed technique is well suited for implementation in field pest classification on-line for precision agriculture. PMID:22736996
Han, Ruizhen; He, Yong; Liu, Fei
2012-01-01
This paper presents a feasibility study on a real-time in field pest classification system design based on Blackfin DSP and 3G wireless communication technology. This prototype system is composed of remote on-line classification platform (ROCP), which uses a digital signal processor (DSP) as a core CPU, and a host control platform (HCP). The ROCP is in charge of acquiring the pest image, extracting image features and detecting the class of pest using an Artificial Neural Network (ANN) classifier. It sends the image data, which is encoded using JPEG 2000 in DSP, to the HCP through the 3G network at the same time for further identification. The image transmission and communication are accomplished using 3G technology. Our system transmits the data via a commercial base station. The system can work properly based on the effective coverage of base stations, no matter the distance from the ROCP to the HCP. In the HCP, the image data is decoded and the pest image displayed in real-time for further identification. Authentication and performance tests of the prototype system were conducted. The authentication test showed that the image data were transmitted correctly. Based on the performance test results on six classes of pests, the average accuracy is 82%. Considering the different live pests' pose and different field lighting conditions, the result is satisfactory. The proposed technique is well suited for implementation in field pest classification on-line for precision agriculture.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
Ecological risk assessment of cheese whey effluents along a medium-sized river in southwest Greece.
Karadima, Constantina; Theodoropoulos, Chris; Rouvalis, Angela; Iliopoulou-Georgudaki, Joan
2010-01-01
An ecological risk assessment of cheese whey effluents was applied in three critical sampling sites located in Vouraikos river (southwest Greece), while ecological classification using Water Framework Directive 2000/60/EU criteria allowed a direct comparison of toxicological and ecological data. Two invertebrates (Daphnia magna and Thamnocephalus platyurus) and the zebra fish Danio rerio were used for toxicological analyses, while the aquatic risk was calculated on the basis of the risk quotient (RQ = PEC/PNEC). Chemical classification of sites was carried out using the Nutrient Classification System, while benthic invertebrates were collected and analyzed for biological classification. Toxicological results revealed the heavy pollution load of the two sites, nearest to the point pollution source, as the PEC/PNEC ratio exceeded 1.0, while unexpectedly, no risk was detected for the most downstream site, due to the consequent interference of the riparian flora. These toxicological results were in agreement with the ecological analysis: the ecological quality of the two heavily impacted sites ranged from moderate to bad, whereas it was found good for the most downstream site. The results of the study indicate major ecological risk for almost 15 km downstream of the point pollution source and the potentiality of the water quality remediation by the riparian vegetation, proving the significance of its maintenance.
Application of texture analysis method for mammogram density classification
NASA Astrophysics Data System (ADS)
Nithya, R.; Santhi, B.
2017-07-01
Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Goshvarpour, Ateke; Goshvarpour, Atefeh
2018-04-30
Heart rate variability (HRV) analysis has become a widely used tool for monitoring pathological and psychological states in medical applications. In a typical classification problem, information fusion is a process whereby the effective combination of the data can achieve a more accurate system. The purpose of this article was to provide an accurate algorithm for classifying HRV signals in various psychological states. Therefore, a novel feature level fusion approach was proposed. First, using the theory of information, two similarity indicators of the signal were extracted, including correntropy and Cauchy-Schwarz divergence. Applying probabilistic neural network (PNN) and k-nearest neighbor (kNN), the performance of each index in the classification of meditators and non-meditators HRV signals was appraised. Then, three fusion rules, including division, product, and weighted sum rules were used to combine the information of both similarity measures. For the first time, we propose an algorithm to define the weights of each feature based on the statistical p-values. The performance of HRV classification using combined features was compared with the non-combined features. Totally, the accuracy of 100% was obtained for discriminating all states. The results showed the strong ability and proficiency of division and weighted sum rules in the improvement of the classifier accuracies.
NASA Astrophysics Data System (ADS)
Babic, Z.; Pilipovic, R.; Risojevic, V.; Mirjanic, G.
2016-06-01
Honey bees have crucial role in pollination across the world. This paper presents a simple, non-invasive, system for pollen bearing honey bee detection in surveillance video obtained at the entrance of a hive. The proposed system can be used as a part of a more complex system for tracking and counting of honey bees with remote pollination monitoring as a final goal. The proposed method is executed in real time on embedded systems co-located with a hive. Background subtraction, color segmentation and morphology methods are used for segmentation of honey bees. Classification in two classes, pollen bearing honey bees and honey bees that do not have pollen load, is performed using nearest mean classifier, with a simple descriptor consisting of color variance and eccentricity features. On in-house data set we achieved correct classification rate of 88.7% with 50 training images per class. We show that the obtained classification results are not far behind from the results of state-of-the-art image classification methods. That favors the proposed method, particularly having in mind that real time video transmission to remote high performance computing workstation is still an issue, and transfer of obtained parameters of pollination process is much easier.
Anterior Chamber Angle Shape Analysis and Classification of Glaucoma in SS-OCT Images.
Ni Ni, Soe; Tian, J; Marziliano, Pina; Wong, Hong-Tym
2014-01-01
Optical coherence tomography is a high resolution, rapid, and noninvasive diagnostic tool for angle closure glaucoma. In this paper, we present a new strategy for the classification of the angle closure glaucoma using morphological shape analysis of the iridocorneal angle. The angle structure configuration is quantified by the following six features: (1) mean of the continuous measurement of the angle opening distance; (2) area of the trapezoidal profile of the iridocorneal angle centered at Schwalbe's line; (3) mean of the iris curvature from the extracted iris image; (4) complex shape descriptor, fractal dimension, to quantify the complexity, or changes of iridocorneal angle; (5) ellipticity moment shape descriptor; and (6) triangularity moment shape descriptor. Then, the fuzzy k nearest neighbor (fkNN) classifier is utilized for classification of angle closure glaucoma. Two hundred and sixty-four swept source optical coherence tomography (SS-OCT) images from 148 patients were analyzed in this study. From the experimental results, the fkNN reveals the best classification accuracy (99.11 ± 0.76%) and AUC (0.98 ± 0.012) with the combination of fractal dimension and biometric parameters. It showed that the proposed approach has promising potential to become a computer aided diagnostic tool for angle closure glaucoma (ACG) disease.
Martins, Lucia Regina Rocha; Pereira-Filho, Edenir Rodrigues; Cass, Quezia Bezerra
2011-04-01
Taking in consideration the global analysis of complex samples, proposed by the metabolomic approach, the chromatographic fingerprint encompasses an attractive chemical characterization of herbal medicines. Thus, it can be used as a tool in quality control analysis of phytomedicines. The generated multivariate data are better evaluated by chemometric analyses, and they can be modeled by classification methods. "Stone breaker" is a popular Brazilian plant of Phyllanthus genus, used worldwide to treat renal calculus, hepatitis, and many other diseases. In this study, gradient elution at reversed-phase conditions with detection at ultraviolet region were used to obtain chemical profiles (fingerprints) of botanically identified samples of six Phyllanthus species. The obtained chromatograms, at 275 nm, were organized in data matrices, and the time shifts of peaks were adjusted using the Correlation Optimized Warping algorithm. Principal Component Analyses were performed to evaluate similarities among cultivated and uncultivated samples and the discrimination among the species and, after that, the samples were used to compose three classification models using Soft Independent Modeling of Class analogy, K-Nearest Neighbor, and Partial Least Squares for Discriminant Analysis. The ability of classification models were discussed after their successful application for authenticity evaluation of 25 commercial samples of "stone breaker."
Pattern recognition for passive polarimetric data using nonparametric classifiers
NASA Astrophysics Data System (ADS)
Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.
2005-08-01
Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-01-01
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202
Classification of CT examinations for COPD visual severity analysis
NASA Astrophysics Data System (ADS)
Tan, Jun; Zheng, Bin; Wang, Xingwei; Pu, Jiantao; Gur, David; Sciurba, Frank C.; Leader, J. Ken
2012-03-01
In this study we present a computational method of CT examination classification into visual assessed emphysema severity. The visual severity categories ranged from 0 to 5 and were rated by an experienced radiologist. The six categories were none, trace, mild, moderate, severe and very severe. Lung segmentation was performed for every input image and all image features are extracted from the segmented lung only. We adopted a two-level feature representation method for the classification. Five gray level distribution statistics, six gray level co-occurrence matrix (GLCM), and eleven gray level run-length (GLRL) features were computed for each CT image depicted segment lung. Then we used wavelets decomposition to obtain the low- and high-frequency components of the input image, and again extract from the lung region six GLCM features and eleven GLRL features. Therefore our feature vector length is 56. The CT examinations were classified using the support vector machine (SVM) and k-nearest neighbors (KNN) and the traditional threshold (density mask) approach. The SVM classifier had the highest classification performance of all the methods with an overall sensitivity of 54.4% and a 69.6% sensitivity to discriminate "no" and "trace visually assessed emphysema. We believe this work may lead to an automated, objective method to categorically classify emphysema severity on CT exam.
Primary mass discrimination of high energy cosmic rays using PNN and k-NN methods
NASA Astrophysics Data System (ADS)
Rastegarzadeh, G.; Nemati, M.
2018-02-01
Probabilistic neural network (PNN) and k-Nearest Neighbors (k-NN) methods are widely used data classification techniques. In this paper, these two methods have been used to classify the Extensive Air Shower (EAS) data sets which were simulated using the CORSIKA code for three primary cosmic rays. The primaries are proton, oxygen and iron nuclei at energies of 100 TeV-10 PeV. This study is performed in the following of the investigations into the primary cosmic ray mass sensitive observables. We propose a new approach for measuring the mass sensitive observables of EAS in order to improve the primary mass separation. In this work, the EAS observables measurement has performed locally instead of total measurements. Also the relationships between the included number of observables in the classification methods and the prediction accuracy have been investigated. We have shown that the local measurements and inclusion of more mass sensitive observables in the classification processes can improve the classifying quality and also we have shown that muons and electrons energy density can be considered as primary mass sensitive observables in primary mass classification. Also it must be noted that this study is performed for Tehran observation level without considering the details of any certain EAS detection array.
Spectral-Based Volume Sensor Prototype, Post-VS4 Test Series Algorithm Development
2009-04-30
Computer Pcorr Probabilty / Percentage of Correct Classification (# Correct / # Total) PD PhotoDiode Pd Probabilty / Percentage of Detection (# Correct...Detections / Total of Sources) Pfa Probabilty / Percentage of False Alarm (# FAs / Total # of Sources) SBVS Spectral-Based Volume Sensor SFA Smoke and
Category Representation for Classification and Feature Inference
ERIC Educational Resources Information Center
Johansen, Mark K.; Kruschke, John K.
2005-01-01
This research's purpose was to contrast the representations resulting from learning of the same categories by either classifying instances or inferring instance features. Prior inference learning research, particularly T. Yamauchi and A. B. Markman (1998), has suggested that feature inference learning fosters prototype representation, whereas…
NASA Astrophysics Data System (ADS)
Lee, Michael; Freed, Adrian; Wessel, David
1992-08-01
In this report we present our tools for prototyping adaptive user interfaces in the context of real-time musical instrument control. Characteristic of most human communication is the simultaneous use of classified events and estimated parameters. We have integrated a neural network object into the MAX language to explore adaptive user interfaces that considers these facets of human communication. By placing the neural processing in the context of a flexible real-time musical programming environment, we can rapidly prototype experiments on applications of adaptive interfaces and learning systems to musical problems. We have trained networks to recognize gestures from a Mathews radio baton, Nintendo Power GloveTM, and MIDI keyboard gestural input devices. In one experiment, a network successfully extracted classification and attribute data from gestural contours transduced by a continuous space controller, suggesting their application in the interpretation of conducting gestures and musical instrument control. We discuss network architectures, low-level features extracted for the networks to operate on, training methods, and musical applications of adaptive techniques.
Two-pass imputation algorithm for missing value estimation in gene expression time series.
Tsiporkova, Elena; Boeva, Veselka
2007-10-01
Gene expression microarray experiments frequently generate datasets with multiple values missing. However, most of the analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm, which is specially suited for the estimation of missing values in gene expression time series data. The algorithm utilizes Dynamic Time Warping (DTW) distance in order to measure the similarity between time expression profiles, and subsequently selects for each gene expression profile with missing values a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These have initially been prototyped in Perl, and their accuracy has been evaluated on yeast expression time series data using several different parameter settings. The experiments have shown that the two-pass algorithm consistently outperforms, in particular for datasets with a higher level of missing entries, the neighborhood-wise and the position-wise algorithms. The performance of the two-pass DTWimpute algorithm has further been benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community; the former algorithm has appeared superior to the latter one. Motivated by these findings, indicating clearly the added value of the DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides for a choice between three different initial rough imputation methods.
[Schizophrenia or spiritual crisis? On "raising the kundalini" and its diagnostic classification].
Hansen, G
1995-07-31
Two patients are described who had been diagnosed as schizophrenic, but had actually instead been going through spiritual crises, which in Eastern spiritual tradition are called raising the kundalini. Perhaps this experience is not a disease, but many--especially if not understood by oneself, the nearest relations and the medical profession--cause mental illness. In WHO ICD-10 the experience could be classified as F48.8, disordines neurotici specificati alii. The process falls outside the categories of both normal and psychotic. When allowed to progress to completion this process culminates in deep psychological balance, strength, and maturity.
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
1999-05-17
Experimental Results In this section, we compare kNN -mut which uses the weight vector obtained using mutual information as the fi- nal weight vector and...WAKNN against kNN , C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn
Cooperative light-induced molecular movements of highly ordered azobenzene self-assembled monolayers
Pace, Giuseppina; Ferri, Violetta; Grave, Christian; Elbing, Mark; von Hänisch, Carsten; Zharnikov, Michael; Mayor, Marcel; Rampi, Maria Anita; Samorì, Paolo
2007-01-01
Photochromic systems can convert light energy into mechanical energy, thus they can be used as building blocks for the fabrication of prototypes of molecular devices that are based on the photomechanical effect. Hitherto a controlled photochromic switch on surfaces has been achieved either on isolated chromophores or within assemblies of randomly arranged molecules. Here we show by scanning tunneling microscopy imaging the photochemical switching of a new terminally thiolated azobiphenyl rigid rod molecule. Interestingly, the switching of entire molecular 2D crystalline domains is observed, which is ruled by the interactions between nearest neighbors. This observation of azobenzene-based systems displaying collective switching might be of interest for applications in high-density data storage. PMID:17535889
Validity of prototype diagnosis for mood and anxiety disorders.
DeFife, Jared A; Peart, Joanne; Bradley, Bekh; Ressler, Kerry; Drill, Rebecca; Westen, Drew
2013-02-01
CONTEXT With growing recognition that most forms of psychopathology are best represented as dimensions or spectra, a central question becomes how to implement dimensional diagnosis in a way that is empirically sound and clinically useful. Prototype matching, which involves comparing a patient's clinical presentation with a prototypical description of the disorder, is an approach to diagnosis that has gained increasing attention with forthcoming revisions to both the DSM and the International Classification of Diseases. OBJECTIVE To examine prototype diagnosis for mood and anxiety disorders. DESIGN, SETTING, AND PATIENTS In the first study, we examined clinicians' DSM-IV and prototype diagnoses with their ratings of the patients' adaptive functioning and patients' self-reported symptoms. In the second study, independent interviewers made prototype diagnoses following either a systematic clinical interview or a structured diagnostic interview. A third interviewer provided independent ratings of global adaptive functioning. Patients were recruited as outpatients (study 1; N = 84) and from primary care clinics (study 2; N = 143). MAIN OUTCOME MEASURES Patients' self-reported mood, anxiety, and externalizing symptoms along with independent clinical ratings of adaptive functioning. RESULTS Clinicians' prototype diagnoses showed small to moderate correlations with patient-reported psychopathology and performed as well as or better than DSM-IV diagnoses. Prototype diagnoses from independent interviewers correlated on average r = .50 and showed substantial incremental validity over DSM-IV diagnoses in predicting adaptive functioning. CONCLUSIONS Prototype matching is a viable alternative for psychiatric diagnosis. As in research on personality disorders, mood and anxiety disorder prototypes outperformed DSM-IV decision rules in predicting psychopathology and global functioning. Prototype matching has multiple advantages, including ease of use in clinical practice, reduced artifactual comorbidity, compatibility with naturally occurring cognitive processes in diagnosticians, and ready translation into both categorical and dimensional diagnosis.
Vehicle Classification Using the Discrete Fourier Transform with Traffic Inductive Sensors.
Lamas-Seco, José J; Castro, Paula M; Dapena, Adriana; Vazquez-Araujo, Francisco J
2015-10-26
Inductive Loop Detectors (ILDs) are the most commonly used sensors in traffic management systems. This paper shows that some spectral features extracted from the Fourier Transform (FT) of inductive signatures do not depend on the vehicle speed. Such a property is used to propose a novel method for vehicle classification based on only one signature acquired from a sensor single-loop, in contrast to standard methods using two sensor loops. Our proposal will be evaluated by means of real inductive signatures captured with our hardware prototype.
Bhavani, Selvaraj Rani; Senthilkumar, Jagatheesan; Chilambuchelvan, Arul Gnanaprakasam; Manjula, Dhanabalachandran; Krishnamoorthy, Ramasamy; Kannan, Arputharaj
2015-03-27
The Internet has greatly enhanced health care, helping patients stay up-to-date on medical issues and general knowledge. Many cancer patients use the Internet for cancer diagnosis and related information. Recently, cloud computing has emerged as a new way of delivering health services but currently, there is no generic and fully automated cloud-based self-management intervention for breast cancer patients, as practical guidelines are lacking. We investigated the prevalence and predictors of cloud use for medical diagnosis among women with breast cancer to gain insight into meaningful usage parameters to evaluate the use of generic, fully automated cloud-based self-intervention, by assessing how breast cancer survivors use a generic self-management model. The goal of this study was implemented and evaluated with a new prototype called "CIMIDx", based on representative association rules that support the diagnosis of medical images (mammograms). The proposed Cloud-Based System Support Intelligent Medical Image Diagnosis (CIMIDx) prototype includes two modules. The first is the design and development of the CIMIDx training and test cloud services. Deployed in the cloud, the prototype can be used for diagnosis and screening mammography by assessing the cancers detected, tumor sizes, histology, and stage of classification accuracy. To analyze the prototype's classification accuracy, we conducted an experiment with data provided by clients. Second, by monitoring cloud server requests, the CIMIDx usage statistics were recorded for the cloud-based self-intervention groups. We conducted an evaluation of the CIMIDx cloud service usage, in which browsing functionalities were evaluated from the end-user's perspective. We performed several experiments to validate the CIMIDx prototype for breast health issues. The first set of experiments evaluated the diagnostic performance of the CIMIDx framework. We collected medical information from 150 breast cancer survivors from hospitals and health centers. The CIMIDx prototype achieved high sensitivity of up to 99.29%, and accuracy of up to 98%. The second set of experiments evaluated CIMIDx use for breast health issues, using t tests and Pearson chi-square tests to assess differences, and binary logistic regression to estimate the odds ratio (OR) for the predictors' use of CIMIDx. For the prototype usage statistics for the same 150 breast cancer survivors, we interviewed 114 (76.0%), through self-report questionnaires from CIMIDx blogs. The frequency of log-ins/person ranged from 0 to 30, total duration/person from 0 to 1500 minutes (25 hours). The 114 participants continued logging in to all phases, resulting in an intervention adherence rate of 44.3% (95% CI 33.2-55.9). The overall performance of the prototype for the good category, reported usefulness of the prototype (P=.77), overall satisfaction of the prototype (P=.31), ease of navigation (P=.89), user friendliness evaluation (P=.31), and overall satisfaction (P=.31). Positive evaluations given by 100 participants via a Web-based questionnaire supported our hypothesis. The present study shows that women felt favorably about the use of a generic fully automated cloud-based self- management prototype. The study also demonstrated that the CIMIDx prototype resulted in the detection of more cancers in screening and diagnosing patients, with an increased accuracy rate.
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffer when sample size is small. The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
Functional Basis of Microorganism Classification.
Zhu, Chengsheng; Delmont, Tom O; Vogel, Timothy M; Bromberg, Yana
2015-08-01
Correctly identifying nearest "neighbors" of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with phylogenetic descent.
Functional Basis of Microorganism Classification
Zhu, Chengsheng; Delmont, Tom O.; Vogel, Timothy M.; Bromberg, Yana
2015-01-01
Correctly identifying nearest “neighbors” of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with phylogenetic descent. PMID:26317871
NASA Astrophysics Data System (ADS)
Liu, F.; Chen, T.; He, J.; Wen, Q.; Yu, F.; Gu, X.; Wang, Z.
2018-04-01
In recent years, the quick upgrading and improvement of SAR sensors provide beneficial complements for the traditional optical remote sensing in the aspects of theory, technology and data. In this paper, Sentinel-1A SAR data and GF-1 optical data were selected for image fusion, and more emphases were put on the dryland crop classification under a complex crop planting structure, regarding corn and cotton as the research objects. Considering the differences among various data fusion methods, the principal component analysis (PCA), Gram-Schmidt (GS), Brovey and wavelet transform (WT) methods were compared with each other, and the GS and Brovey methods were proved to be more applicable in the study area. Then, the classification was conducted based on the object-oriented technique process. And for the GS, Brovey fusion images and GF-1 optical image, the nearest neighbour algorithm was adopted to realize the supervised classification with the same training samples. Based on the sample plots in the study area, the accuracy assessment was conducted subsequently. The values of overall accuracy and kappa coefficient of fusion images were all higher than those of GF-1 optical image, and GS method performed better than Brovey method. In particular, the overall accuracy of GS fusion image was 79.8 %, and the Kappa coefficient was 0.644. Thus, the results showed that GS and Brovey fusion images were superior to optical images for dryland crop classification. This study suggests that the fusion of SAR and optical images is reliable for dryland crop classification under a complex crop planting structure.
Automated detection and recognition of wildlife using thermal cameras.
Christiansen, Peter; Steen, Kim Arild; Jørgensen, Rasmus Nyholm; Karstoft, Henrik
2014-07-30
In agricultural mowing operations, thousands of animals are injured or killed each year, due to the increased working widths and speeds of agricultural machinery. Detection and recognition of wildlife within the agricultural fields is important to reduce wildlife mortality and, thereby, promote wildlife-friendly farming. The work presented in this paper contributes to the automated detection and classification of animals in thermal imaging. The methods and results are based on top-view images taken manually from a lift to motivate work towards unmanned aerial vehicle-based detection and recognition. Hot objects are detected based on a threshold dynamically adjusted to each frame. For the classification of animals, we propose a novel thermal feature extraction algorithm. For each detected object, a thermal signature is calculated using morphological operations. The thermal signature describes heat characteristics of objects and is partly invariant to translation, rotation, scale and posture. The discrete cosine transform (DCT) is used to parameterize the thermal signature and, thereby, calculate a feature vector, which is used for subsequent classification. Using a k-nearest-neighbor (kNN) classifier, animals are discriminated from non-animals with a balanced classification accuracy of 84.7% in an altitude range of 3-10 m and an accuracy of 75.2% for an altitude range of 10-20 m. To incorporate temporal information in the classification, a tracking algorithm is proposed. Using temporal information improves the balanced classification accuracy to 93.3% in an altitude range 3-10 of meters and 77.7% in an altitude range of 10-20 m.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
Höller, Yvonne; Bergmann, Jürgen; Thomschewski, Aljoscha; Kronbichler, Martin; Höller, Peter; Crone, Julia S.; Schmid, Elisabeth V.; Butz, Kevin; Nardone, Raffaele; Trinka, Eugen
2013-01-01
Current research aims at identifying voluntary brain activation in patients who are behaviorally diagnosed as being unconscious, but are able to perform commands by modulating their brain activity patterns. This involves machine learning techniques and feature extraction methods such as applied in brain computer interfaces. In this study, we try to answer the question if features/classification methods which show advantages in healthy participants are also accurate when applied to data of patients with disorders of consciousness. A sample of healthy participants (N = 22), patients in a minimally conscious state (MCS; N = 5), and with unresponsive wakefulness syndrome (UWS; N = 9) was examined with a motor imagery task which involved imagery of moving both hands and an instruction to hold both hands firm. We extracted a set of 20 features from the electroencephalogram and used linear discriminant analysis, k-nearest neighbor classification, and support vector machines (SVM) as classification methods. In healthy participants, the best classification accuracies were seen with coherences (mean = .79; range = .53−.94) and power spectra (mean = .69; range = .40−.85). The coherence patterns in healthy participants did not match the expectation of central modulated -rhythm. Instead, coherence involved mainly frontal regions. In healthy participants, the best classification tool was SVM. Five patients had at least one feature-classifier outcome with p0.05 (none of which were coherence or power spectra), though none remained significant after false-discovery rate correction for multiple comparisons. The present work suggests the use of coherences in patients with disorders of consciousness because they show high reliability among healthy subjects and patient groups. However, feature extraction and classification is a challenging task in unresponsive patients because there is no ground truth to validate the results. PMID:24282545
Neves, R C; Stokol, T; Bach, K D; McArt, J A A
2018-02-01
The objective of this study was to assess an optimized ion-selective electrode Ca-module prototype as a potential cow-side device for ionized Ca (iCa) measurements in bovine blood. A linearity experiment showed no deviation from linearity over a range of iCa concentrations compared with a commercial point-of-care (POC) device commonly used in the field (POC VS ; VetScan i-STAT, Abaxis North America, Union City, CA) and a laboratory gold standard benchtop blood-gas analyzer [reference analyzer (RA); ABL-800 FLEX, Radiometer Medical, Copenhagen, Denmark]. Coefficient of variation on 3 samples with high, within-range, and low iCa concentrations ranged from 1.0 to 3.9% for the prototype. A follow-up validation experiment was performed, in which our objectives were to (1) assess the performance of the prototype cow-side against the POC VS (farm gold-standard) using fresh non-anticoagulated whole-blood samples; (2) assess the performance of the prototype and the POC VS against the RA in a diagnostic laboratory using blood collected in a heparin-balanced syringe; and (3) assess the agreement of the prototype and POC VS on-farm (fresh non-anticoagulated whole blood) against the RA on heparin-balanced blood. Finally, sensitivity and specificity of the results obtained by the prototype and the POC VS cow-side compared with the results obtained by the laboratory RA using 3 different iCa cut points for classification of subclinical hypocalcemia were calculated. A total of 101 periparturient Holstein cows from 3 dairy farms in New York State were used for the second experiment. Ionized Ca results from the prototype cow-side were, on average, 0.06 mmol/L higher than the POC VS . With heparin-balanced samples under laboratory conditions, the prototype and POC VS measured an average 0.04 mmol/L higher and lower, respectively, compared with the RA. Results from the prototype and POC VS cow-side were 0.01 mmol/L higher and 0.05 mmol/L lower, respectively, compared with results from the laboratory RA on heparinized blood. Sensitivity and specificity for the prototype and the POC VS under farm conditions at 3 potential subclinical hypocalcemia cut points were 100 and ≥93.5%, respectively. This novel ion-selective electrode Ca-module could become a rapid low-cost tool for assessing iCa cow-side, while qualitatively allowing classification of subclinical hypocalcemia on-farm. The Authors. Published by the Federation of Animal Science Societies and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
NASA Astrophysics Data System (ADS)
Wang, Yujie; Wang, Zhen; Wang, Yanli; Liu, Taigang; Zhang, Wenbing
2018-01-01
The thermodynamic and kinetic parameters of an RNA base pair with different nearest and next nearest neighbors were obtained through long-time molecular dynamics simulation of the opening-closing switch process of the base pair near its melting temperature. The results indicate that thermodynamic parameters of GC base pair are dependent on the nearest neighbor base pair, and the next nearest neighbor base pair has little effect, which validated the nearest-neighbor model. The closing and opening rates of the GC base pair also showed nearest neighbor dependences. At certain temperature, the closing and opening rates of the GC pair with nearest neighbor AU is larger than that with the nearest neighbor GC, and the next nearest neighbor plays little role. The free energy landscape of the GC base pair with the nearest neighbor GC is rougher than that with nearest neighbor AU.
A prototype of mammography CADx scheme integrated to imaging quality evaluation techniques
NASA Astrophysics Data System (ADS)
Schiabel, Homero; Matheus, Bruno R. N.; Angelo, Michele F.; Patrocínio, Ana Claudia; Ventura, Liliane
2011-03-01
As all women over the age of 40 are recommended to perform mammographic exams every two years, the demands on radiologists to evaluate mammographic images in short periods of time has increased considerably. As a tool to improve quality and accelerate analysis CADe/Dx (computer-aided detection/diagnosis) schemes have been investigated, but very few complete CADe/Dx schemes have been developed and most are restricted to detection and not diagnosis. The existent ones usually are associated to specific mammographic equipment (usually DR), which makes them very expensive. So this paper describes a prototype of a complete mammography CADx scheme developed by our research group integrated to an imaging quality evaluation process. The basic structure consists of pre-processing modules based on image acquisition and digitization procedures (FFDM, CR or film + scanner), a segmentation tool to detect clustered microcalcifications and suspect masses and a classification scheme, which evaluates as the presence of microcalcifications clusters as well as possible malignant masses based on their contour. The aim is to provide enough information not only on the detected structures but also a pre-report with a BI-RADS classification. At this time the system is still lacking an interface integrating all the modules. Despite this, it is functional as a prototype for clinical practice testing, with results comparable to others reported in literature.
Ozcift, Akin
2012-08-01
Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.
Breast cancer Ki67 expression preoperative discrimination by DCE-MRI radiomics features
NASA Astrophysics Data System (ADS)
Ma, Wenjuan; Ji, Yu; Qin, Zhuanping; Guo, Xinpeng; Jian, Xiqi; Liu, Peifang
2018-02-01
To investigate whether quantitative radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) are associated with Ki67 expression of breast cancer. In this institutional review board approved retrospective study, we collected 377 cases Chinese women who were diagnosed with invasive breast cancer in 2015. This cohort included 53 low-Ki67 expression (Ki67 proliferation index less than 14%) and 324 cases with high-Ki67 expression (Ki67 proliferation index more than 14%). A binary-classification of low- vs. high- Ki67 expression was performed. A set of 52 quantitative radiomics features, including morphological, gray scale statistic, and texture features, were extracted from the segmented lesion area. Three most common machine learning classification methods, including Naive Bayes, k-Nearest Neighbor and support vector machine with Gaussian kernel, were employed for the classification and the least absolute shrink age and selection operator (LASSO) method was used to select most predictive features set for the classifiers. Classification performance was evaluated by the area under receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity. The model that used Naive Bayes classification method achieved the best performance than the other two methods, yielding 0.773 AUC value, 0.757 accuracy, 0.777 sensitivity and 0.769 specificity. Our study showed that quantitative radiomics imaging features of breast tumor extracted from DCE-MRI are associated with breast cancer Ki67 expression. Future larger studies are needed in order to further evaluate the findings.
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.
Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine
2009-11-10
DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas
2014-07-01
Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Chow, Candace; Twele, André; Martinis, Sandro
2016-10-01
Flood extent maps derived from Synthetic Aperture Radar (SAR) data can communicate spatially-explicit information in a timely and cost-effective manner to support disaster management. Automated processing chains for SAR-based flood mapping have the potential to substantially reduce the critical time delay between the delivery of post-event satellite data and the subsequent provision of satellite derived crisis information to emergency management authorities. However, the accuracy of SAR-based flood mapping can vary drastically due to the prevalent land cover and topography of a given scene. While expert-based image interpretation with the consideration of contextual information can effectively isolate flood surface features, a fully-automated feature differentiation algorithm mainly based on the grey levels of a given pixel is comparatively more limited for features with similar SAR-backscattering characteristics. The inclusion of ancillary data in the automatic classification procedure can effectively reduce instances of misclassification. In this work, a near-global `Height Above Nearest Drainage' (HAND) index [10] was calculated with digital elevation data and drainage directions from the HydroSHEDS mapping project [2]. The index can be used to separate flood-prone regions from areas with a low probability of flood occurrence. Based on the HAND-index, an exclusion mask was computed to reduce water look-alikes with respect to the hydrologictopographic setting. The applicability of this near-global ancillary data set for the thematic improvement of Sentinel-1 and TerraSAR-X based services for flood and surface water monitoring has been validated both qualitatively and quantitatively. Application of a HAND-based exclusion mask resulted in improvements to the classification accuracy of SAR scenes with high amounts of water look-alikes and considerable elevation differences.
NASA Astrophysics Data System (ADS)
Kuijf, Hugo J.; Moeskops, Pim; de Vos, Bob D.; Bouvy, Willem H.; de Bresser, Jeroen; Biessels, Geert Jan; Viergever, Max A.; Vincken, Koen L.
2016-03-01
Novelty detection is concerned with identifying test data that differs from the training data of a classifier. In the case of brain MR images, pathology or imaging artefacts are examples of untrained data. In this proof-of-principle study, we measure the behaviour of a classifier during the classification of trained labels (i.e. normal brain tissue). Next, we devise a measure that distinguishes normal classifier behaviour from abnormal behavior that occurs in the case of a novelty. This will be evaluated by training a kNN classifier on normal brain tissue, applying it to images with an untrained pathology (white matter hyperintensities (WMH)), and determine if our measure is able to identify abnormal classifier behaviour at WMH locations. For our kNN classifier, behaviour is modelled as the mean, median, or q1 distance to the k nearest points. Healthy tissue was trained on 15 images; classifier behaviour was trained/tested on 5 images with leave-one-out cross-validation. For each trained class, we measure the distribution of mean/median/q1 distances to the k nearest point. Next, for each test voxel, we compute its Z-score with respect to the measured distribution of its predicted label. We consider a Z-score >=4 abnormal behaviour of the classifier, having a probability due to chance of 0.000032. Our measure identified >90% of WMH volume and also highlighted other non-trained findings. The latter being predominantly vessels, cerebral falx, brain mask errors, choroid plexus. This measure is generalizable to other classifiers and might help in detecting unexpected findings or novelties by measuring classifier behaviour.
2013-01-01
Background Identifying the emotional state is helpful in applications involving patients with autism and other intellectual disabilities; computer-based training, human computer interaction etc. Electrocardiogram (ECG) signals, being an activity of the autonomous nervous system (ANS), reflect the underlying true emotional state of a person. However, the performance of various methods developed so far lacks accuracy, and more robust methods need to be developed to identify the emotional pattern associated with ECG signals. Methods Emotional ECG data was obtained from sixty participants by inducing the six basic emotional states (happiness, sadness, fear, disgust, surprise and neutral) using audio-visual stimuli. The non-linear feature ‘Hurst’ was computed using Rescaled Range Statistics (RRS) and Finite Variance Scaling (FVS) methods. New Hurst features were proposed by combining the existing RRS and FVS methods with Higher Order Statistics (HOS). The features were then classified using four classifiers – Bayesian Classifier, Regression Tree, K- nearest neighbor and Fuzzy K-nearest neighbor. Seventy percent of the features were used for training and thirty percent for testing the algorithm. Results Analysis of Variance (ANOVA) conveyed that Hurst and the proposed features were statistically significant (p < 0.001). Hurst computed using RRS and FVS methods showed similar classification accuracy. The features obtained by combining FVS and HOS performed better with a maximum accuracy of 92.87% and 76.45% for classifying the six emotional states using random and subject independent validation respectively. Conclusions The results indicate that the combination of non-linear analysis and HOS tend to capture the finer emotional changes that can be seen in healthy ECG data. This work can be further fine tuned to develop a real time system. PMID:23680041
Nearest neighbors by neighborhood counting.
Wang, Hui
2006-06-01
Finding nearest neighbors is a general idea that underlies many artificial intelligence tasks, including machine learning, data mining, natural language understanding, and information retrieval. This idea is explicitly used in the k-nearest neighbors algorithm (kNN), a popular classification method. In this paper, this idea is adopted in the development of a general methodology, neighborhood counting, for devising similarity functions. We turn our focus from neighbors to neighborhoods, a region in the data space covering the data point in question. To measure the similarity between two data points, we consider all neighborhoods that cover both data points. We propose to use the number of such neighborhoods as a measure of similarity. Neighborhood can be defined for different types of data in different ways. Here, we consider one definition of neighborhood for multivariate data and derive a formula for such similarity, called neighborhood counting measure or NCM. NCM was tested experimentally in the framework of kNN. Experiments show that NCM is generally comparable to VDM and its variants, the state-of-the-art distance functions for multivariate data, and, at the same time, is consistently better for relatively large k values. Additionally, NCM consistently outperforms HEOM (a mixture of Euclidean and Hamming distances), the "standard" and most widely used distance function for multivariate data. NCM has a computational complexity in the same order as the standard Euclidean distance function and NCM is task independent and works for numerical and categorical data in a conceptually uniform way. The neighborhood counting methodology is proven sound for multivariate data experimentally. We hope it will work for other types of data.
Linear time relational prototype based learning.
Gisbrecht, Andrej; Mokbel, Bassam; Schleif, Frank-Michael; Zhu, Xibin; Hammer, Barbara
2012-10-01
Prototype based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike the Euclidean counterparts, the techniques have quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, they are infeasible already for medium sized data sets. The contribution of this article is twofold: On the one hand we propose a novel supervised prototype based classification technique for dissimilarity data based on popular learning vector quantization (LVQ), on the other hand we transfer a linear time approximation technique, the Nyström approximation, to this algorithm and an unsupervised counterpart, the relational generative topographic mapping (GTM). This way, linear time and space methods result. We evaluate the techniques on three examples from the biomedical domain.
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison on five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The difference in classification error obtained by the methods seemed to be small, but they were stable and for both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all the five methods reached an error-plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau indicating improved training data is needed to improve classification from here. Classification errors occur most frequent for genera with few sequences present. For improving the taxonomy and testing new classification methods, the need for a better and more universal and robust training data set is crucial.
Knowledge-based approach to video content classification
NASA Astrophysics Data System (ADS)
Chen, Yu; Wong, Edward K.
2001-01-01
A framework for video content classification using a knowledge-based approach is herein proposed. This approach is motivated by the fact that videos are rich in semantic contents, which can best be interpreted and analyzed by human experts. We demonstrate the concept by implementing a prototype video classification system using the rule-based programming language CLIPS 6.05. Knowledge for video classification is encoded as a set of rules in the rule base. The left-hand-sides of rules contain high level and low level features, while the right-hand-sides of rules contain intermediate results or conclusions. Our current implementation includes features computed from motion, color, and text extracted from video frames. Our current rule set allows us to classify input video into one of five classes: news, weather, reporting, commercial, basketball and football. We use MYCIN's inexact reasoning method for combining evidences, and to handle the uncertainties in the features and in the classification results. We obtained good results in a preliminary experiment, and it demonstrated the validity of the proposed approach.
Knowledge-based approach to video content classification
NASA Astrophysics Data System (ADS)
Chen, Yu; Wong, Edward K.
2000-12-01
A framework for video content classification using a knowledge-based approach is herein proposed. This approach is motivated by the fact that videos are rich in semantic contents, which can best be interpreted and analyzed by human experts. We demonstrate the concept by implementing a prototype video classification system using the rule-based programming language CLIPS 6.05. Knowledge for video classification is encoded as a set of rules in the rule base. The left-hand-sides of rules contain high level and low level features, while the right-hand-sides of rules contain intermediate results or conclusions. Our current implementation includes features computed from motion, color, and text extracted from video frames. Our current rule set allows us to classify input video into one of five classes: news, weather, reporting, commercial, basketball and football. We use MYCIN's inexact reasoning method for combining evidences, and to handle the uncertainties in the features and in the classification results. We obtained good results in a preliminary experiment, and it demonstrated the validity of the proposed approach.
Characterisation of the natural environment: quantitative indicators across Europe.
Smith, Graham; Cirach, Marta; Swart, Wim; Dėdelė, Audrius; Gidlow, Christopher; van Kempen, Elise; Kruize, Hanneke; Gražulevičienė, Regina; Nieuwenhuijsen, Mark J
2017-04-26
The World Health Organization recognises the importance of natural environments for human health. Evidence for natural environment-health associations comes largely from single countries or regions, with varied approaches to measuring natural environment exposure. We present a standardised approach to measuring neighbourhood natural environment exposure in cities in different regions of Europe. The Positive Health Effects of the Natural Outdoor environment in TYPical populations of different regions in Europe (PHENOTYPE) study aimed to explore the mechanisms linking natural environment exposure and health in four European cities (Barcelona, Spain; Doetinchem, the Netherlands; Kaunas, Lithuania; and Stoke-on-Trent, UK). Common GIS protocols were used to develop a hierarchy of natural environment measures, from simple measures (e.g., NDVI, Urban Atlas) using Europe-wide data sources, to detailed measures derived from local data that were specific to mechanisms thought to underpin natural environment-health associations (physical activity, social interaction, stress reduction/restoration). Indicators were created around residential addresses for a range of straight line and network buffers (100 m-1 km). For simple indicators derived from Europe-wide data, we observed differences between cities, which varied with different indicators (e.g., Kaunas and Doetinchem had equal highest mean NDVI within 100 m buffer, but mean distance to nearest natural environment in Kaunas was more twice that in Doetinchem). Mean distance to nearest natural environment for all cities suggested that most participants lived close to some kind of natural environments (64 ± 58-363 ± 281 m; mean 180 ± 204 m). The detailed classification highlighted marked between-city differences in terms of prominent types of natural environment. Indicators specific to mechanisms derived from this classification also captured more variation than the simple indicators. Distance to nearest and count indicators showed clear differences between cities, and those specific to the mechanisms showed within-city differences for Barcelona and Doetinchem. This paper demonstrates the feasibility and challenges of creating comparable GIS-derived natural environment exposure indicators across diverse European cities. Mechanism-specific indicators showed within- and between-city variability that supports their utility for ecological studies, which could inform more specific policy recommendations than the traditional proxies for natural environment access.
Texture analysis applied to second harmonic generation image data for ovarian cancer classification
NASA Astrophysics Data System (ADS)
Wen, Bruce L.; Brewer, Molly A.; Nadiarnykh, Oleg; Hocker, James; Singh, Vikas; Mackie, Thomas R.; Campagnola, Paul J.
2014-09-01
Remodeling of the extracellular matrix has been implicated in ovarian cancer. To quantitate the remodeling, we implement a form of texture analysis to delineate the collagen fibrillar morphology observed in second harmonic generation microscopy images of human normal and high grade malignant ovarian tissues. In the learning stage, a dictionary of "textons"-frequently occurring texture features that are identified by measuring the image response to a filter bank of various shapes, sizes, and orientations-is created. By calculating a representative model based on the texton distribution for each tissue type using a training set of respective second harmonic generation images, we then perform classification between images of normal and high grade malignant ovarian tissues. By optimizing the number of textons and nearest neighbors, we achieved classification accuracy up to 97% based on the area under receiver operating characteristic curves (true positives versus false positives). The local analysis algorithm is a more general method to probe rapidly changing fibrillar morphologies than global analyses such as FFT. It is also more versatile than other texture approaches as the filter bank can be highly tailored to specific applications (e.g., different disease states) by creating customized libraries based on common image features.
A new local-global approach for classification.
Peres, R T; Pedreira, C E
2010-09-01
In this paper, we propose a new local-global pattern classification scheme that combines supervised and unsupervised approaches, taking advantage of both, local and global environments. We understand as global methods the ones concerned with the aim of constructing a model for the whole problem space using the totality of the available observations. Local methods focus into sub regions of the space, possibly using an appropriately selected subset of the sample. In the proposed method, the sample is first divided in local cells by using a Vector Quantization unsupervised algorithm, the LBG (Linde-Buzo-Gray). In a second stage, the generated assemblage of much easier problems is locally solved with a scheme inspired by Bayes' rule. Four classification methods were implemented for comparison purposes with the proposed scheme: Learning Vector Quantization (LVQ); Feedforward Neural Networks; Support Vector Machine (SVM) and k-Nearest Neighbors. These four methods and the proposed scheme were implemented in eleven datasets, two controlled experiments, plus nine public available datasets from the UCI repository. The proposed method has shown a quite competitive performance when compared to these classical and largely used classifiers. Our method is simple concerning understanding and implementation and is based on very intuitive concepts. Copyright 2010 Elsevier Ltd. All rights reserved.
Drivelos, Spiros A; Higgins, Kevin; Kalivas, John H; Haroutounian, Serkos A; Georgiou, Constantinos A
2014-12-15
"Fava Santorinis", is a protected designation of origin (PDO) yellow split pea species growing only in the island of Santorini in Greece. Due to its nutritional quality and taste, it has gained a high monetary value. Thus, it is prone to adulteration with other yellow split peas. In order to discriminate "Fava Santorinis" from other yellow split peas, four classification methods utilising rare earth elements (REEs) measured through inductively coupled plasma-mass spectrometry (ICP-MS) are studied. The four classification processes are orthogonal projection analysis (OPA), Mahalanobis distance (MD), partial least squares discriminant analysis (PLS-DA) and k nearest neighbours (KNN). Since it is known that trace elements are often useful to determine geographical origin of food products, we further quantitated for trace elements using ICP-MS. Presented in this paper are results using the four classification processes based on the fusion of the REEs data with the trace element data. Overall, the OPA method was found to perform best with up to 100% accuracy using the fused data. Copyright © 2014 Elsevier Ltd. All rights reserved.
Automated 3D Phenotype Analysis Using Data Mining
Plyusnin, Ilya; Evans, Alistair R.; Karme, Aleksis; Gionis, Aristides; Jernvall, Jukka
2008-01-01
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analyses of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures that were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same searching scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will be increasingly valuable with the future increase in 3D morphology and phenomics databases. PMID:18320060
Breathing Pattern Interpretation as an Alternative and Effective Voice Communication Solution.
Elsahar, Yasmin; Bouazza-Marouf, Kaddour; Kerr, David; Gaur, Atul; Kaushik, Vipul; Hu, Sijung
2018-05-15
Augmentative and alternative communication (AAC) systems tend to rely on the interpretation of purposeful gestures for interaction. Existing AAC methods could be cumbersome and limit the solutions in terms of versatility. The study aims to interpret breathing patterns (BPs) to converse with the outside world by means of a unidirectional microphone and researches breathing-pattern interpretation (BPI) to encode messages in an interactive manner with minimal training. We present BP processing work with (1) output synthesized machine-spoken words (SMSW) along with single-channel Weiner filtering (WF) for signal de-noising, and (2) k -nearest neighbor ( k-NN ) classification of BPs associated with embedded dynamic time warping (DTW). An approved protocol to collect analogue modulated BP sets belonging to 4 distinct classes with 10 training BPs per class and 5 live BPs per class was implemented with 23 healthy subjects. An 86% accuracy of k-NN classification was obtained with decreasing error rates of 17%, 14%, and 11% for the live classifications of classes 2, 3, and 4, respectively. The results express a systematic reliability of 89% with increased familiarity. The outcomes from the current AAC setup recommend a durable engineering solution directly beneficial to the sufferers.
Texture Classification by Texton: Statistical versus Binary
Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane
2014-01-01
Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, it needs a training stage to build a texton library, thus the recognition accuracy will be highly depended on the training samples; second, during feature extraction, local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the dimension of feature is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed, Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local feature into binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary texton could get sound results with fast feature extraction, especially when the image size is not big and the quality of image is not poor. PMID:24520346
LMD Based Features for the Automatic Seizure Detection of EEG Signals Using SVM.
Zhang, Tao; Chen, Wanzhong
2017-08-01
Achieving the goal of detecting seizure activity automatically using electroencephalogram (EEG) signals is of great importance and significance for the treatment of epileptic seizures. To realize this aim, a newly-developed time-frequency analytical algorithm, namely local mean decomposition (LMD), is employed in the presented study. LMD is able to decompose an arbitrary signal into a series of product functions (PFs). Primarily, the raw EEG signal is decomposed into several PFs, and then the temporal statistical and non-linear features of the first five PFs are calculated. The features of each PF are fed into five classifiers, including back propagation neural network (BPNN), K-nearest neighbor (KNN), linear discriminant analysis (LDA), un-optimized support vector machine (SVM) and SVM optimized by genetic algorithm (GA-SVM), for five classification cases, respectively. Confluent features of all PFs and raw EEG are further passed into the high-performance GA-SVM for the same classification tasks. Experimental results on the international public Bonn epilepsy EEG dataset show that the average classification accuracy of the presented approach are equal to or higher than 98.10% in all the five cases, and this indicates the effectiveness of the proposed approach for automated seizure detection.
Feature selection gait-based gender classification under different circumstances
NASA Astrophysics Data System (ADS)
Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah
2014-05-01
This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.
Hodyna, Diana; Kovalishyn, Vasyl; Rogalsky, Sergiy; Blagodatnyi, Volodymyr; Petko, Kirill; Metelytsia, Larisa
2016-09-01
Predictive QSAR models for the inhibitors of B. subtilis and Ps. aeruginosa among imidazolium-based ionic liquids were developed using literary data. The regression QSAR models were created through Artificial Neural Network and k-nearest neighbor procedures. The classification QSAR models were constructed using WEKA-RF (random forest) method. The predictive ability of the models was tested by fivefold cross-validation; giving q(2) = 0.77-0.92 for regression models and accuracy 83-88% for classification models. Twenty synthesized samples of 1,3-dialkylimidazolium ionic liquids with predictive value of activity level of antimicrobial potential were evaluated. For all asymmetric 1,3-dialkylimidazolium ionic liquids, only compounds containing at least one radical with alkyl chain length of 12 carbon atoms showed high antibacterial activity. However, the activity of symmetric 1,3-dialkylimidazolium salts was found to have opposite relationship with the length of aliphatic radical being maximum for compounds based on 1,3-dioctylimidazolium cation. The obtained experimental results suggested that the application of classification QSAR models is more accurate for the prediction of activity of new imidazolium-based ILs as potential antibacterials. © 2016 John Wiley & Sons A/S.
A Step Towards EEG-based Brain Computer Interface for Autism Intervention*
Fan, Jing; Wade, Joshua W.; Bian, Dayi; Key, Alexandra P.; Warren, Zachary E.; Mion, Lorraine C.; Sarkar, Nilanjan
2017-01-01
Autism Spectrum Disorder (ASD) is a prevalent and costly neurodevelopmental disorder. Individuals with ASD often have deficits in social communication skills as well as adaptive behavior skills related to daily activities. We have recently designed a novel virtual reality (VR) based driving simulator for driving skill training for individuals with ASD. In this paper, we explored the feasibility of detecting engagement level, emotional states, and mental workload during VR-based driving using EEG as a first step towards a potential EEG-based Brain Computer Interface (BCI) for assisting autism intervention. We used spectral features of EEG signals from a 14-channel EEG neuroheadset, together with therapist ratings of behavioral engagement, enjoyment, frustration, boredom, and difficulty to train a group of classification models. Seven classification methods were applied and compared including Bayes network, naïve Bayes, Support Vector Machine (SVM), multilayer perceptron, K-nearest neighbors (KNN), random forest, and J48. The classification results were promising, with over 80% accuracy in classifying engagement and mental workload, and over 75% accuracy in classifying emotional states. Such results may lead to an adaptive closed-loop VR-based skill training system for use in autism intervention. PMID:26737113
Diaz, Naryttza N; Krause, Lutz; Goesmann, Alexander; Niehaus, Karsten; Nattkemper, Tim W
2009-01-01
Background Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay solid basis on our understanding of the entire living world. However, the taxonomic classification is an essential task in the analysis of metagenomics data sets that it is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy until rank class. For longer fragments ≥ 3 Kbp accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, thus classifying such fragments to its known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparable high specificity results up to rank class and low false negative rates are also obtained. Conclusion An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, accurate and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method demonstrated to be competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date. PMID:19210774
NASA Technical Reports Server (NTRS)
Haralick, R. H. (Principal Investigator); Bosley, R. J.
1974-01-01
The author has identified the following significant results. A procedure was developed to extract cross-band textural features from ERTS MSS imagery. Evolving from a single image texture extraction procedure which uses spatial dependence matrices to measure relative co-occurrence of nearest neighbor grey tones, the cross-band texture procedure uses the distribution of neighboring grey tone N-tuple differences to measure the spatial interrelationships, or co-occurrences, of the grey tone N-tuples present in a texture pattern. In both procedures, texture is characterized in such a way as to be invariant under linear grey tone transformations. However, the cross-band procedure complements the single image procedure by extracting texture information and spectral information contained in ERTS multi-images. Classification experiments show that when used alone, without spectral processing, the cross-band texture procedure extracts more information than the single image texture analysis. Results show an improvement in average correct classification from 86.2% to 88.8% for ERTS image no. 1021-16333 with the cross-band texture procedure. However, when used together with spectral features, the single image texture plus spectral features perform better than the cross-band texture plus spectral features, with an average correct classification of 93.8% and 91.6%, respectively.
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
NASA Astrophysics Data System (ADS)
Lesniak, J. M.; Hupse, R.; Blanc, R.; Karssemeijer, N.; Székely, G.
2012-08-01
False positive (FP) marks represent an obstacle for effective use of computer-aided detection (CADe) of breast masses in mammography. Typically, the problem can be approached either by developing more discriminative features or by employing different classifier designs. In this paper, the usage of support vector machine (SVM) classification for FP reduction in CADe is investigated, presenting a systematic quantitative evaluation against neural networks, k-nearest neighbor classification, linear discriminant analysis and random forests. A large database of 2516 film mammography examinations and 73 input features was used to train the classifiers and evaluate for their performance on correctly diagnosed exams as well as false negatives. Further, classifier robustness was investigated using varying training data and feature sets as input. The evaluation was based on the mean exam sensitivity in 0.05-1 FPs on normals on the free-response receiver operating characteristic curve (FROC), incorporated into a tenfold cross validation framework. It was found that SVM classification using a Gaussian kernel offered significantly increased detection performance (P = 0.0002) compared to the reference methods. Varying training data and input features, SVMs showed improved exploitation of large feature sets. It is concluded that with the SVM-based CADe a significant reduction of FPs is possible outperforming other state-of-the-art approaches for breast mass CADe.
Mathias, Patrick C; Turner, Emily H; Scroggins, Sheena M; Salipante, Stephen J; Hoffman, Noah G; Pritchard, Colin C; Shirts, Brian H
2016-03-01
To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors. We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors. We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique. Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report. © American Society for Clinical Pathology, 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A.; Anaya, Maribel
2017-01-01
Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed. PMID:28230796
Vitola, Jaime; Pozo, Francesc; Tibaduiza, Diego A; Anaya, Maribel
2017-02-21
Civil and military structures are susceptible and vulnerable to damage due to the environmental and operational conditions. Therefore, the implementation of technology to provide robust solutions in damage identification (by using signals acquired directly from the structure) is a requirement to reduce operational and maintenance costs. In this sense, the use of sensors permanently attached to the structures has demonstrated a great versatility and benefit since the inspection system can be automated. This automation is carried out with signal processing tasks with the aim of a pattern recognition analysis. This work presents the detailed description of a structural health monitoring (SHM) system based on the use of a piezoelectric (PZT) active system. The SHM system includes: (i) the use of a piezoelectric sensor network to excite the structure and collect the measured dynamic response, in several actuation phases; (ii) data organization; (iii) advanced signal processing techniques to define the feature vectors; and finally; (iv) the nearest neighbor algorithm as a machine learning approach to classify different kinds of damage. A description of the experimental setup, the experimental validation and a discussion of the results from two different structures are included and analyzed.
Galván-Tejada, Carlos E.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L.
2017-01-01
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions. PMID:28216571
Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L
2017-02-14
Breast cancer is an important global health problem, and the most common type of cancer among women. Late diagnosis significantly decreases the survival rate of the patient; however, using mammography for early detection has been demonstrated to be a very important tool increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using a computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using: Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost function in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have a similar or better prediction capability to classify the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and present a second opinion in the classification of tumor lesions.
NASA Astrophysics Data System (ADS)
Brandl, Miriam B.; Beck, Dominik; Pham, Tuan D.
2011-06-01
The high dimensionality of image-based dataset can be a drawback for classification accuracy. In this study, we propose the application of fuzzy c-means clustering, cluster validity indices and the notation of a joint-feature-clustering matrix to find redundancies of image-features. The introduced matrix indicates how frequently features are grouped in a mutual cluster. The resulting information can be used to find data-derived feature prototypes with a common biological meaning, reduce data storage as well as computation times and improve the classification accuracy.
The Educational Index: Linking Academic Instruction to Capital Planning
ERIC Educational Resources Information Center
Anding, Craig W.; Richards, David; Zoller, Susan C.
2012-01-01
The Bailey Yard is the largest rail classification yard in the world, stretching eight miles across the western prairie at North Platte, Nebraska. Trains leave Bailey Yard everyday--coal trains, grain trains, manifest trains--and each one represents a train prototype. Minneapolis Public School buildings are like a manifest train, where the…
Implementing a Knowledge-Based Library Information System with Typed Horn Logic.
ERIC Educational Resources Information Center
Ait-Kaci, Hassan; And Others
1990-01-01
Describes a prototype library expert system called BABEL which uses a new programing language, LOGIN, that combines the idea of attribute inheritance with logic programing. Use of hierarchical classification of library objects to build a knowledge base for a library information system is explained, and further research is suggested. (11…
Machine learning for a Toolkit for Image Mining
NASA Technical Reports Server (NTRS)
Delanoy, Richard L.
1995-01-01
A prototype user environment is described that enables a user with very limited computer skills to collaborate with a computer algorithm to develop search tools (agents) that can be used for image analysis, creating metadata for tagging images, searching for images in an image database on the basis of image content, or as a component of computer vision algorithms. Agents are learned in an ongoing, two-way dialogue between the user and the algorithm. The user points to mistakes made in classification. The algorithm, in response, attempts to discover which image attributes are discriminating between objects of interest and clutter. It then builds a candidate agent and applies it to an input image, producing an 'interest' image highlighting features that are consistent with the set of objects and clutter indicated by the user. The dialogue repeats until the user is satisfied. The prototype environment, called the Toolkit for Image Mining (TIM) is currently capable of learning spectral and textural patterns. Learning exhibits rapid convergence to reasonable levels of performance and, when thoroughly trained, Fo appears to be competitive in discrimination accuracy with other classification techniques.
Classifying multispectral data by neural networks
NASA Technical Reports Server (NTRS)
Telfer, Brian A.; Szu, Harold H.; Kiang, Richard K.
1993-01-01
Several energy functions for synthesizing neural networks are tested on 2-D synthetic data and on Landsat-4 Thematic Mapper data. These new energy functions, designed specifically for minimizing misclassification error, in some cases yield significant improvements in classification accuracy over the standard least mean squares energy function. In addition to operating on networks with one output unit per class, a new energy function is tested for binary encoded outputs, which result in smaller network sizes. The Thematic Mapper data (four bands were used) is classified on a single pixel basis, to provide a starting benchmark against which further improvements will be measured. Improvements are underway to make use of both subpixel and superpixel (i.e. contextual or neighborhood) information in tile processing. For single pixel classification, the best neural network result is 78.7 percent, compared with 71.7 percent for a classical nearest neighbor classifier. The 78.7 percent result also improves on several earlier neural network results on this data.
Determination of quality television programmes based on sentiment analysis on Twitter
NASA Astrophysics Data System (ADS)
Amalia, A.; Oktinas, W.; Aulia, I.; Rahmat, R. F.
2018-03-01
Public sentiment from social media like Twitter can be used as one of the indicators to determine the quality of TV Programmes. In this study, we implemented information extraction on Twitter by using sentiment analysis method to assess the quality of TV Programmes. The first stage of this study is pre-processing which consists of cleansing, case folding, tokenizing, stop-word removal, stemming, and redundancy filtering. The next stage is weighting process for every single word by using TF-IDF method. The last step of this study is the sentiment classification process which is divided into three sentiment category which is positive, negative and neutral. We classify the TV programmes into several categories such as news, children, or films/soap operas. We implemented an improved k-nearest neighbor method in classification 4000 twitter status, for four biggest TV stations in Indonesia, with ratio 70% data for training and 30% of data for the testing process. The result obtained from this research generated the highest accuracy with k=10 as big as 90%.
NASA Astrophysics Data System (ADS)
Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.
2016-03-01
This paper aims to propose a novel approach to classify acoustic emission (AE) signals deriving from corrosion experiments, even if embedded into a noisy environment. To validate this new methodology, synthetic data are first used throughout an in-depth analysis, comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Then, tests on real cases involving noise and crevice corrosion are conducted, by preprocessing the waveforms including wavelet denoising and extracting a rich set of features as input of the RF algorithm. To this end, a software called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are serious criteria for the chemical industry.
Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein
2016-06-01
This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter approach using Fisher criterion which reduces the initial genes and hence the search space and time complexity. Then, a wrapper approach which is based on cellular learning automata (CLA) optimized with ant colony method (ACO) is used to find the set of features which improve the classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The selected features from the last phase are evaluated using ROC curve and the most effective while smallest feature subset is determined. The classifiers which are evaluated in the proposed framework are K-nearest neighbor; support vector machine and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
Classification of older adults with/without a fall history using machine learning methods.
Lin Zhang; Ou Ma; Fabre, Jennifer M; Wood, Robert H; Garcia, Stephanie U; Ivey, Kayla M; McCann, Evan D
2015-01-01
Falling is a serious problem in an aged society such that assessment of the risk of falls for individuals is imperative for the research and practice of falls prevention. This paper introduces an application of several machine learning methods for training a classifier which is capable of classifying individual older adults into a high risk group and a low risk group (distinguished by whether or not the members of the group have a recent history of falls). Using a 3D motion capture system, significant gait features related to falls risk are extracted. By training these features, classification hypotheses are obtained based on machine learning techniques (K Nearest-neighbour, Naive Bayes, Logistic Regression, Neural Network, and Support Vector Machine). Training and test accuracies with sensitivity and specificity of each of these techniques are assessed. The feature adjustment and tuning of the machine learning algorithms are discussed. The outcome of the study will benefit the prediction and prevention of falls.
Distance-informed metric learning for Alzheimer's disease staging.
Shi, Bibo; Wang, Zhewei; Liu, Jundong
2014-01-01
Identifying intermediate biomarkers of Alzheimer's disease (AD) is of great importance for diagnosis and prognosis of the disease. In this study, we develop a new AD staging method to classify patients into Normal Controls (NC), Mild Cognitive Impairment (MCI), and AD groups. Our solution employs a novel metric learning technique that improves classification rates through the guidance of some weak supervisory information in AD progression. More specifically, those information are in the form of pairwise constraints that specify the relative Mini Mental State Examination (MMSE) score disparity of two subjects, depending on whether they are in the same group or not. With the imposed constraints, the common knowledge that MCI generally sits in between of NC and AD can be integrated into the classification distance metric. Subjects from the Alzheimer's Disease Neuroimaging Initiative cohort (ADNI; 56 AD, 104 MCI, 161 controls) were used to demonstrate the improvements made comparing with two state-of-the-art metric learning solutions: large margin nearest neighbors (LMNN) and relevant component analysis (RCA).
Automated robot-assisted surgical skill evaluation: Predictive analytics approach.
Fard, Mahtab J; Ameri, Sattar; Darin Ellis, R; Chinnam, Ratna B; Pandya, Abhilash K; Klein, Michael D
2018-02-01
Surgical skill assessment has predominantly been a subjective task. Recently, technological advances such as robot-assisted surgery have created great opportunities for objective surgical evaluation. In this paper, we introduce a predictive framework for objective skill assessment based on movement trajectory data. Our aim is to build a classification framework to automatically evaluate the performance of surgeons with different levels of expertise. Eight global movement features are extracted from movement trajectory data captured by a da Vinci robot for surgeons with two levels of expertise - novice and expert. Three classification methods - k-nearest neighbours, logistic regression and support vector machines - are applied. The result shows that the proposed framework can classify surgeons' expertise as novice or expert with an accuracy of 82.3% for knot tying and 89.9% for a suturing task. This study demonstrates and evaluates the ability of machine learning methods to automatically classify expert and novice surgeons using global movement features. Copyright © 2017 John Wiley & Sons, Ltd.
Chen, Ting; Rangarajan, Anand; Vemuri, Baba C.
2010-01-01
This paper presents a novel classification via aggregated regression algorithm – dubbed CAVIAR – and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature – the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain – and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS. PMID:21151847
Chen, Ting; Rangarajan, Anand; Vemuri, Baba C
2010-04-14
This paper presents a novel classification via aggregated regression algorithm - dubbed CAVIAR - and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature - the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain - and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS.
Kelly, J F Daniel; Downey, Gerard
2005-05-04
Fourier transform infrared spectroscopy and attenuated total reflection sampling have been used to detect adulteration of single strength apple juice samples. The sample set comprised 224 authentic apple juices and 480 adulterated samples. Adulterants used included partially inverted cane syrup (PICS), beet sucrose (BS), high fructose corn syrup (HFCS), and a synthetic solution of fructose, glucose, and sucrose (FGS). Adulteration was carried out on individual apple juice samples at levels of 10, 20, 30, and 40% w/w. Spectral data were compressed by principal component analysis and analyzed using k-nearest neighbors and partial least squares regression techniques. Prediction results for the best classification models achieved an overall (authentic plus adulterated) correct classification rate of 96.5, 93.9, 92.2, and 82.4% for PICS, BS, HFCS, and FGS adulterants, respectively. This method shows promise as a rapid screening technique for the detection of a broad range of potential adulterants in apple juice.
X ray studies of the Hyades cluster
NASA Technical Reports Server (NTRS)
Stern, Robert A.
1993-01-01
The Hyades cluster occupies a unique position in both the history of astronomy and at the frontiers of contemporary astronomical research. At a distance of only 45 pc, the Hyades is the nearest star cluster in the Galaxy which is localized in the sky: the UMa cluster, which is closer, but much sparser, essentially surrounds the Solar neighborhood. The Hyades is the prototype cluster for distance determination using the 'moving-cluster' method, and thus serves to define the zero-age main sequence from which the cosmic distance scale is essentially bootstrapped. The Hyades age (0.6-0.7 Gyr), nearly 8 times younger than the Sun, guarantees the Hyades critical importance to studies of stellar evolution. The results of a complete survey of the Hyades cluster using the ROSAT All Sky Survey (RASS) are reported.
The 2015 WHO Classification of Tumors of the Thymus: Continuity and Changes
Marx, Alexander; Chan, John K.C.; Coindre, Jean-Michel; Detterbeck, Frank; Girard, Nicolas; Harris, Nancy L.; Jaffe, Elaine S.; Kurrer, Michael O.; Marom, Edith M.; Moreira, Andre L.; Mukai, Kiyoshi; Orazi, Attilio; Ströbel, Philipp
2015-01-01
This overview of the 4th edition of the WHO Classification of thymic tumors has two aims. First, to comprehensively list the established and new tumour entities and variants that are described in the new WHO Classification of thymic epithelial tumors, germ cell tumors, lymphomas, dendritic cell and myeloid neoplasms, and soft tissue tumors of the thymus and mediastinum; second, to highlight major differences in the new WHO Classification that result from the progress that has been made since the 3rd edition in 2004 at immunohistochemical, genetic and conceptual levels. Refined diagnostic criteria for type A, AB, B1–B3 thymomas and thymic squamous cell carcinoma are given and will hopefully improve the reproducibility of the classification and its clinical relevance. The clinical perspective of the classification has been strengthened by involving experts from radiology, thoracic surgery and oncology; by incorporating state-of-the-art PET/CT images; and by depicting prototypic cytological specimens. This makes the thymus section of the new WHO Classification of Tumours of the Lung, Pleura, Thymus and Heart a valuable tool for pathologists, cytologists and clinicians alike. The impact of the new WHO Classification on therapeutic decisions is exemplified in this overview for thymic epithelial tumors and mediastinal lymphomas, and future perspectives and challenges are discussed. PMID:26295375
On the development of an expert system for wheelchair selection
NASA Technical Reports Server (NTRS)
Madey, Gregory R.; Bhansin, Charlotte A.; Alaraini, Sulaiman A.; Nour, Mohamed A.
1994-01-01
The presentation of wheelchairs for the Multiple Sclerosis (MS) patients involves the examination of a number of complicated factors including ambulation status, length of diagnosis, and funding sources, to name a few. Consequently, only a few experts exist in this area. To aid medical therapists with the wheelchair selection decision, a prototype medical expert system (ES) was developed. This paper describes and discusses the steps of designing and developing the system, the experiences of the authors, and the lessons learned from working on this project. Wheelchair Advisor, programmed in CLIPS, serves as diagnosis, classification, prescription, and training tool in the MS field. Interviews, insurance letters, forms, and prototyping were used to gain knowledge regarding the wheelchair selection problem. Among the lessons learned are that evolutionary prototyping is superior to the conventional system development life-cycle (SDLC), the wheelchair selection is a good candidate for ES applications, and that ES can be applied to other similar medical subdomains.
Gold-standard for computer-assisted morphological sperm analysis.
Chang, Violeta; Garcia, Alejandra; Hitschfeld, Nancy; Härtel, Steffen
2017-04-01
Published algorithms for classification of human sperm heads are based on relatively small image databases that are not open to the public, and thus no direct comparison is available for competing methods. We describe a gold-standard for morphological sperm analysis (SCIAN-MorphoSpermGS), a dataset of sperm head images with expert-classification labels in one of the following classes: normal, tapered, pyriform, small or amorphous. This gold-standard is for evaluating and comparing known techniques and future improvements to present approaches for classification of human sperm heads for semen analysis. Although this paper does not provide a computational tool for morphological sperm analysis, we present a set of experiments for comparing sperm head description and classification common techniques. This classification base-line is aimed to be used as a reference for future improvements to present approaches for human sperm head classification. The gold-standard provides a label for each sperm head, which is achieved by majority voting among experts. The classification base-line compares four supervised learning methods (1- Nearest Neighbor, naive Bayes, decision trees and Support Vector Machine (SVM)) and three shape-based descriptors (Hu moments, Zernike moments and Fourier descriptors), reporting the accuracy and the true positive rate for each experiment. We used Fleiss' Kappa Coefficient to evaluate the inter-expert agreement and Fisher's exact test for inter-expert variability and statistical significant differences between descriptors and learning techniques. Our results confirm the high degree of inter-expert variability in the morphological sperm analysis. Regarding the classification base line, we show that none of the standard descriptors or classification approaches is best suitable for tackling the problem of sperm head classification. We discovered that the correct classification rate was highly variable when trying to discriminate among non-normal sperm heads. By using the Fourier descriptor and SVM, we achieved the best mean correct classification: only 49%. We conclude that the SCIAN-MorphoSpermGS will provide a standard tool for evaluation of characterization and classification approaches for human sperm heads. Indeed, there is a clear need for a specific shape-based descriptor for human sperm heads and a specific classification approach to tackle the problem of high variability within subcategories of abnormal sperm cells. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of EEG Signals Based on Pattern Recognition Approach.
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a "pattern recognition" approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90-7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11-89.63% and 91.60-81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
Classification of EEG Signals Based on Pattern Recognition Approach
Amin, Hafeez Ullah; Mumtaz, Wajid; Subhani, Ahmad Rauf; Saad, Mohamad Naufal Mohamad; Malik, Aamir Saeed
2017-01-01
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet based feature extraction such as, multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy were computed. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high density EEG dataset validated the proposed method (128-channels) by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using Raven's Advance Progressive Metric (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as, K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via SVM classifier for coefficient approximations (A5) of low frequencies ranging from 0 to 3.90 Hz. Accuracy rates for detailed coefficients were 98.57 and 98.39% for SVM and KNN, respectively; and for detailed coefficients (D5) deriving from the sub-band range (3.90–7.81 Hz). Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied on public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performances using machine learning classifiers compared to extant quantitative feature extraction. These results suggest the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy. PMID:29209190
Multiple Sparse Representations Classification
Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik
2015-01-01
Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provides an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and sparsity level. PMID:26177106
Kalpathy-Cramer, Jayashree; Hersh, William
2008-01-01
In 2006 and 2007, Oregon Health & Science University (OHSU) participated in the automatic image annotation task for medical images at ImageCLEF, an annual international benchmarking event that is part of the Cross Language Evaluation Forum (CLEF). The goal of the automatic annotation task was to classify 1000 test images based on the Image Retrieval in Medical Applications (IRMA) code, given a set of 10,000 training images. There were 116 distinct classes in 2006 and 2007. We evaluated the efficacy of a variety of primarily global features for this classification task. These included features based on histograms, gray level correlation matrices and the gist technique. A multitude of classifiers including k-nearest neighbors, two-level neural networks, support vector machines, and maximum likelihood classifiers were evaluated. Our official error rates for the 1000 test images were 26% in 2006 using the flat classification structure. The error count in 2007 was 67.8 using the hierarchical classification error computation based on the IRMA code in 2007. Confusion matrices as well as clustering experiments were used to identify visually similar classes. The use of the IRMA code did not help us in the classification task as the semantic hierarchy of the IRMA classes did not correspond well with the hierarchy based on clustering of image features that we used. Our most frequent misclassification errors were along the view axis. Subsequent experiments based on a two-stage classification system decreased our error rate to 19.8% for the 2006 dataset and our error count to 55.4 for the 2007 data. PMID:19884953
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
NASA Astrophysics Data System (ADS)
Lazcano, R.; Madroñal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callicó, G. M.; Juárez, E.; Sanz, C.
2017-10-01
Hyperspectral Imaging (HI) assembles high resolution spectral information from hundreds of narrow bands across the electromagnetic spectrum, thus generating 3D data cubes in which each pixel gathers the spectral information of the reflectance of every spatial pixel. As a result, each image is composed of large volumes of data, which turns its processing into a challenge, as performance requirements have been continuously tightened. For instance, new HI applications demand real-time responses. Hence, parallel processing becomes a necessity to achieve this requirement, so the intrinsic parallelism of the algorithms must be exploited. In this paper, a spatial-spectral classification approach has been implemented using a dataflow language known as RVCCAL. This language represents a system as a set of functional units, and its main advantage is that it simplifies the parallelization process by mapping the different blocks over different processing units. The spatial-spectral classification approach aims at refining the classification results previously obtained by using a K-Nearest Neighbors (KNN) filtering process, in which both the pixel spectral value and the spatial coordinates are considered. To do so, KNN needs two inputs: a one-band representation of the hyperspectral image and the classification results provided by a pixel-wise classifier. Thus, spatial-spectral classification algorithm is divided into three different stages: a Principal Component Analysis (PCA) algorithm for computing the one-band representation of the image, a Support Vector Machine (SVM) classifier, and the KNN-based filtering algorithm. The parallelization of these algorithms shows promising results in terms of computational time, as the mapping of them over different cores presents a speedup of 2.69x when using 3 cores. Consequently, experimental results demonstrate that real-time processing of hyperspectral images is achievable.
Vector analysis of postcardiotomy behavioral phenomena.
Caston, J C; Miller, W C; Felber, W J
1975-04-01
The classification of postcardiotomy behavioral phenomena in Figure 1 is proposed for use as a clinical instrument to analyze etiological determinants. The utilization of a vector analysis analogy inherently denies absolutism. Classifications A-P are presented as prototypes of certain ratio imbalances of the metabolic, hemodynamic, environmental, and psychic vectors. Such a system allows for change from one type to another according to the individuality of the patient and the highly specific changes in his clinical presentation. A vector analysis also allows for infinite intermediary ratio imbalances between classification types as a function of time. Thus, postcardiotomy behavioral phenomena could be viewed as the vector summation of hemodynamic, metabolic, environmental, and psychic processes at a given point in time. Elaboration of unknown determinants in this complex syndrome appears to be task for the future.
Local classification: Locally weighted-partial least squares-discriminant analysis (LW-PLS-DA).
Bevilacqua, Marta; Marini, Federico
2014-08-01
The possibility of devising a simple, flexible and accurate non-linear classification method, by extending the locally weighted partial least squares (LW-PLS) approach to the cases where the algorithm is used in a discriminant way (partial least squares discriminant analysis, PLS-DA), is presented. In particular, to assess which category an unknown sample belongs to, the proposed algorithm operates by identifying which training objects are most similar to the one to be predicted and building a PLS-DA model using these calibration samples only. Moreover, the influence of the selected training samples on the local model can be further modulated by adopting a not uniform distance-based weighting scheme which allows the farthest calibration objects to have less impact than the closest ones. The performances of the proposed locally weighted-partial least squares-discriminant analysis (LW-PLS-DA) algorithm have been tested on three simulated data sets characterized by a varying degree of non-linearity: in all cases, a classification accuracy higher than 99% on external validation samples was achieved. Moreover, when also applied to a real data set (classification of rice varieties), characterized by a high extent of non-linearity, the proposed method provided an average correct classification rate of about 93% on the test set. By the preliminary results, showed in this paper, the performances of the proposed LW-PLS-DA approach have proved to be comparable and in some cases better than those obtained by other non-linear methods (k nearest neighbors, kernel-PLS-DA and, in the case of rice, counterpropagation neural networks). Copyright © 2014 Elsevier B.V. All rights reserved.
Empirical Wavelet Transform Based Features for Classification of Parkinson's Disease Severity.
Oung, Qi Wei; Muthusamy, Hariharan; Basah, Shafriza Nisha; Lee, Hoileong; Vijean, Vikneswaran
2017-12-29
Parkinson's disease (PD) is a type of progressive neurodegenerative disorder that has affected a large part of the population till now. Several symptoms of PD include tremor, rigidity, slowness of movements and vocal impairments. In order to develop an effective diagnostic system, a number of algorithms were proposed mainly to distinguish healthy individuals from the ones with PD. However, most of the previous works were conducted based on a binary classification, with the early PD stage and the advanced ones being treated equally. Therefore, in this work, we propose a multiclass classification with three classes of PD severity level (mild, moderate, severe) and healthy control. The focus is to detect and classify PD using signals from wearable motion and audio sensors based on both empirical wavelet transform (EWT) and empirical wavelet packet transform (EWPT) respectively. The EWT/EWPT was applied to decompose both speech and motion data signals up to five levels. Next, several features are extracted after obtaining the instantaneous amplitudes and frequencies from the coefficients of the decomposed signals by applying the Hilbert transform. The performance of the algorithm was analysed using three classifiers - K-nearest neighbour (KNN), probabilistic neural network (PNN) and extreme learning machine (ELM). Experimental results demonstrated that our proposed approach had the ability to differentiate PD from non-PD subjects, including their severity level - with classification accuracies of more than 90% using EWT/EWPT-ELM based on signals from motion and audio sensors respectively. Additionally, classification accuracy of more than 95% was achieved when EWT/EWPT-ELM is applied to signals from integration of both signal's information.
André, Nicole M.
2018-01-01
ABSTRACT The difficulties related to virus taxonomy have been amplified by recent advances in next-generation sequencing and metagenomics, prompting the field to revisit the question of what constitutes a useful viral classification. Here, taking a challenging classification found in coronaviruses, we argue that consideration of biological properties in addition to sequence-based demarcations is critical for generating useful taxonomy that recapitulates complex evolutionary histories. Within the Alphacoronavirus genus, the Alphacoronavirus 1 species encompasses several biologically distinct viruses. We carried out functionally based phylogenetic analysis, centered on the spike gene, which encodes the main surface antigen and primary driver of tropism and pathogenesis. Within the Alphacoronavirus 1 species, we identify clade A (encompassing serotype I feline coronavirus [FCoV] and canine coronavirus [CCoV]) and clade B (grouping serotype II FCoV and CCoV and transmissible gastroenteritis virus [TGEV]-like viruses). We propose this clade designation, along with the newly proposed Alphacoronavirus 2 species, as an improved way to classify the Alphacoronavirus genus. IMPORTANCE Our work focuses on improving the classification of the Alphacoronavirus genus. The Alphacoronavirus 1 species groups viruses of veterinary importance that infect distinct mammalian hosts and includes canine and feline coronaviruses and transmissible gastroenteritis virus. It is the prototype species of the Alphacoronavirus genus; however, it encompasses biologically distinct viruses. To better characterize this prototypical species, we performed phylogenetic analyses based on the sequences of the spike protein, one of the main determinants of tropism and pathogenesis, and reveal the existence of two subgroups or clades that fit with previously established serotype demarcations. We propose a new clade designation to better classify Alphacoronavirus 1 members. PMID:29299531
Whittaker, Gary R; André, Nicole M; Millet, Jean Kaoru
2018-01-01
The difficulties related to virus taxonomy have been amplified by recent advances in next-generation sequencing and metagenomics, prompting the field to revisit the question of what constitutes a useful viral classification. Here, taking a challenging classification found in coronaviruses, we argue that consideration of biological properties in addition to sequence-based demarcations is critical for generating useful taxonomy that recapitulates complex evolutionary histories. Within the Alphacoronavirus genus, the Alphacoronavirus 1 species encompasses several biologically distinct viruses. We carried out functionally based phylogenetic analysis, centered on the spike gene, which encodes the main surface antigen and primary driver of tropism and pathogenesis. Within the Alphacoronavirus 1 species, we identify clade A (encompassing serotype I feline coronavirus [FCoV] and canine coronavirus [CCoV]) and clade B (grouping serotype II FCoV and CCoV and transmissible gastroenteritis virus [TGEV]-like viruses). We propose this clade designation, along with the newly proposed Alphacoronavirus 2 species, as an improved way to classify the Alphacoronavirus genus. IMPORTANCE Our work focuses on improving the classification of the Alphacoronavirus genus. The Alphacoronavirus 1 species groups viruses of veterinary importance that infect distinct mammalian hosts and includes canine and feline coronaviruses and transmissible gastroenteritis virus. It is the prototype species of the Alphacoronavirus genus; however, it encompasses biologically distinct viruses. To better characterize this prototypical species, we performed phylogenetic analyses based on the sequences of the spike protein, one of the main determinants of tropism and pathogenesis, and reveal the existence of two subgroups or clades that fit with previously established serotype demarcations. We propose a new clade designation to better classify Alphacoronavirus 1 members.
Nearest neighbor-density-based clustering methods for large hyperspectral images
NASA Astrophysics Data System (ADS)
Cariou, Claude; Chehdi, Kacem
2017-10-01
We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and showed their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph due to the quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme to bypass the computation of the full kNN graph over the image pixels. In this communication, we propose to extent the MR-GWENN scheme on three aspects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme, in order to stabilize of the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can be easily adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM), in terms of classification accuracy, relevance of found clusters, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performances of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral imaging sensor.
2015-01-01
Background The Internet has greatly enhanced health care, helping patients stay up-to-date on medical issues and general knowledge. Many cancer patients use the Internet for cancer diagnosis and related information. Recently, cloud computing has emerged as a new way of delivering health services but currently, there is no generic and fully automated cloud-based self-management intervention for breast cancer patients, as practical guidelines are lacking. Objective We investigated the prevalence and predictors of cloud use for medical diagnosis among women with breast cancer to gain insight into meaningful usage parameters to evaluate the use of generic, fully automated cloud-based self-intervention, by assessing how breast cancer survivors use a generic self-management model. The goal of this study was implemented and evaluated with a new prototype called “CIMIDx”, based on representative association rules that support the diagnosis of medical images (mammograms). Methods The proposed Cloud-Based System Support Intelligent Medical Image Diagnosis (CIMIDx) prototype includes two modules. The first is the design and development of the CIMIDx training and test cloud services. Deployed in the cloud, the prototype can be used for diagnosis and screening mammography by assessing the cancers detected, tumor sizes, histology, and stage of classification accuracy. To analyze the prototype’s classification accuracy, we conducted an experiment with data provided by clients. Second, by monitoring cloud server requests, the CIMIDx usage statistics were recorded for the cloud-based self-intervention groups. We conducted an evaluation of the CIMIDx cloud service usage, in which browsing functionalities were evaluated from the end-user’s perspective. Results We performed several experiments to validate the CIMIDx prototype for breast health issues. The first set of experiments evaluated the diagnostic performance of the CIMIDx framework. We collected medical information from 150 breast cancer survivors from hospitals and health centers. The CIMIDx prototype achieved high sensitivity of up to 99.29%, and accuracy of up to 98%. The second set of experiments evaluated CIMIDx use for breast health issues, using t tests and Pearson chi-square tests to assess differences, and binary logistic regression to estimate the odds ratio (OR) for the predictors’ use of CIMIDx. For the prototype usage statistics for the same 150 breast cancer survivors, we interviewed 114 (76.0%), through self-report questionnaires from CIMIDx blogs. The frequency of log-ins/person ranged from 0 to 30, total duration/person from 0 to 1500 minutes (25 hours). The 114 participants continued logging in to all phases, resulting in an intervention adherence rate of 44.3% (95% CI 33.2-55.9). The overall performance of the prototype for the good category, reported usefulness of the prototype (P=.77), overall satisfaction of the prototype (P=.31), ease of navigation (P=.89), user friendliness evaluation (P=.31), and overall satisfaction (P=.31). Positive evaluations given by 100 participants via a Web-based questionnaire supported our hypothesis. Conclusions The present study shows that women felt favorably about the use of a generic fully automated cloud-based self- management prototype. The study also demonstrated that the CIMIDx prototype resulted in the detection of more cancers in screening and diagnosing patients, with an increased accuracy rate. PMID:25830608
ERIC Educational Resources Information Center
Whitaker, Lydia R.; Simpson, Andrew; Roberson, Debi
2017-01-01
Impairments in recognizing subtle facial expressions, in individuals with autism spectrum disorder (ASD), may relate to difficulties in constructing prototypes of these expressions. Eighteen children with predominantly intellectual low-functioning ASD (LFA, IQ <80) and two control groups (mental and chronological age matched), were assessed for…
2011-08-01
5 Figure 4 Architetural diagram of running Blender on Amazon EC2 through Nimbis...classification of streaming data. Example input images (top left). All digit prototypes (cluster centers) found, with size proportional to frequency (top...Figure 4 Architetural diagram of running Blender on Amazon EC2 through Nimbis 1 http
A Study Regarding the Spontaneous Use of Geometric Shapes in Young Children's Drawings
ERIC Educational Resources Information Center
Villarroel, José Domingo; Sanz Ortega, Olga
2017-01-01
The studies regarding how the comprehension of geometric shapes evolves in childhood are largely based on the assessment of children's responses during the course of tasks linked to the recognition, classification or explanation of prototypes and models. Little attention has been granted to the issue as to what extent the geometric shape turns out…
NASA Astrophysics Data System (ADS)
Stevens, Jeffrey
The past decade has seen the emergence of many hyperspectral image (HSI) analysis algorithms based on graph theory and derived manifold-coordinates. Yet, despite the growing number of algorithms, there has been limited study of the graphs constructed from spectral data themselves. Which graphs are appropriate for various HSI analyses--and why? This research aims to begin addressing these questions as the performance of graph-based techniques is inextricably tied to the graphical model constructed from the spectral data. We begin with a literature review providing a survey of spectral graph construction techniques currently used by the hyperspectral community, starting with simple constructs demonstrating basic concepts and then incrementally adding components to derive more complex approaches. Throughout this development, we discuss algorithm advantages and disadvantages for different types of hyperspectral analysis. A focus is provided on techniques influenced by spectral density through which the concept of community structure arises. Through the use of simulated and real HSI data, we demonstrate density-based edge allocation produces more uniform nearest neighbor lists than non-density based techniques through increasing the number of intracluster edges, facilitating higher k-nearest neighbor (k-NN) classification performance. Imposing the common mutuality constraint to symmetrify adjacency matrices is demonstrated to be beneficial in most circumstances, especially in rural (less cluttered) scenes. Many complex adaptive edge-reweighting techniques are shown to slightly degrade nearest-neighbor list characteristics. Analysis suggests this condition is possibly attributable to the validity of characterizing spectral density by a single variable representing data scale for each pixel. Additionally, it is shown that imposing mutuality hurts the performance of adaptive edge-allocation techniques or any technique that aims to assign a low number of edges (<10) to any pixel. A simple k bias addresses this problem. Many of the adaptive edge-reweighting techniques are based on the concept of codensity, so we explore codensity properties as they relate to density-based edge reweighting. We find that codensity may not be the best estimator of local scale due to variations in cluster density, so we introduce and compare two inherently density-weighted graph construction techniques from the data mining literature: shared nearest neighbors (SNN) and mutual proximity (MP). MP and SNN are not reliant upon a codensity measure, hence are not susceptible to its shortcomings. Neither has been used for hyperspectral analyses, so this presents the first study of these techniques on HSI data. We demonstrate MP and SNN can offer better performance, but in general none of the reweighting techniques improve the quality of these spectral graphs in our neighborhood structure tests. As such, these complex adaptive edge-reweighting techniques may need to be modified to increase their effectiveness. During this investigation, we probe deeper into properties of high-dimensional data and introduce the concept of concentration of measure (CoM)--the degradation in the efficacy of many common distance measures with increasing dimensionality--as it relates to spectral graph construction. CoM exists in pairwise distances between HSI pixels, but not to the degree experienced in random data of the same extrinsic dimension; a characteristic we demonstrate is due to the rich correlation and cluster structure present in HSI data. CoM can lead to hubness--a condition wherein some nodes have short distances (high similarities) to an exceptionally large number of nodes. We study hub presence in 49 HSI datasets of varying resolutions, altitudes, and spectral bands to demonstrate hubness effects are negligible in a k-NN classification example (generalized counting scenarios), but we note its impact on methods that use edge weights to derive manifold coordinates or splitting clusters based on spectral graph theory requires more investigation. Many of these new graph-related quantities can be exploited to demonstrate new techniques for HSI classification and anomaly detection. We present an initial exploration into this relatively new and exciting field based on an enhanced Schroedinger Eigenmap classification example and compare results to the current state-of-the-art approach. We produce equivalent results, but demonstrate different types of misclassifications, opening the door to combine the best of both approaches to achieve truly superior performance. A separate less mature hubness-assisted anomaly detector (HAAD) is also presented.
Bissacco, Alessandro; Chiuso, Alessandro; Soatto, Stefano
2007-11-01
We address the problem of performing decision tasks, and in particular classification and recognition, in the space of dynamical models in order to compare time series of data. Motivated by the application of recognition of human motion in image sequences, we consider a class of models that include linear dynamics, both stable and marginally stable (periodic), both minimum and non-minimum phase, driven by non-Gaussian processes. This requires extending existing learning and system identification algorithms to handle periodic modes and nonminimum phase behavior, while taking into account higher-order statistics of the data. Once a model is identified, we define a kernel-based cord distance between models that includes their dynamics, their initial conditions as well as input distribution. This is made possible by a novel kernel defined between two arbitrary (non-Gaussian) distributions, which is computed by efficiently solving an optimal transport problem. We validate our choice of models, inference algorithm, and distance on the tasks of human motion synthesis (sample paths of the learned models), and recognition (nearest-neighbor classification in the computed distance). However, our work can be applied more broadly where one needs to compare historical data while taking into account periodic trends, non-minimum phase behavior, and non-Gaussian input distributions.
Moscetti, Roberto; Radicetti, Emanuele; Monarca, Danilo; Cecchini, Massimo; Massantini, Riccardo
2015-10-01
This study investigates the possibility of using near infrared spectroscopy for the authentication of the 'Nocciola Romana' hazelnut (Corylus avellana L. cvs Tonda Gentile Romana and Nocchione) as a Protected Designation of Origin (PDO) hazelnut from central Italy. Algorithms for the selection of the optimal pretreatments were tested in combination with the following discriminant routines: k-nearest neighbour, soft independent modelling of class analogy, partial least squares discriminant analysis and support vector machine discriminant analysis. The best results were obtained using a support vector machine discriminant analysis routine. Thus, classification performance rates with specificities, sensitivities and accuracies as high as 96.0%, 95.0% and 95.5%, respectively, were achieved. Various pretreatments, such as standard normal variate, mean centring and a Savitzky-Golay filter with seven smoothing points, were used. The optimal wavelengths for classification were mainly correlated with lipids, although some contribution from minor constituents, such as proteins and carbohydrates, was also observed. Near infrared spectroscopy could classify hazelnut according to the PDO 'Nocciola Romana' designation. Thus, the experimentation lays the foundations for a rapid, online, authentication system for hazelnut. However, model robustness should be improved taking into account agro-pedo-climatic growing conditions. © 2014 Society of Chemical Industry.
Multimodal segmentation of optic disc and cup from stereo fundus and SD-OCT images
NASA Astrophysics Data System (ADS)
Miri, Mohammad Saleh; Lee, Kyungmoo; Niemeijer, Meindert; Abràmoff, Michael D.; Kwon, Young H.; Garvin, Mona K.
2013-03-01
Glaucoma is one of the major causes of blindness worldwide. One important structural parameter for the diagnosis and management of glaucoma is the cup-to-disc ratio (CDR), which tends to become larger as glaucoma progresses. While approaches exist for segmenting the optic disc and cup within fundus photographs, and more recently, within spectral-domain optical coherence tomography (SD-OCT) volumes, no approaches have been reported for the simultaneous segmentation of these structures within both modalities combined. In this work, a multimodal pixel-classification approach for the segmentation of the optic disc and cup within fundus photographs and SD-OCT volumes is presented. In particular, after segmentation of other important structures (such as the retinal layers and retinal blood vessels) and fundus-to-SD-OCT image registration, features are extracted from both modalities and a k-nearest-neighbor classification approach is used to classify each pixel as cup, rim, or background. The approach is evaluated on 70 multimodal image pairs from 35 subjects in a leave-10%-out fashion (by subject). A significant improvement in classification accuracy is obtained using the multimodal approach over that obtained from the corresponding unimodal approach (97.8% versus 95.2%; p < 0:05; paired t-test).
Zhang, Kunli; Zhao, Yueshu; Zan, Hongying; Zhuang, Lei
2018-01-01
Obstetric electronic medical records (EMRs) contain massive amounts of medical data and health information. The information extraction and diagnosis assistants of obstetric EMRs are of great significance in improving the fertility level of the population. The admitting diagnosis in the first course record of the EMR is reasoned from various sources, such as chief complaints, auxiliary examinations, and physical examinations. This paper treats the diagnosis assistant as a multilabel classification task based on the analyses of obstetric EMRs. The latent Dirichlet allocation (LDA) topic and the word vector are used as features and the four multilabel classification methods, BP-MLL (backpropagation multilabel learning), RAkEL (RAndom k labELsets), MLkNN (multilabel k-nearest neighbor), and CC (chain classifier), are utilized to build the diagnosis assistant models. Experimental results conducted on real cases show that the BP-MLL achieves the best performance with an average precision up to 0.7413 ± 0.0100 when the number of label sets and the word dimensions are 71 and 100, respectively. The result of the diagnosis assistant can be introduced as a supplementary learning method for medical students. Additionally, the method can be used not only for obstetric EMRs but also for other medical records. PMID:29666671
Tang, Yunwei; Jing, Linhai; Li, Hui; Liu, Qingjie; Yan, Qi; Li, Xiuxia
2016-11-22
This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k -nearest neighbor ( k -NN) method produced the greatest accuracy. A geostatistically-weighted k -NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer's and user's accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas.
Computer-aided diagnosis system: a Bayesian hybrid classification method.
Calle-Alonso, F; Pérez, C J; Arias-Nicolás, J P; Martín, J
2013-10-01
A novel method to classify multi-class biomedical objects is presented. The method is based on a hybrid approach which combines pairwise comparison, Bayesian regression and the k-nearest neighbor technique. It can be applied in a fully automatic way or in a relevance feedback framework. In the latter case, the information obtained from both an expert and the automatic classification is iteratively used to improve the results until a certain accuracy level is achieved, then, the learning process is finished and new classifications can be automatically performed. The method has been applied in two biomedical contexts by following the same cross-validation schemes as in the original studies. The first one refers to cancer diagnosis, leading to an accuracy of 77.35% versus 66.37%, originally obtained. The second one considers the diagnosis of pathologies of the vertebral column. The original method achieves accuracies ranging from 76.5% to 96.7%, and from 82.3% to 97.1% in two different cross-validation schemes. Even with no supervision, the proposed method reaches 96.71% and 97.32% in these two cases. By using a supervised framework the achieved accuracy is 97.74%. Furthermore, all abnormal cases were correctly classified. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Biswas, Surama; Dutta, Subarna; Acharyya, Sriyankar
2017-12-01
Identifying a small subset of disease critical genes out of a large size of microarray gene expression data is a challenge in computational life sciences. This paper has applied four meta-heuristic algorithms, namely, honey bee mating optimization (HBMO), harmony search (HS), differential evolution (DE) and genetic algorithm (basic version GA) to find disease critical genes of preeclampsia which affects women during gestation. Two hybrid algorithms, namely, HBMO-kNN and HS-kNN have been newly proposed here where kNN (k nearest neighbor classifier) is used for sample classification. Performances of these new approaches have been compared with other two hybrid algorithms, namely, DE-kNN and SGA-kNN. Three datasets of different sizes have been used. In a dataset, the set of genes found common in the output of each algorithm is considered here as disease critical genes. In different datasets, the percentage of classification or classification accuracy of meta-heuristic algorithms varied between 92.46 and 100%. HBMO-kNN has the best performance (99.64-100%) in almost all data sets. DE-kNN secures the second position (99.42-100%). Disease critical genes obtained here match with clinically revealed preeclampsia genes to a large extent.
Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.
Hernandez, Troy; Yang, Jie
2016-10-01
The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code is available upon request.
Big genomics and clinical data analytics strategies for precision cancer prognosis.
Ow, Ghim Siong; Kuznetsov, Vladimir A
2016-11-07
The field of personalized and precise medicine in the era of big data analytics is growing rapidly. Previously, we proposed our model of patient classification termed Prognostic Signature Vector Matching (PSVM) and identified a 37 variable signature comprising 36 let-7b associated prognostic significant mRNAs and the age risk factor that stratified large high-grade serous ovarian cancer patient cohorts into three survival-significant risk groups. Here, we investigated the predictive performance of PSVM via optimization of the prognostic variable weights, which represent the relative importance of one prognostic variable over the others. In addition, we compared several multivariate prognostic models based on PSVM with classical machine learning techniques such as K-nearest-neighbor, support vector machine, random forest, neural networks and logistic regression. Our results revealed that negative log-rank p-values provides more robust weight values as opposed to the use of other quantities such as hazard ratios, fold change, or a combination of those factors. PSVM, together with the classical machine learning classifiers were combined in an ensemble (multi-test) voting system, which collectively provides a more precise and reproducible patient stratification. The use of the multi-test system approach, rather than the search for the ideal classification/prediction method, might help to address limitations of the individual classification algorithm in specific situation.
Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model. PMID:29320579
Raster Vs. Point Cloud LiDAR Data Classification
NASA Astrophysics Data System (ADS)
El-Ashmawy, N.; Shaker, A.
2014-09-01
Airborne Laser Scanning systems with light detection and ranging (LiDAR) technology is one of the fast and accurate 3D point data acquisition techniques. Generating accurate digital terrain and/or surface models (DTM/DSM) is the main application of collecting LiDAR range data. Recently, LiDAR range and intensity data have been used for land cover classification applications. Data range and Intensity, (strength of the backscattered signals measured by the LiDAR systems), are affected by the flying height, the ground elevation, scanning angle and the physical characteristics of the objects surface. These effects may lead to uneven distribution of point cloud or some gaps that may affect the classification process. Researchers have investigated the conversion of LiDAR range point data to raster image for terrain modelling. Interpolation techniques have been used to achieve the best representation of surfaces, and to fill the gaps between the LiDAR footprints. Interpolation methods are also investigated to generate LiDAR range and intensity image data for land cover classification applications. In this paper, different approach has been followed to classifying the LiDAR data (range and intensity) for land cover mapping. The methodology relies on the classification of the point cloud data based on their range and intensity and then converted the classified points into raster image. The gaps in the data are filled based on the classes of the nearest neighbour. Land cover maps are produced using two approaches using: (a) the conventional raster image data based on point interpolation; and (b) the proposed point data classification. A study area covering an urban district in Burnaby, British Colombia, Canada, is selected to compare the results of the two approaches. Five different land cover classes can be distinguished in that area: buildings, roads and parking areas, trees, low vegetation (grass), and bare soil. The results show that an improvement of around 10 % in the classification results can be achieved by using the proposed approach.
Cao, Jianfang; Li, Yanfei; Tian, Yun
2018-01-01
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries. To this end, this study proposes an improved OCC emotion model that integrates personality and mood factors for emotional modelling to describe the emotional semantic information contained in an image. The proposed classification system integrates the k-Nearest Neighbour (KNN) algorithm with the Support Vector Machine (SVM) algorithm. The MapReduce parallel programming model was used to adapt the KNN-SVM algorithm for parallel implementation in the Hadoop cluster environment, thereby achieving emotional semantic understanding for the classification of a massive collection of images. For training and testing, 70,000 scene images were randomly selected from the SUN Database. The experimental results indicate that users with different personalities show overall consistency in their emotional understanding of the same image. For a training sample size of 50,000, the classification accuracies for different emotional categories targeted at users with different personalities were approximately 95%, and the training time was only 1/5 of that required for the corresponding algorithm with a single-node architecture. Furthermore, the speedup of the system also showed a linearly increasing tendency. Thus, the experiments achieved a good classification effect and can lay a foundation for classification in terms of additional types of emotional image semantics, thereby demonstrating the practical significance of the proposed model.
Iliyasu, Abdullah M; Fatichah, Chastine
2017-12-19
A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of traditional fuzzy k -nearest neighbours (Fuzzy k -NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles) that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k -NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k -NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.
NASA Astrophysics Data System (ADS)
Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng
2013-10-01
Quick, accurate and reliable technique for discrimination of cocoa beans according to geographical origin is essential for quality control and traceability management. This current study presents the application of Near Infrared Spectroscopy technique and multivariate classification for the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data and this gave visible cluster trends. The performance of four multivariate classification methods: Linear discriminant analysis (LDA), K-nearest neighbors (KNN), Back propagation artificial neural network (BPANN) and Support vector machine (SVM) were compared. The performances of the models were optimized by cross validation. The results revealed that; SVM model was superior to all the mathematical methods with a discrimination rate of 100% in both the training and prediction set after preprocessing with Mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for prediction set. While LDA model had 96.15% and 90.63% for the training and prediction sets respectively. KNN model had 75.01% for the training set and 72.31% for prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR Spectroscopy coupled with SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.
Khondoker, Mizanur R; Bachmann, Till T; Mewissen, Muriel; Dickinson, Paul; Dobrzelecki, Bartosz; Campbell, Colin J; Mount, Andrew R; Walton, Anthony J; Crain, Jason; Schulze, Holger; Giraud, Gerard; Ross, Alan J; Ciani, Ilenia; Ember, Stuart W J; Tlili, Chaker; Terry, Jonathan G; Grant, Eilidh; McDonnell, Nicola; Ghazal, Peter
2010-12-01
Machine learning and statistical model based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. Optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble that of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
A Comparative Study with RapidMiner and WEKA Tools over some Classification Techniques for SMS Spam
NASA Astrophysics Data System (ADS)
Foozy, Cik Feresa Mohd; Ahmad, Rabiah; Faizal Abdollah, M. A.; Chai Wen, Chuah
2017-08-01
SMS Spamming is a serious attack that can manipulate the use of the SMS by spreading the advertisement in bulk. By sending the unwanted SMS that contain advertisement can make the users feeling disturb and this against the privacy of the mobile users. To overcome these issues, many studies have proposed to detect SMS Spam by using data mining tools. This paper will do a comparative study using five machine learning techniques such as Naïve Bayes, K-NN (K-Nearest Neighbour Algorithm), Decision Tree, Random Forest and Decision Stumps to observe the accuracy result between RapidMiner and WEKA for dataset SMS Spam UCI Machine Learning repository.
Topological magnons in a one-dimensional itinerant flatband ferromagnet
NASA Astrophysics Data System (ADS)
Su, Xiao-Fei; Gu, Zhao-Long; Dong, Zhao-Yang; Li, Jian-Xin
2018-06-01
Different from previous scenarios that topological magnons emerge in local spin models, we propose an alternative that itinerant electron magnets can host topological magnons. A one-dimensional Tasaki model with a flatband is considered as the prototype. This model can be viewed as a quarter-filled periodic Anderson model with impurities located in between and hybridizing with the nearest-neighbor conducting electrons, together with a Hubbard repulsion for these electrons. By increasing the Hubbard interaction, the gap between the acoustic and optical magnons closes and reopens while the Berry phase of the acoustic band changes from 0 to π , leading to the occurrence of a topological transition. After this transition, there always exist in-gap edge magnonic modes, which is consistent with the bulk-edge correspondence. The Hubbard interaction-driven transition reveals a new mechanism to realize nontrivial magnon bands.
Kaewkamnerd, Saowaluck; Uthaipibull, Chairat; Intarapanich, Apichart; Pannarut, Montri; Chaotheing, Sastra; Tongsima, Sissades
2012-01-01
Current malaria diagnosis relies primarily on microscopic examination of Giemsa-stained thick and thin blood films. This method requires vigorously trained technicians to efficiently detect and classify the malaria parasite species such as Plasmodium falciparum (Pf) and Plasmodium vivax (Pv) for an appropriate drug administration. However, accurate classification of parasite species is difficult to achieve because of inherent technical limitations and human inconsistency. To improve performance of malaria parasite classification, many researchers have proposed automated malaria detection devices using digital image analysis. These image processing tools, however, focus on detection of parasites on thin blood films, which may not detect the existence of parasites due to the parasite scarcity on the thin blood film. The problem is aggravated with low parasitemia condition. Automated detection and classification of parasites on thick blood films, which contain more numbers of parasite per detection area, would address the previous limitation. The prototype of an automatic malaria parasite identification system is equipped with mountable motorized units for controlling the movements of objective lens and microscope stage. This unit was tested for its precision to move objective lens (vertical movement, z-axis) and microscope stage (in x- and y-horizontal movements). The average precision of x-, y- and z-axes movements were 71.481 ± 7.266 μm, 40.009 ± 0.000 μm, and 7.540 ± 0.889 nm, respectively. Classification of parasites on 60 Giemsa-stained thick blood films (40 blood films containing infected red blood cells and 20 control blood films of normal red blood cells) was tested using the image analysis module. By comparing our results with the ones verified by trained malaria microscopists, the prototype detected parasite-positive and parasite-negative blood films at the rate of 95% and 68.5% accuracy, respectively. For classification performance, the thick blood films with Pv parasite was correctly classified with the success rate of 75% while the accuracy of Pf classification was 90%. This work presents an automatic device for both detection and classification of malaria parasite species on thick blood film. The system is based on digital image analysis and featured with motorized stage units, designed to easily be mounted on most conventional light microscopes used in the endemic areas. The constructed motorized module could control the movements of objective lens and microscope stage at high precision for effective acquisition of quality images for analysis. The analysis program could accurately classify parasite species, into Pf or Pv, based on distribution of chromatin size.
Advances in Additive Manufacturing
2016-07-14
of 3D - printed structures. Analysis examples will include quantification of tolerance differences between the designed and manufactured parts, void...15. SUBJECT TERMS 3-D printing , validation and verification, nondestructive inspection, print -on-the-move, prototyping 16. SECURITY CLASSIFICATION...researching the formation of AM-grade metal powder from battlefield scrap and operating base waste, 2) potential of 3-D printing with sand to make
de Barros, Alba Lúcia; Fakih, Flávio Trevisani; Michel, Jeanne Liliane
2002-01-01
This article reports the pathway used to build a prototype of a computer nurse's clinical decision making support system, using NANDA, NIC and NOC classifications, as an auxiliary tool in the insertion of nursing data in the computerized patient record of Hospital São Paulo/UNIFESP.
A Feature-based Approach to Big Data Analysis of Medical Images
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M.
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches in O(log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct. PMID:26221685
A Feature-Based Approach to Big Data Analysis of Medical Images.
Toews, Matthew; Wachinger, Christian; Estepar, Raul San Jose; Wells, William M
2015-01-01
This paper proposes an inference method well-suited to large sets of medical images. The method is based upon a framework where distinctive 3D scale-invariant features are indexed efficiently to identify approximate nearest-neighbor (NN) feature matches-in O (log N) computational complexity in the number of images N. It thus scales well to large data sets, in contrast to methods based on pair-wise image registration or feature matching requiring O(N) complexity. Our theoretical contribution is a density estimator based on a generative model that generalizes kernel density estimation and K-nearest neighbor (KNN) methods.. The estimator can be used for on-the-fly queries, without requiring explicit parametric models or an off-line training phase. The method is validated on a large multi-site data set of 95,000,000 features extracted from 19,000 lung CT scans. Subject-level classification identifies all images of the same subjects across the entire data set despite deformation due to breathing state, including unintentional duplicate scans. State-of-the-art performance is achieved in predicting chronic pulmonary obstructive disorder (COPD) severity across the 5-category GOLD clinical rating, with an accuracy of 89% if both exact and one-off predictions are considered correct.
Stapanian, Martin A.; Adams, Jean V.; Fennessy, M. Siobhan; Mack, John; Micacchion, Mick
2013-01-01
A persistent question among ecologists and environmental managers is whether constructed wetlands are structurally or functionally equivalent to naturally occurring wetlands. We examined 19 variables collected from 10 constructed and nine natural emergent wetlands in Ohio, USA. Our primary objective was to identify candidate indicators of wetland class (natural or constructed), based on measurements of soil properties and an index of vegetation integrity, that can be used to track the progress of constructed wetlands toward a natural state. The method of nearest shrunken centroids was used to find a subset of variables that would serve as the best classifiers of wetland class, and error rate was calculated using a five-fold cross-validation procedure. The shrunken differences of percent total organic carbon (% TOC) and percent dry weight of the soil exhibited the greatest distances from the overall centroid. Classification based on these two variables yielded a misclassification rate of 11% based on cross-validation. Our results indicate that % TOC and percent dry weight can be used as candidate indicators of the status of emergent, constructed wetlands in Ohio and for assessing the performance of mitigation. The method of nearest shrunken centroids has excellent potential for further applications in ecology.
Novel approach for image skeleton and distance transformation parallel algorithms
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Means, Robert W.
1994-05-01
Image Understanding is more important in medical imaging than ever, particularly where real-time automatic inspection, screening and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computational and memory access intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest VLSI high- speed convolutional chips such as HNC's ViP. The algorithm speed is dependent on the object's width and takes (k + [(k-1)/3]) * 7 milliseconds for a 512 X 512 image with k being the maximum distance of the largest object. All objects in the image will be skeletonized at the same time in parallel.
Prediction of Human Intestinal Absorption of Compounds Using Artificial Intelligence Techniques.
Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar
2017-01-01
Information about Pharmacokinetics of compounds is an essential component of drug design and development. Modeling the pharmacokinetic properties require identification of the factors effecting absorption, distribution, metabolism and excretion of compounds. There have been continuous attempts in the prediction of intestinal absorption of compounds using various Artificial intelligence methods in the effort to reduce the attrition rate of drug candidates entering to preclinical and clinical trials. Currently, there are large numbers of individual predictive models available for absorption using machine learning approaches. Six Artificial intelligence methods namely, Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis were used for prediction of absorption of compounds. Prediction accuracy of Support vector machine, k- nearest neighbor, Probabilistic neural network, Artificial neural network, Partial least square and Linear discriminant analysis for prediction of intestinal absorption of compounds was found to be 91.54%, 88.33%, 84.30%, 86.51%, 79.07% and 80.08% respectively. Comparative analysis of all the six prediction models suggested that Support vector machine with Radial basis function based kernel is comparatively better for binary classification of compounds using human intestinal absorption and may be useful at preliminary stages of drug design and development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Inverse scattering transform analysis of rogue waves using local periodization procedure
NASA Astrophysics Data System (ADS)
Randoux, Stéphane; Suret, Pierre; El, Gennady
2016-07-01
The nonlinear Schrödinger equation (NLSE) stands out as the dispersive nonlinear partial differential equation that plays a prominent role in the modeling and understanding of the wave phenomena relevant to many fields of nonlinear physics. The question of random input problems in the one-dimensional and integrable NLSE enters within the framework of integrable turbulence, and the specific question of the formation of rogue waves (RWs) has been recently extensively studied in this context. The determination of exact analytic solutions of the focusing 1D-NLSE prototyping RW events of statistical relevance is now considered as the problem of central importance. Here we address this question from the perspective of the inverse scattering transform (IST) method that relies on the integrable nature of the wave equation. We develop a conceptually new approach to the RW classification in which appropriate, locally coherent structures are specifically isolated from a globally incoherent wave train to be subsequently analyzed by implementing a numerical IST procedure relying on a spatial periodization of the object under consideration. Using this approach we extend the existing classifications of the prototypes of RWs from standard breathers and their collisions to more general nonlinear modes characterized by their nonlinear spectra.
Distance learning in discriminative vector quantization.
Schneider, Petra; Biehl, Michael; Hammer, Barbara
2009-10-01
Discriminative vector quantization schemes such as learning vector quantization (LVQ) and extensions thereof offer efficient and intuitive classifiers based on the representation of classes by prototypes. The original methods, however, rely on the Euclidean distance corresponding to the assumption that the data can be represented by isotropic clusters. For this reason, extensions of the methods to more general metric structures have been proposed, such as relevance adaptation in generalized LVQ (GLVQ) and matrix learning in GLVQ. In these approaches, metric parameters are learned based on the given classification task such that a data-driven distance measure is found. In this letter, we consider full matrix adaptation in advanced LVQ schemes. In particular, we introduce matrix learning to a recent statistical formalization of LVQ, robust soft LVQ, and we compare the results on several artificial and real-life data sets to matrix learning in GLVQ, a derivation of LVQ-like learning based on a (heuristic) cost function. In all cases, matrix adaptation allows a significant improvement of the classification accuracy. Interestingly, however, the principled behavior of the models with respect to prototype locations and extracted matrix dimensions shows several characteristic differences depending on the data sets.
Inverse scattering transform analysis of rogue waves using local periodization procedure
Randoux, Stéphane; Suret, Pierre; El, Gennady
2016-01-01
The nonlinear Schrödinger equation (NLSE) stands out as the dispersive nonlinear partial differential equation that plays a prominent role in the modeling and understanding of the wave phenomena relevant to many fields of nonlinear physics. The question of random input problems in the one-dimensional and integrable NLSE enters within the framework of integrable turbulence, and the specific question of the formation of rogue waves (RWs) has been recently extensively studied in this context. The determination of exact analytic solutions of the focusing 1D-NLSE prototyping RW events of statistical relevance is now considered as the problem of central importance. Here we address this question from the perspective of the inverse scattering transform (IST) method that relies on the integrable nature of the wave equation. We develop a conceptually new approach to the RW classification in which appropriate, locally coherent structures are specifically isolated from a globally incoherent wave train to be subsequently analyzed by implementing a numerical IST procedure relying on a spatial periodization of the object under consideration. Using this approach we extend the existing classifications of the prototypes of RWs from standard breathers and their collisions to more general nonlinear modes characterized by their nonlinear spectra. PMID:27385164
Fabelo, Himar; Ortega, Samuel; Ravi, Daniele; Kiran, B Ravi; Sosa, Coralia; Bulters, Diederik; Callicó, Gustavo M; Bulstrode, Harry; Szolna, Adam; Piñeiro, Juan F; Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O'Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto
2018-01-01
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area.
Kabwama, Silvester; Madroñal, Daniel; Lazcano, Raquel; J-O’Shanahan, Aruma; Bisshopp, Sara; Hernández, María; Báez, Abelardo; Yang, Guang-Zhong; Stanciulescu, Bogdan; Salvador, Rubén; Juárez, Eduardo; Sarmiento, Roberto
2018-01-01
Surgery for brain cancer is a major problem in neurosurgery. The diffuse infiltration into the surrounding normal brain by these tumors makes their accurate identification by the naked eye difficult. Since surgery is the common treatment for brain cancer, an accurate radical resection of the tumor leads to improved survival rates for patients. However, the identification of the tumor boundaries during surgery is challenging. Hyperspectral imaging is a non-contact, non-ionizing and non-invasive technique suitable for medical diagnosis. This study presents the development of a novel classification method taking into account the spatial and spectral characteristics of the hyperspectral images to help neurosurgeons to accurately determine the tumor boundaries in surgical-time during the resection, avoiding excessive excision of normal tissue or unintentionally leaving residual tumor. The algorithm proposed in this study to approach an efficient solution consists of a hybrid framework that combines both supervised and unsupervised machine learning methods. Firstly, a supervised pixel-wise classification using a Support Vector Machine classifier is performed. The generated classification map is spatially homogenized using a one-band representation of the HS cube, employing the Fixed Reference t-Stochastic Neighbors Embedding dimensional reduction algorithm, and performing a K-Nearest Neighbors filtering. The information generated by the supervised stage is combined with a segmentation map obtained via unsupervised clustering employing a Hierarchical K-Means algorithm. The fusion is performed using a majority voting approach that associates each cluster with a certain class. To evaluate the proposed approach, five hyperspectral images of surface of the brain affected by glioblastoma tumor in vivo from five different patients have been used. The final classification maps obtained have been analyzed and validated by specialists. These preliminary results are promising, obtaining an accurate delineation of the tumor area. PMID:29554126
Assessment of pedophilia using hemodynamic brain response to sexual stimuli.
Ponseti, Jorge; Granert, Oliver; Jansen, Olav; Wolff, Stephan; Beier, Klaus; Neutze, Janina; Deuschl, Günther; Mehdorn, Hubertus; Siebner, Hartwig; Bosinski, Hartmut
2012-02-01
Accurately assessing sexual preference is important in the treatment of child sex offenders. Phallometry is the standard method to identify sexual preference; however, this measure has been criticized for its intrusiveness and limited reliability. To evaluate whether spatial response pattern to sexual stimuli as revealed by a change in the blood oxygen level-dependent signal facilitates the identification of pedophiles. During functional magnetic resonance imaging, pedophilic and nonpedophilic participants were briefly exposed to same- and opposite-sex images of nude children and adults. We calculated differences in blood oxygen level-dependent signals to child and adult sexual stimuli for each participant. The corresponding contrast images were entered into a group analysis to calculate whole-brain difference maps between groups. We calculated an expression value that corresponded to the group result for each participant. These expression values were submitted to 2 different classification algorithms: Fisher linear discriminant analysis and κ -nearest neighbor analysis. This classification procedure was cross-validated using the leave-one-out method. Section of Sexual Medicine, Medical School, Christian Albrechts University of Kiel, Kiel, Germany. We recruited 24 participants with pedophilia who were sexually attracted to either prepubescent girls (n = 11) or prepubescent boys (n = 13) and 32 healthy male controls who were sexually attracted to either adult women (n = 18) or adult men (n = 14). Sensitivity and specificity scores of the 2 classification algorithms. The highest classification accuracy was achieved by Fisher linear discriminant analysis, which showed a mean accuracy of 95% (100% specificity, 88% sensitivity). Functional brain response patterns to sexual stimuli contain sufficient information to identify pedophiles with high accuracy. The automatic classification of these patterns is a promising objective tool to clinically diagnose pedophilia.
A biologically plausible computational model for auditory object recognition.
Larson, Eric; Billimoria, Cyrus P; Sen, Kamal
2009-01-01
Object recognition is a task of fundamental importance for sensory systems. Although this problem has been intensively investigated in the visual system, relatively little is known about the recognition of complex auditory objects. Recent work has shown that spike trains from individual sensory neurons can be used to discriminate between and recognize stimuli. Multiple groups have developed spike similarity or dissimilarity metrics to quantify the differences between spike trains. Using a nearest-neighbor approach the spike similarity metrics can be used to classify the stimuli into groups used to evoke the spike trains. The nearest prototype spike train to the tested spike train can then be used to identify the stimulus. However, how biological circuits might perform such computations remains unclear. Elucidating this question would facilitate the experimental search for such circuits in biological systems, as well as the design of artificial circuits that can perform such computations. Here we present a biologically plausible model for discrimination inspired by a spike distance metric using a network of integrate-and-fire model neurons coupled to a decision network. We then apply this model to the birdsong system in the context of song discrimination and recognition. We show that the model circuit is effective at recognizing individual songs, based on experimental input data from field L, the avian primary auditory cortex analog. We also compare the performance and robustness of this model to two alternative models of song discrimination: a model based on coincidence detection and a model based on firing rate.
Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming
2010-01-01
In order to provide the seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using the computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy to update and maintenance for users. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system.
Jan, Shau-Shiun; Hsu, Li-Ta; Tsai, Wen-Ming
2010-01-01
In order to provide the seamless navigation and positioning services for indoor environments, an indoor location based service (LBS) test bed is developed to integrate the indoor positioning system and the indoor three-dimensional (3D) geographic information system (GIS). A wireless sensor network (WSN) is used in the developed indoor positioning system. Considering the power consumption, in this paper the ZigBee radio is used as the wireless protocol, and the received signal strength (RSS) fingerprinting positioning method is applied as the primary indoor positioning algorithm. The matching processes of the user location include the nearest neighbor (NN) algorithm, the K-weighted nearest neighbors (KWNN) algorithm, and the probabilistic approach. To enhance the positioning accuracy for the dynamic user, the particle filter is used to improve the positioning performance. As part of this research, a 3D indoor GIS is developed to be used with the indoor positioning system. This involved using the computer-aided design (CAD) software and the virtual reality markup language (VRML) to implement a prototype indoor LBS test bed. Thus, a rapid and practical procedure for constructing a 3D indoor GIS is proposed, and this GIS is easy to update and maintenance for users. The building of the Department of Aeronautics and Astronautics at National Cheng Kung University in Taiwan is used as an example to assess the performance of various algorithms for the indoor positioning system. PMID:22319282
Zou, Meng; Liu, Zhaoqi; Zhang, Xiang-Sun; Wang, Yong
2015-10-15
In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are in pressing need. In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motived by the connection between AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC and meanwhile minimize the number of selected features to construct a predictor in the nearest centroid classifier framework. NCC-AUC shows its great performance by validating both in genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. Also NCC-AUC is more significant in separation of low and high risk cohorts than widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1 penalized in Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further in an independent clinical data, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than Cox model and L1-Cox model in grouping patients into high and low risk categories. In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panel from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. ywang@amss.ac.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Yousef Kalafi, Elham; Tan, Wooi Boon; Town, Christopher; Dhillon, Sarinder Kaur
2016-12-22
Monogeneans are flatworms (Platyhelminthes) that are primarily found on gills and skin of fishes. Monogenean parasites have attachment appendages at their haptoral regions that help them to move about the body surface and feed on skin and gill debris. Haptoral attachment organs consist of sclerotized hard parts such as hooks, anchors and marginal hooks. Monogenean species are differentiated based on their haptoral bars, anchors, marginal hooks, reproductive parts' (male and female copulatory organs) morphological characters and soft anatomical parts. The complex structure of these diagnostic organs and also their overlapping in microscopic digital images are impediments for developing fully automated identification system for monogeneans (LNCS 7666:256-263, 2012), (ISDA; 457-462, 2011), (J Zoolog Syst Evol Res 52(2): 95-99. 2013;). In this study images of hard parts of the haptoral organs such as bars and anchors are used to develop a fully automated identification technique for monogenean species identification by implementing image processing techniques and machine learning methods. Images of four monogenean species namely Sinodiplectanotrema malayanus, Trianchoratus pahangensis, Metahaliotrema mizellei and Metahaliotrema sp. (undescribed) were used to develop an automated technique for identification. K-nearest neighbour (KNN) was applied to classify the monogenean specimens based on the extracted features. 50% of the dataset was used for training and the other 50% was used as testing for system evaluation. Our approach demonstrated overall classification accuracy of 90%. In this study Leave One Out (LOO) cross validation is used for validation of our system and the accuracy is 91.25%. The methods presented in this study facilitate fast and accurate fully automated classification of monogeneans at the species level. In future studies more classes will be included in the model, the time to capture the monogenean images will be reduced and improvements in extraction and selection of features will be implemented.
Flowability of granular materials with industrial applications - An experimental approach
NASA Astrophysics Data System (ADS)
Torres-Serra, Joel; Romero, Enrique; Rodríguez-Ferran, Antonio; Caba, Joan; Arderiu, Xavier; Padullés, Josep-Manel; González, Juanjo
2017-06-01
Designing bulk material handling equipment requires a thorough understanding of the mechanical behaviour of powders and grains. Experimental characterization of granular materials is introduced focusing on flowability. A new prototype is presented which performs granular column collapse tests. The device consists of a channel whose design accounts for test inspection using visualization techniques and load measurements. A reservoir is attached where packing state of the granular material can be adjusted before run-off to simulate actual handling conditions by fluidisation and deaeration of the pile. Bulk materials on the market, with a wide range of particle sizes, can be tested with the prototype and the results used for classification in terms of flowability to improve industrial equipment selection processes.
A Web-Based Framework For a Time-Domain Warehouse
NASA Astrophysics Data System (ADS)
Brewer, J. M.; Bloom, J. S.; Kennedy, R.; Starr, D. L.
2009-09-01
The Berkeley Transients Classification Pipeline (TCP) uses a machine-learning classifier to automatically categorize transients from large data torrents and provide automated notification of astronomical events of scientific interest. As part of the training process, we created a large warehouse of light-curve sources with well-labelled classes that serve as priors to the classification engine. This web-based interactive framework, which we are now making public via DotAstro.org (http://dotastro.org/), allows us to ingest time-variable source data in a wide variety of formats and store it in a common internal data model. Data is passed between pipeline modules in a prototype XML representation of time-series format (VOTimeseries), which can also be emitted to collaborators through dotastro.org. After import, the sources can be visualized using Google Sky, light curves can be inspected interactively, and classifications can be manually adjusted.
Satellite image analysis using neural networks
NASA Technical Reports Server (NTRS)
Sheldon, Roger A.
1990-01-01
The tremendous backlog of unanalyzed satellite data necessitates the development of improved methods for data cataloging and analysis. Ford Aerospace has developed an image analysis system, SIANN (Satellite Image Analysis using Neural Networks) that integrates the technologies necessary to satisfy NASA's science data analysis requirements for the next generation of satellites. SIANN will enable scientists to train a neural network to recognize image data containing scenes of interest and then rapidly search data archives for all such images. The approach combines conventional image processing technology with recent advances in neural networks to provide improved classification capabilities. SIANN allows users to proceed through a four step process of image classification: filtering and enhancement, creation of neural network training data via application of feature extraction algorithms, configuring and training a neural network model, and classification of images by application of the trained neural network. A prototype experimentation testbed was completed and applied to climatological data.
Predicting Player Position for Talent Identification in Association Football
NASA Astrophysics Data System (ADS)
Razali, Nazim; Mustapha, Aida; Yatim, Faiz Ahmad; Aziz, Ruhaya Ab
2017-08-01
This paper is set to introduce a new framework from the perspective of Computer Science for identifying talents in the sport of football based on the players’ individual qualities; physical, mental, and technical. The combination of qualities as assessed by coaches are then used to predict the players’ position in a match that suits the player the best in a particular team formation. Evaluation of the proposed framework is two-fold; quantitatively via classification experiments to predict player position, and qualitatively via a Talent Identification Site developed to achieve the same goal. Results from the classification experiments using Bayesian Networks, Decision Trees, and K-Nearest Neighbor have shown an average of 98% accuracy, which will promote consistency in decision-making though elimination of personal bias in team selection. The positive reviews on the Football Identification Site based on user acceptance evaluation also indicates that the framework is sufficient to serve as the basis of developing an intelligent team management system in different sports, whereby growth and performance of sport players can be monitored and identified.
Nagarajan, R; Hariharan, M; Satiyan, M
2012-08-01
Developing tools to assist physically disabled and immobilized people through facial expression is a challenging area of research and has attracted many researchers recently. In this paper, luminance stickers based facial expression recognition is proposed. Recognition of facial expression is carried out by employing Discrete Wavelet Transform (DWT) as a feature extraction method. Different wavelet families with their different orders (db1 to db20, Coif1 to Coif 5 and Sym2 to Sym8) are utilized to investigate their performance in recognizing facial expression and to evaluate their computational time. Standard deviation is computed for the coefficients of first level of wavelet decomposition for every order of wavelet family. This standard deviation is used to form a set of feature vectors for classification. In this study, conventional validation and cross validation are performed to evaluate the efficiency of the suggested feature vectors. Three different classifiers namely Artificial Neural Network (ANN), k-Nearest Neighborhood (kNN) and Linear Discriminant Analysis (LDA) are used to classify a set of eight facial expressions. The experimental results demonstrate that the proposed method gives very promising classification accuracies.
Bajoub, Aadil; Medina-Rodríguez, Santiago; Gómez-Romero, María; Ajal, El Amine; Bagur-González, María Gracia; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría
2017-01-15
High Performance Liquid Chromatography (HPLC) with diode array (DAD) and fluorescence (FLD) detection was used to acquire the fingerprints of the phenolic fraction of monovarietal extra-virgin olive oils (extra-VOOs) collected over three consecutive crop seasons (2011/2012-2013/2014). The chromatographic fingerprints of 140 extra-VOO samples processed from olive fruits of seven olive varieties, were recorded and statistically treated for varietal authentication purposes. First, DAD and FLD chromatographic-fingerprint datasets were separately processed and, subsequently, were joined using "Low-level" and "Mid-Level" data fusion methods. After the preliminary examination by principal component analysis (PCA), three supervised pattern recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogies (SIMCA) and K-Nearest Neighbors (k-NN) were applied to the four chromatographic-fingerprinting matrices. The classification models built were very sensitive and selective, showing considerably good recognition and prediction abilities. The combination "chromatographic dataset+chemometric technique" allowing the most accurate classification for each monovarietal extra-VOO was highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.
Chen, Xue; Li, Xiaohui; Yang, Sibo; Yu, Xin; Liu, Aichun
2018-01-01
Lymphoma is a significant cancer that affects the human lymphatic and hematopoietic systems. In this work, discrimination of lymphoma using laser-induced breakdown spectroscopy (LIBS) conducted on whole blood samples is presented. The whole blood samples collected from lymphoma patients and healthy controls are deposited onto standard quantitative filter papers and ablated with a 1064 nm Q-switched Nd:YAG laser. 16 atomic and ionic emission lines of calcium (Ca), iron (Fe), magnesium (Mg), potassium (K) and sodium (Na) are selected to discriminate the cancer disease. Chemometric methods, including principal component analysis (PCA), linear discriminant analysis (LDA) classification, and k nearest neighbor (kNN) classification are used to build the discrimination models. Both LDA and kNN models have achieved very good discrimination performances for lymphoma, with an accuracy of over 99.7%, a sensitivity of over 0.996, and a specificity of over 0.997. These results demonstrate that the whole-blood-based LIBS technique in combination with chemometric methods can serve as a fast, less invasive, and accurate method for detection and discrimination of human malignancies. PMID:29541503
NASA Astrophysics Data System (ADS)
Karsi, Redouane; Zaim, Mounia; El Alami, Jamila
2017-07-01
Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” is born to address the problem of automatically determining the polarity (Positive, negative, neutral,…) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions, thus, building a classifier, which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study three popular machine learning techniques: Support Vector Machines (SVM), Naive Bayes and K nearest neighbors(KNN) were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4,08.
On-line classification of pollutants in water using wireless portable electronic noses.
Herrero, José Luis; Lozano, Jesús; Santos, José Pedro; Suárez, José Ignacio
2016-06-01
A portable electronic nose with database connection for on-line classification of pollutants in water is presented in this paper. It is a hand-held, lightweight and powered instrument with wireless communications capable of standalone operation. A network of similar devices can be configured for distributed measurements. It uses four resistive microsensors and headspace as sampling method for extracting the volatile compounds from glass vials. The measurement and control program has been developed in LabVIEW using the database connection toolkit to send the sensors data to a server for training and classification with Artificial Neural Networks (ANNs). The use of a server instead of the microprocessor of the e-nose increases the capacity of memory and the computing power of the classifier and allows external users to perform data classification. To address this challenge, this paper also proposes a web-based framework (based on RESTFul web services, Asynchronous JavaScript and XML and JavaScript Object Notation) that allows remote users to train ANNs and request classification values regardless user's location and the type of device used. Results show that the proposed prototype can discriminate the samples measured (Blank water, acetone, toluene, ammonia, formaldehyde, hydrogen peroxide, ethanol, benzene, dichloromethane, acetic acid, xylene and dimethylacetamide) with a 94% classification success rate. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Gautam, Nitin
The main objectives of this thesis are to develop a robust statistical method for the classification of ocean precipitation based on physical properties to which the SSM/I is sensitive and to examine how these properties vary globally and seasonally. A two step approach is adopted for the classification of oceanic precipitation classes from multispectral SSM/I data: (1)we subjectively define precipitation classes using a priori information about the precipitating system and its possible distinct signature on SSM/I data such as scattering by ice particles aloft in the precipitating cloud, emission by liquid rain water below freezing level, the difference of polarization at 19 GHz-an indirect measure of optical depth, etc.; (2)we then develop an objective classification scheme which is found to reproduce the subjective classification with high accuracy. This hybrid strategy allows us to use the characteristics of the data to define and encode classes and helps retain the physical interpretation of classes. The classification methods based on k-nearest neighbor and neural network are developed to objectively classify six precipitation classes. It is found that the classification method based neural network yields high accuracy for all precipitation classes. An inversion method based on minimum variance approach was used to retrieve gross microphysical properties of these precipitation classes such as column integrated liquid water path, column integrated ice water path, and column integrated min water path. This classification method is then applied to 2 years (1991-92) of SSM/I data to examine and document the seasonal and global distribution of precipitation frequency corresponding to each of these objectively defined six classes. The characteristics of the distribution are found to be consistent with assumptions used in defining these six precipitation classes and also with well known climatological patterns of precipitation regions. The seasonal and global distribution of these six classes is also compared with the earlier results obtained from Comprehensive Ocean Atmosphere Data Sets (COADS). It is found that the gross pattern of the distributions obtained from SSM/I and COADS data match remarkably well with each other.
Computational theory of line drawing interpretation
NASA Technical Reports Server (NTRS)
Witkin, A. P.
1981-01-01
The recovery of the three dimensional structure of visible surfaces depicted in an image by emphasizing the role of geometric cues present in line drawings, was studied. Three key components are line classification, line interpretation, and surface interpolation. A model for three dimensional line interpretation and surface orientation was refined and a theory for the recovery of surface shape from surface marking geometry was developed. A new approach to the classification of edges was developed and implemented signatures were deduced for each of several edge types, expressed in terms of correlational properties of the image intensities in the vicinity of the edge. A computer program was developed that evaluates image edges as compared with these prototype signatures.
Adaptive fuzzy leader clustering of complex data sets in pattern recognition
NASA Technical Reports Server (NTRS)
Newton, Scott C.; Pemmaraju, Surya; Mitra, Sunanda
1992-01-01
A modular, unsupervised neural network architecture for clustering and classification of complex data sets is presented. The adaptive fuzzy leader clustering (AFLC) architecture is a hybrid neural-fuzzy system that learns on-line in a stable and efficient manner. The initial classification is performed in two stages: a simple competitive stage and a distance metric comparison stage. The cluster prototypes are then incrementally updated by relocating the centroid positions from fuzzy C-means system equations for the centroids and the membership values. The AFLC algorithm is applied to the Anderson Iris data and laser-luminescent fingerprint image data. It is concluded that the AFLC algorithm successfully classifies features extracted from real data, discrete or continuous.
Pre-Prototype 5kW Fuel Cell Power Plant Development.
1987-04-01
1982 - JANUARY 1986 * APPROVED FOR PUBLIC RELEASEj DISTRIBUTION IS UNLIMITED AEROPROPULSION LABORATORYE AIR FORCE WRIGHT AERONAUTICAL LABORATORIES AI R...apptacable) Aero Propulsion Laboratory (AFWAL/POO§-2) Air Froce Wright Aeronautical Laboratories- 6c. ADDRESS (City. State alnd ZIP Code) 7b. ADDRESS (City...CLASSIFICATION OF THIS PAGE EXECUTIVE SUMMARY This report summarizes efforts under a U.S. Air Force Wright Aeronautical Laboratories (Wright-Patterson
Processing of Fear and Anger Facial Expressions: The Role of Spatial Frequency
Comfort, William E.; Wang, Meng; Benton, Christopher P.; Zana, Yossi
2013-01-01
Spatial frequency (SF) components encode a portion of the affective value expressed in face images. The aim of this study was to estimate the relative weight of specific frequency spectrum bandwidth on the discrimination of anger and fear facial expressions. The general paradigm was a classification of the expression of faces morphed at varying proportions between anger and fear images in which SF adaptation and SF subtraction are expected to shift classification of facial emotion. A series of three experiments was conducted. In Experiment 1 subjects classified morphed face images that were unfiltered or filtered to remove either low (<8 cycles/face), middle (12–28 cycles/face), or high (>32 cycles/face) SF components. In Experiment 2 subjects were adapted to unfiltered or filtered prototypical (non-morphed) fear face images and subsequently classified morphed face images. In Experiment 3 subjects were adapted to unfiltered or filtered prototypical fear face images with the phase component randomized before classifying morphed face images. Removing mid frequency components from the target images shifted classification toward fear. The same shift was observed under adaptation condition to unfiltered and low- and middle-range filtered fear images. However, when the phase spectrum of the same adaptation stimuli was randomized, no adaptation effect was observed. These results suggest that medium SF components support the perception of fear more than anger at both low and high level of processing. They also suggest that the effect at high-level processing stage is related more to high-level featural and/or configural information than to the low-level frequency spectrum. PMID:23637687
Ali, Safdar; Majid, Abdul; Khan, Asifullah
2014-04-01
Development of an accurate and reliable intelligent decision-making method for the construction of cancer diagnosis system is one of the fast growing research areas of health sciences. Such decision-making system can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies have been reported on breast cancer classification. To this end, we propose the exploitation of the physicochemical properties of amino acids in protein primary sequences such as hydrophobicity (Hd) and hydrophilicity (Hb) for breast cancer classification. Hd and Hb properties of amino acids, in recent literature, are reported to be quite effective in characterizing the constituent amino acids and are used to study protein foldings, interactions, structures, and sequence-order effects. Especially, using these physicochemical properties, we observed that proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The different feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we have exploited different feature spaces using Hd and Hb properties of amino acids to develop an accurate method for classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms such as random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN) trained on different feature spaces. We observed that ensemble-RF, in case of cancer classification, performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method has shown improved performance compared to existing techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, Jared Matthew; Daum, Keith Alvin; Kalival, J. H.
2003-01-01
This initial study evaluates the use of ion mobility spectrometry (IMS) as a rapid test procedure for potential detection of adulterated perfumes and speciation of plant life. Sample types measured consist of five genuine perfumes, two species of sagebrush, and four species of flowers. Each sample type is treated as a separate classification problem. It is shown that discrimination using principal component analysis with K-nearest neighbors can distinguish one class from another. Discriminatory models generated using principal component regressions are not as effective. Results from this examination are encouraging and represent an initial phase demonstrating that perfumes and plants possessmore » characteristic chemical signatures that can be used for reliable identification.« less
Neighborhood graph and learning discriminative distance functions for clinical decision support.
Tsymbal, Alexey; Zhou, Shaohua Kevin; Huber, Martin
2009-01-01
There are two essential reasons for the slow progress in the acceptance of clinical case retrieval and similarity search-based decision support systems; the especial complexity of clinical data making it difficult to define a meaningful and effective distance function on them and the lack of transparency and explanation ability in many existing clinical case retrieval decision support systems. In this paper, we try to address these two problems by introducing a novel technique for visualizing inter-patient similarity based on a node-link representation with neighborhood graphs and by considering two techniques for learning discriminative distance function that help to combine the power of strong "black box" learners with the transparency of case retrieval and nearest neighbor classification.
NASA Technical Reports Server (NTRS)
Sabol, Donald E., Jr.; Roberts, Dar A.; Adams, John B.; Smith, Milton O.
1993-01-01
An important application of remote sensing is to map and monitor changes over large areas of the land surface. This is particularly significant with the current interest in monitoring vegetation communities. Most of traditional methods for mapping different types of plant communities are based upon statistical classification techniques (i.e., parallel piped, nearest-neighbor, etc.) applied to uncalibrated multispectral data. Classes from these techniques are typically difficult to interpret (particularly to a field ecologist/botanist). Also, classes derived for one image can be very different from those derived from another image of the same area, making interpretation of observed temporal changes nearly impossible. More recently, neural networks have been applied to classification. Neural network classification, based upon spectral matching, is weak in dealing with spectral mixtures (a condition prevalent in images of natural surfaces). Another approach to mapping vegetation communities is based on spectral mixture analysis, which can provide a consistent framework for image interpretation. Roberts et al. (1990) mapped vegetation using the band residuals from a simple mixing model (the same spectral endmembers applied to all image pixels). Sabol et al. (1992b) and Roberts et al. (1992) used different methods to apply the most appropriate spectral endmembers to each image pixel, thereby allowing mapping of vegetation based upon the the different endmember spectra. In this paper, we describe a new approach to classification of vegetation communities based upon the spectra fractions derived from spectral mixture analysis. This approach was applied to three 1992 AVIRIS images of Jasper Ridge, California to observe seasonal changes in surface composition.
Real-Time Classification of Hand Motions Using Ultrasound Imaging of Forearm Muscles.
Akhlaghi, Nima; Baker, Clayton A; Lahlou, Mohamed; Zafar, Hozaifah; Murthy, Karthik G; Rangwala, Huzefa S; Kosecka, Jana; Joiner, Wilsaan M; Pancrazio, Joseph J; Sikdar, Siddhartha
2016-08-01
Surface electromyography (sEMG) has been the predominant method for sensing electrical activity for a number of applications involving muscle-computer interfaces, including myoelectric control of prostheses and rehabilitation robots. Ultrasound imaging for sensing mechanical deformation of functional muscle compartments can overcome several limitations of sEMG, including the inability to differentiate between deep contiguous muscle compartments, low signal-to-noise ratio, and lack of a robust graded signal. The objective of this study was to evaluate the feasibility of real-time graded control using a computationally efficient method to differentiate between complex hand motions based on ultrasound imaging of forearm muscles. Dynamic ultrasound images of the forearm muscles were obtained from six able-bodied volunteers and analyzed to map muscle activity based on the deformation of the contracting muscles during different hand motions. Each participant performed 15 different hand motions, including digit flexion, different grips (i.e., power grasp and pinch grip), and grips in combination with wrist pronation. During the training phase, we generated a database of activity patterns corresponding to different hand motions for each participant. During the testing phase, novel activity patterns were classified using a nearest neighbor classification algorithm based on that database. The average classification accuracy was 91%. Real-time image-based control of a virtual hand showed an average classification accuracy of 92%. Our results demonstrate the feasibility of using ultrasound imaging as a robust muscle-computer interface. Potential clinical applications include control of multiarticulated prosthetic hands, stroke rehabilitation, and fundamental investigations of motor control and biomechanics.
Estimating local scaling properties for the classification of interstitial lung disease patterns
NASA Astrophysics Data System (ADS)
Huber, Markus B.; Nagarajan, Mahesh B.; Leinsinger, Gerda; Ray, Lawrence A.; Wismueller, Axel
2011-03-01
Local scaling properties of texture regions were compared in their ability to classify morphological patterns known as 'honeycombing' that are considered indicative for the presence of fibrotic interstitial lung diseases in high-resolution computed tomography (HRCT) images. For 14 patients with known occurrence of honeycombing, a stack of 70 axial, lung kernel reconstructed images were acquired from HRCT chest exams. 241 regions of interest of both healthy and pathological (89) lung tissue were identified by an experienced radiologist. Texture features were extracted using six properties calculated from gray-level co-occurrence matrices (GLCM), Minkowski Dimensions (MDs), and the estimation of local scaling properties with Scaling Index Method (SIM). A k-nearest-neighbor (k-NN) classifier and a Multilayer Radial Basis Functions Network (RBFN) were optimized in a 10-fold cross-validation for each texture vector, and the classification accuracy was calculated on independent test sets as a quantitative measure of automated tissue characterization. A Wilcoxon signed-rank test was used to compare two accuracy distributions including the Bonferroni correction. The best classification results were obtained by the set of SIM features, which performed significantly better than all the standard GLCM and MD features (p < 0.005) for both classifiers with the highest accuracy (94.1%, 93.7%; for the k-NN and RBFN classifier, respectively). The best standard texture features were the GLCM features 'homogeneity' (91.8%, 87.2%) and 'absolute value' (90.2%, 88.5%). The results indicate that advanced texture features using local scaling properties can provide superior classification performance in computer-assisted diagnosis of interstitial lung diseases when compared to standard texture analysis methods.
NASA Astrophysics Data System (ADS)
Seo, Young Wook; Yoon, Seung Chul; Park, Bosoon; Hinton, Arthur; Windham, William R.; Lawrence, Kurt C.
2013-05-01
Salmonella is a major cause of foodborne disease outbreaks resulting from the consumption of contaminated food products in the United States. This paper reports the development of a hyperspectral imaging technique for detecting and differentiating two of the most common Salmonella serotypes, Salmonella Enteritidis (SE) and Salmonella Typhimurium (ST), from background microflora that are often found in poultry carcass rinse. Presumptive positive screening of colonies with a traditional direct plating method is a labor intensive and time consuming task. Thus, this paper is concerned with the detection of differences in spectral characteristics among the pure SE, ST, and background microflora grown on brilliant green sulfa (BGS) and xylose lysine tergitol 4 (XLT4) agar media with a spread plating technique. Visible near-infrared hyperspectral imaging, providing the spectral and spatial information unique to each microorganism, was utilized to differentiate SE and ST from the background microflora. A total of 10 classification models, including five machine learning algorithms, each without and with principal component analysis (PCA), were validated and compared to find the best model in classification accuracy. The five machine learning (classification) algorithms used in this study were Mahalanobis distance (MD), k-nearest neighbor (kNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM). The average classification accuracy of all 10 models on a calibration (or training) set of the pure cultures on BGS agar plates was 98% (Kappa coefficient = 0.95) in determining the presence of SE and/or ST although it was difficult to differentiate between SE and ST. The average classification accuracy of all 10 models on a training set for ST detection on XLT4 agar was over 99% (Kappa coefficient = 0.99) although SE colonies on XLT4 agar were difficult to differentiate from background microflora. The average classification accuracy of all 10 models on a validation set of chicken carcass rinses spiked with SE or ST and incubated on BGS agar plates was 94.45% and 83.73%, without and with PCA for classification, respectively. The best performing classification model on the validation set was QDA without PCA by achieving the classification accuracy of 98.65% (Kappa coefficient=0.98). The overall best performing classification model regardless of using PCA was MD with the classification accuracy of 94.84% (Kappa coefficient=0.88) on the validation set.
NASA Astrophysics Data System (ADS)
de Garidel-Thoron, T.; Marchant, R.; Soto, E.; Gally, Y.; Beaufort, L.; Bolton, C. T.; Bouslama, M.; Licari, L.; Mazur, J. C.; Brutti, J. M.; Norsa, F.
2017-12-01
Foraminifera tests are the main proxy carriers for paleoceanographic reconstructions. Both geochemical and taxonomical studies require large numbers of tests to achieve statistical relevance. To date, the extraction of foraminifera from the sediment coarse fraction is still done by hand and thus time-consuming. Moreover, the recognition of morphotypes, ecologically relevant, requires some taxonomical skills not easily taught. The automatic recognition and extraction of foraminifera would largely help paleoceanographers to overcome these issues. Recent advances in automatic image classification using machine learning opens the way to automatic extraction of foraminifera. Here we detail progress on the design of an automatic picking machine as part of the FIRST project. The machine handles 30 pre-sieved samples (100-1000µm), separating them into individual particles (including foraminifera) and imaging each in pseudo-3D. The particles are classified and specimens of interest are sorted either for Individual Foraminifera Analyses (44 per slide) and/or for classical multiple analyses (8 morphological classes per slide, up to 1000 individuals per hole). The classification is based on machine learning using Convolutional Neural Networks (CNNs), similar to the approach used in the coccolithophorid imaging system SYRACO. To prove its feasibility, we built two training image datasets of modern planktonic foraminifera containing approximately 2000 and 5000 images each, corresponding to 15 & 25 morphological classes. Using a CNN with a residual topology (ResNet) we achieve over 95% correct classification for each dataset. We tested the network on 160,000 images from 45 depths of a sediment core from the Pacific ocean, for which we have human counts. The current algorithm is able to reproduce the downcore variability in both Globigerinoides ruber and the fragmentation index (r2 = 0.58 and 0.88 respectively). The FIRST prototype yields some promising results for high-resolution paleoceanographic studies and evolutionary studies.
Tang, Yunwei; Jing, Linhai; Li, Hui; Liu, Qingjie; Yan, Qi; Li, Xiuxia
2016-01-01
This study explores the ability of WorldView-2 (WV-2) imagery for bamboo mapping in a mountainous region in Sichuan Province, China. A large area of this place is covered by shadows in the image, and only a few sampled points derived were useful. In order to identify bamboos based on sparse training data, the sample size was expanded according to the reflectance of multispectral bands selected using the principal component analysis (PCA). Then, class separability based on the training data was calculated using a feature space optimization method to select the features for classification. Four regular object-based classification methods were applied based on both sets of training data. The results show that the k-nearest neighbor (k-NN) method produced the greatest accuracy. A geostatistically-weighted k-NN classifier, accounting for the spatial correlation between classes, was then applied to further increase the accuracy. It achieved 82.65% and 93.10% of the producer’s and user’s accuracies respectively for the bamboo class. The canopy densities were estimated to explain the result. This study demonstrates that the WV-2 image can be used to identify small patches of understory bamboos given limited known samples, and the resulting bamboo distribution facilitates the assessments of the habitats of giant pandas. PMID:27879661
Image classification independent of orientation and scale
NASA Astrophysics Data System (ADS)
Arsenault, Henri H.; Parent, Sebastien; Moisan, Sylvain
1998-04-01
The recognition of targets independently of orientation has become fairly well developed in recent years for in-plane rotation. The out-of-plane rotation problem is much less advanced. When both out-of-plane rotations and changes of scale are present, the problem becomes very difficult. In this paper we describe our research on the combined out-of- plane rotation problem and the scale invariance problem. The rotations were limited to rotations about an axis perpendicular to the line of sight. The objects to be classified were three kinds of military vehicles. The inputs used were infrared imagery and photographs. We used a variation of a method proposed by Neiberg and Casasent, where a neural network is trained with a subset of the database and a minimum distances from lines in feature space are used for classification instead of nearest neighbors. Each line in the feature space corresponds to one class of objects, and points on one line correspond to different orientations of the same target. We found that the training samples needed to be closer for some orientations than for others, and that the most difficult orientations are where the target is head-on to the observer. By means of some additional training of the neural network, we were able to achieve 100% correct classification for 360 degree rotation and a range of scales over a factor of five.
NASA Technical Reports Server (NTRS)
Chettri, Samir R.; Cromp, Robert F.
1993-01-01
In this paper we discuss a neural network architecture (the Probabilistic Neural Net or the PNN) that, to the best of our knowledge, has not previously been applied to remotely sensed data. The PNN is a supervised non-parametric classification algorithm as opposed to the Gaussian maximum likelihood classifier (GMLC). The PNN works by fitting a Gaussian kernel to each training point. The width of the Gaussian is controlled by a tuning parameter called the window width. If very small widths are used, the method is equivalent to the nearest neighbor method. For large windows, the PNN behaves like the GMLC. The basic implementation of the PNN requires no training time at all. In this respect it is far better than the commonly used backpropagation neural network which can be shown to take O(N6) time for training where N is the dimensionality of the input vector. In addition the PNN can be implemented in a feed forward mode in hardware. The disadvantage of the PNN is that it requires all the training data to be stored. Some solutions to this problem are discussed in the paper. Finally, we discuss the accuracy of the PNN with respect to the GMLC and the backpropagation neural network (BPNN). The PNN is shown to be better than GMLC and not as good as the BPNN with regards to classification accuracy.
Go, Taesik; Byeon, Hyeokjun; Lee, Sang Joon
2018-04-30
Cell types of erythrocytes should be identified because they are closely related to their functionality and viability. Conventional methods for classifying erythrocytes are time consuming and labor intensive. Therefore, an automatic and accurate erythrocyte classification system is indispensable in healthcare and biomedical fields. In this study, we proposed a new label-free sensor for automatic identification of erythrocyte cell types using a digital in-line holographic microscopy (DIHM) combined with machine learning algorithms. A total of 12 features, including information on intensity distributions, morphological descriptors, and optical focusing characteristics, is quantitatively obtained from numerically reconstructed holographic images. All individual features for discocytes, echinocytes, and spherocytes are statistically different. To improve the performance of cell type identification, we adopted several machine learning algorithms, such as decision tree model, support vector machine, linear discriminant classification, and k-nearest neighbor classification. With the aid of these machine learning algorithms, the extracted features are effectively utilized to distinguish erythrocytes. Among the four tested algorithms, the decision tree model exhibits the best identification performance for the training sets (n = 440, 98.18%) and test sets (n = 190, 97.37%). This proposed methodology, which smartly combined DIHM and machine learning, would be helpful for sensing abnormal erythrocytes and computer-aided diagnosis of hematological diseases in clinic. Copyright © 2017 Elsevier B.V. All rights reserved.
Prediction of in vivo hepatotoxicity effects using in vitro ...
High-throughput in vitro transcriptomics data support molecular understanding of chemical-induced toxicity. Here, we evaluated the utility of such data to predict liver toxicity. First, in vitro gene expression data for 93 genes was generated following exposure of metabolically competent HepaRG cells to 1060 environmental chemicals from the US EPA ToxCast library. The empirical relationship between these data and rat chronic liver endpoints from animal studies in the Toxicity Reference Database (ToxRefDB) was then evaluated using machine learning techniques. Chemicals were classified as positive (242) or negative (135) based on observed hepatic histopathologic effects, and divided into three categories: hypertrophy (183), injury (112) and proliferative lesions (101). Hepatotoxicants were classified on the basis of the bioactivity of 93 genes (descriptors) using six machine learning algorithms: linear discriminant analysis, naïve Bayes, support vector classification, classification and regression trees, k-nearest neighbors, and an ensemble of classifiers. Classification performance was evaluated using 10-fold cross-validation testing, and in-loop, filter-based, feature subset selection. The best balanced accuracy for prediction of hypertrophy, injury and proliferative lesions were 0.81 ± 0.07, 0.79 ± 0.08 and 0.77 ± 0.09, respectively. Gene specific perturbation of xenobiotic metabolism enzymes (CYP7A1/2E1/4A11/1A1/4A22) and transporters (ABCG2, ABCB11, SLC22
Development of a Multiple-Stage Differential Mobility Analyzer (MDMA)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Da-Ren; Cheng, Mengdawn
2007-01-01
A new DMA column has been designed with the capability of simultaneously extracting monodisperse particles of different sizes in multiple stages. We call this design a multistage DMA, or MDMA. A prototype MDMA has been constructed and experimentally evaluated in this study. The new column enables the fast measurement of particles in a wide size range, while preserving the powerful particle classification function of a DMA. The prototype MDMA has three sampling stages, capable of classifying monodisperse particles of three different sizes simultaneously. The scanning voltage operation of a DMA can be applied to this new column. Each stage ofmore » MDMA column covers a fraction of the entire particle size range to be measured. The covered size fractions of two adjacent stages of the MDMA are designed somewhat overlapped. The arrangement leads to the reduction of scanning voltage range and thus the cycling time of the measurement. The modular sampling stage design of the MDMA allows the flexible configuration of desired particle classification lengths and variable number of stages in the MDMA. The design of our MDMA also permits operation at high sheath flow, enabling high-resolution particle size measurement and/or reduction of the lower sizing limit. Using the tandem DMA technique, the performance of the MDMA, i.e., sizing accuracy, resolution, and transmission efficiency, was evaluated at different ratios of aerosol and sheath flowrates. Two aerosol sampling schemes were investigated. One was to extract aerosol flows at an evenly partitioned flowrate at each stage, and the other was to extract aerosol at a rate the same as the polydisperse aerosol flowrate at each stage. We detail the prototype design of the MDMA and the evaluation result on the transfer functions of the MDMA at different particle sizes and operational conditions.« less
Diagnostic tools for nearest neighbors techniques when used with satellite imagery
Ronald E. McRoberts
2009-01-01
Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....
A Scatter-Based Prototype Framework and Multi-Class Extension of Support Vector Machines
Jenssen, Robert; Kloft, Marius; Zien, Alexander; Sonnenburg, Sören; Müller, Klaus-Robert
2012-01-01
We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results. PMID:23118845
Self-Contained Automated Vehicle Washing System
2014-09-26
SECURITY CLASSIFICATION OF: The Self Contained Automated Vehicle Washing System is a prototype that offers a reduction in the quantity of water ...supplied to the front lines by recycling wash water used in the cleaning of vehicles as well as capturing debris and other contaminates. The system also...of the warfighter to contaminates in the washing process. The System offers plug and play option for reclamation of the wash water and integration of
Optoelectronic devices toward monolithic integration
NASA Astrophysics Data System (ADS)
Ghergia, V.
1992-12-01
Starting from the present state of tl art of discrete devices up to the on going realization of monolithic semicorxtuctor integrated prototypes an overview ofoptoelectronic devices for telecom applications is given inchiding a short classification of the different kind of integrated devices. On the future perspective of IBCN distribution network some economica of hybrid and monolithic forms of integration are attempted. lnaflyashoitpresentationoftheactivitiesperformedintbefieldofmonolithic integration by EEC ESPR1T and RACE projects is reported. 1.
Building a Foundation on Sand: The Demise of Leaders Resulting from Toxic Followership
2016-05-26
Toxic Followers, Prototype Theory, Mission Command, Operational Art , McClellan, Johnston, MacArthur, Powell, 16. SECURITY CLASSIFICATION OF: a. REPORT...structured and more vague problems. Using mission command throughout the Army facilitates leaders practicing operational art .2 Current US joint doctrine...defines operational art as the “use of creative thinking by commanders and staffs to design strategies, campaigns, and major operations and organize
DARPA counter-sniper program: Phase 1 Acoustic Systems Demonstration results
NASA Astrophysics Data System (ADS)
Carapezza, Edward M.; Law, David B.; Csanadi, Christina J.
1997-02-01
During October 1995 through May 1996, the Defense Advanced Research Projects Agency sponsored the development of prototype systems that exploit acoustic muzzle blast and ballistic shock wave signatures to accurately predict the location of gunfire events and associated shooter locations using either single or multiple volumetric arrays. The output of these acoustic systems is an estimate of the shooter location and a classification estimate of the caliber of the shooter's weapon. A portable display and control unit provides both graphical and alphanumeric shooter location related information integrated on a two- dimensional digital map of the defended area. The final Phase I Acoustic Systems Demonstration field tests were completed in May. These these tests were held at USMC Base Camp Pendleton Military Operations Urban Training (MOUT) facility. These tests were structured to provide challenging gunfire related scenarios with significant reverberation and multi-path conditions. Special shot geometries and false alarms were included in these tests to probe potential system vulnerabilities and to determine the performance and robustness of the systems. Five prototypes developed by U.S. companies and one Israeli developed prototype were tested. This analysis quantifies the spatial resolution estimation capability (azimuth, elevation and range) of these prototypes and describes their ability to accurately classify the type of bullet fired in a challenging urban- like setting.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlson, Thomas J.; Deng, Zhiqun; Myers, Joshua R.
2011-09-30
The Marine Animal Alert System (MAAS) in development by the Pacific Northwest National Laboratory is focused on providing elements of compliance monitoring to support deployment of marine hydrokinetic energy devices. An initial focus is prototype tidal turbines to be deployed in Puget Sound in Washington State. The MAAS will help manage the risk of injury or mortality to marine animals from blade strike or contact with tidal turbines. In particular, development has focused on detection, classification, and localization of listed Southern Resident killer whales within 200 m of prototype turbines using both active and passive acoustic approaches. At the closemore » of FY 2011, a passive acoustic system consisting of a pair of four-element star arrays and parallel processing of eight channels of acoustic receptions has been designed and built. Field tests of the prototype system are scheduled for the fourth quarter of calendar year 2011. Field deployment and testing of the passive acoustic prototype is scheduled for the first quarter of FY 2012. The design of an active acoustic system that could be built using commercially available off-the-shelf components from active acoustic system vendors is also in the final stages of design and specification.« less
A new EEG measure using the 1D cluster variation method
NASA Astrophysics Data System (ADS)
Maren, Alianna J.; Szu, Harold H.
2015-05-01
A new information measure, drawing on the 1-D Cluster Variation Method (CVM), describes local pattern distributions (nearest-neighbor and next-nearest neighbor) in a binary 1-D vector in terms of a single interaction enthalpy parameter h for the specific case where the fractions of elements in each of two states are the same (x1=x2=0.5). An example application of this method would be for EEG interpretation in Brain-Computer Interfaces (BCIs), especially in the frontier of invariant biometrics based on distinctive and invariant individual responses to stimuli containing an image of a person with whom there is a strong affiliative response (e.g., to a person's grandmother). This measure is obtained by mapping EEG observed configuration variables (z1, z2, z3 for next-nearest neighbor triplets) to h using the analytic function giving h in terms of these variables at equilibrium. This mapping results in a small phase space region of resulting h values, which characterizes local pattern distributions in the source data. The 1-D vector with equal fractions of units in each of the two states can be obtained using the method for transforming natural images into a binarized equi-probability ensemble (Saremi & Sejnowski, 2014; Stephens et al., 2013). An intrinsically 2-D data configuration can be mapped to 1-D using the 1-D Peano-Hilbert space-filling curve, which has demonstrated a 20 dB lower baseline using the method compared with other approaches (cf. SPIE ICA etc. by Hsu & Szu, 2014). This CVM-based method has multiple potential applications; one near-term one is optimizing classification of the EEG signals from a COTS 1-D BCI baseball hat. This can result in a convenient 3-D lab-tethered EEG, configured in a 1-D CVM equiprobable binary vector, and potentially useful for Smartphone wireless display. Longer-range applications include interpreting neural assembly activations via high-density implanted soft, cellular-scale electrodes.
Ising lattices with +/-J second-nearest-neighbor interactions
NASA Astrophysics Data System (ADS)
Ramírez-Pastor, A. J.; Nieto, F.; Vogel, E. E.
1997-06-01
Second-nearest-neighbor interactions are added to the usual nearest-neighbor Ising Hamiltonian for square lattices in different ways. The starting point is a square lattice where half the nearest-neighbor interactions are ferromagnetic and the other half of the bonds are antiferromagnetic. Then, second-nearest-neighbor interactions can also be assigned randomly or in a variety of causal manners determined by the nearest-neighbor interactions. In the present paper we consider three causal and three random ways of assigning second-nearest-neighbor exchange interactions. Several ground-state properties are then calculated for each of these lattices:energy per bond ɛg, site correlation parameter pg, maximal magnetization μg, and fraction of unfrustrated bonds hg. A set of 500 samples is considered for each size N (number of spins) and array (way of distributing the N spins). The properties of the original lattices with only nearest-neighbor interactions are already known, which allows realizing the effect of the additional interactions. We also include cubic lattices to discuss the distinction between coordination number and dimensionality. Comparison with results for triangular and honeycomb lattices is done at specific points.
SVM based colon polyps classifier in a wireless active stereo endoscope.
Ayoub, J; Granado, B; Mhanna, Y; Romain, O
2010-01-01
This work focuses on the recognition of three-dimensional colon polyps captured by an active stereo vision sensor. The detection algorithm consists of SVM classifier trained on robust feature descriptors. The study is related to Cyclope, this prototype sensor allows real time 3D object reconstruction and continues to be optimized technically to improve its classification task by differentiation between hyperplastic and adenomatous polyps. Experimental results were encouraging and show correct classification rate of approximately 97%. The work contains detailed statistics about the detection rate and the computing complexity. Inspired by intensity histogram, the work shows a new approach that extracts a set of features based on depth histogram and combines stereo measurement with SVM classifiers to correctly classify benign and malignant polyps.
Fomin, Petr; Zhelondz, Dmitry; Kargel, Christian
2017-05-01
For the production of high-quality parts from recycled plastics, a very high purity of the plastic waste to be recycled is mandatory. The incorporation of fluorescent tracers ("markers") into plastics during the manufacturing process helps overcome typical problems of non-tracer based optical classification methods. Despite the unique emission spectra of fluorescent markers, the classification becomes difficult when the host plastics exhibit (strong) autofluorescence that spectrally overlaps the marker fluorescence. Increasing the marker concentration is not an option from an economic perspective and might also adversely affect the properties of the plastics. A measurement approach that suppresses the autofluorescence in the acquired signal is time-gated fluorescence spectroscopy (TGFS). Unfortunately, TGFS is associated with a lower signal-to-noise (S/N) ratio, which results in larger classification errors. In order to optimize the S/N ratio we investigate and validate the best TGFS parameters-derived from a model for the fluorescence signal-for plastics labeled with four specifically designed fluorescent markers. In this study we also demonstrate the implementation of TGFS on a measurement and classification prototype system and determine its performance. Mean values for a sensitivity of [Formula: see text] = 99.93% and precision [Formula: see text] = 99.80% were achieved, proving that a highly reliable classification of plastics can be achieved in practice.
NASA Astrophysics Data System (ADS)
Ma, Yitao; Miura, Sadahiko; Honjo, Hiroaki; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo
2017-04-01
A high-density nonvolatile associative memory (NV-AM) based on spin transfer torque magnetoresistive random access memory (STT-MRAM), which achieves highly concurrent and ultralow-power nearest neighbor search with full adaptivity of the template data format, has been proposed and fabricated using the 90 nm CMOS/70 nm perpendicular-magnetic-tunnel-junction hybrid process. A truly compact current-mode circuitry is developed to realize flexibly controllable and high-parallel similarity evaluation, which makes the NV-AM adaptable to any dimensionality and component-bit of template data. A compact dual-stage time-domain minimum searching circuit is also developed, which can freely extend the system for more template data by connecting multiple NM-AM cores without additional circuits for integrated processing. Both the embedded STT-MRAM module and the computing circuit modules in this NV-AM chip are synchronously power-gated to completely eliminate standby power and maximally reduce operation power by only activating the currently accessed circuit blocks. The operations of a prototype chip at 40 MHz are demonstrated by measurement. The average operation power is only 130 µW, and the circuit density is less than 11 µm2/bit. Compared with the latest conventional works in both volatile and nonvolatile approaches, more than 31.3% circuit area reductions and 99.2% power improvements are achieved, respectively. Further power performance analyses are discussed, which verify the special superiority of the proposed NV-AM in low-power and large-memory-based VLSIs.
Genetics-Based Classification of Filoviruses Calls for Expanded Sampling of Genomic Sequences
Lauber, Chris; Gorbalenya, Alexander E.
2012-01-01
We have recently developed a computational approach for hierarchical, genome-based classification of viruses of a family (DEmARC). In DEmARC, virus clusters are delimited objectively by devising a universal family-wide threshold on intra-cluster genetic divergence of viruses that is specific for each level of the classification. Here, we apply DEmARC to a set of 56 filoviruses with complete genome sequences and compare the resulting classification to the ICTV taxonomy of the family Filoviridae. We find in total six candidate taxon levels two of which correspond to the species and genus ranks of the family. At these two levels, the six filovirus species and two genera officially recognized by ICTV, as well as a seventh tentative species for Lloviu virus and prototyping a third genus, are reproduced. DEmARC lends the highest possible support for these two as well as the four other levels, implying that the actual number of valid taxon levels remains uncertain and the choice of levels for filovirus species and genera is arbitrary. Based on our experience with other virus families, we conclude that the current sampling of filovirus genomic sequences needs to be considerably expanded in order to resolve these uncertainties in the framework of genetics-based classification. PMID:23170166
Genetics-based classification of filoviruses calls for expanded sampling of genomic sequences.
Lauber, Chris; Gorbalenya, Alexander E
2012-09-01
We have recently developed a computational approach for hierarchical, genome-based classification of viruses of a family (DEmARC). In DEmARC, virus clusters are delimited objectively by devising a universal family-wide threshold on intra-cluster genetic divergence of viruses that is specific for each level of the classification. Here, we apply DEmARC to a set of 56 filoviruses with complete genome sequences and compare the resulting classification to the ICTV taxonomy of the family Filoviridae. We find in total six candidate taxon levels two of which correspond to the species and genus ranks of the family. At these two levels, the six filovirus species and two genera officially recognized by ICTV, as well as a seventh tentative species for Lloviu virus and prototyping a third genus, are reproduced. DEmARC lends the highest possible support for these two as well as the four other levels, implying that the actual number of valid taxon levels remains uncertain and the choice of levels for filovirus species and genera is arbitrary. Based on our experience with other virus families, we conclude that the current sampling of filovirus genomic sequences needs to be considerably expanded in order to resolve these uncertainties in the framework of genetics-based classification.
An embedded implementation based on adaptive filter bank for brain-computer interface systems.
Belwafi, Kais; Romain, Olivier; Gannouni, Sofien; Ghaffari, Fakhreddine; Djemal, Ridha; Ouni, Bouraoui
2018-07-15
Brain-computer interface (BCI) is a new communication pathway for users with neurological deficiencies. The implementation of a BCI system requires complex electroencephalography (EEG) signal processing including filtering, feature extraction and classification algorithms. Most of current BCI systems are implemented on personal computers. Therefore, there is a great interest in implementing BCI on embedded platforms to meet system specifications in terms of time response, cost effectiveness, power consumption, and accuracy. This article presents an embedded-BCI (EBCI) system based on a Stratix-IV field programmable gate array. The proposed system relays on the weighted overlap-add (WOLA) algorithm to perform dynamic filtering of EEG-signals by analyzing the event-related desynchronization/synchronization (ERD/ERS). The EEG-signals are classified, using the linear discriminant analysis algorithm, based on their spatial features. The proposed system performs fast classification within a time delay of 0.430 s/trial, achieving an average accuracy of 76.80% according to an offline approach and 80.25% using our own recording. The estimated power consumption of the prototype is approximately 0.7 W. Results show that the proposed EBCI system reduces the overall classification error rate for the three datasets of the BCI-competition by 5% compared to other similar implementations. Moreover, experiment shows that the proposed system maintains a high accuracy rate with a short processing time, a low power consumption, and a low cost. Performing dynamic filtering of EEG-signals using WOLA increases the recognition rate of ERD/ERS patterns of motor imagery brain activity. This approach allows to develop a complete prototype of a EBCI system that achieves excellent accuracy rates. Copyright © 2018 Elsevier B.V. All rights reserved.
Therapeutic Targeting of Alternative Translation Initiation in Breast Cancer
2009-04-01
investigation within the next 6 months. Cell type specific cancer cell killing of the prototype oncolytic poliovirus , PVS-RIPO, depends on selective...demanded by FDA. 15. SUBJECT TERMS Translation, eIF4E, eIF4G, IRES, Cancer, Poliovirus 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18...genetically recombinant poliovirus . Moreover, my work has laid the groundwork for correlative testing and efficacy studies of a vast array of protein kinase
2013-10-01
collection in underway. 15. SUBJECT TERMS Spinal Cord Injury, Immunogenetics, Chronic pain, Opioids 16. SECURITY CLASSIFICATION OF: 17...prototypic opioid , morphine, is capable of TLR4-mediated proinflammation6-8 . As such, exposure to morphine at the time of injury may result in...fashion to the spinal cord injury and/or to experience inflammation in response to opioid exposure. Critically, this genetic variability may
2014-10-01
collection in underway. 15. SUBJECT TERMS Spinal Cord Injury, Immunogenetics, Chronic pain, Opioids 16. SECURITY CLASSIFICATION OF: 17...The prototypic opioid , morphine, is capable of TLR4-mediated proinflammation6-8. As such, exposure to morphine at the time of injury may result in...proinflammatory fashion to the spinal cord injury, and/or to experience inflammation in response to opioid exposure. Critically, this genetic variability
Machine-learning approaches to select Wolf-Rayet candidates
NASA Astrophysics Data System (ADS)
Marston, A. P.; Morello, G.; Morris, P.; van Dyk, S.; Mauerhan, J.
2017-11-01
The WR stellar population can be distinguished, at least partially, from other stellar populations by broad-band IR colour selection. We present the use of a machine learning classifier to quantitatively improve the selection of Galactic Wolf-Rayet (WR) candidates. These methods are used to separate the other stellar populations which have similar IR colours. We show the results of the classifications obtained by using the 2MASS J, H and K photometric bands, and the Spitzer/IRAC bands at 3.6, 4.5, 5.8 and 8.0μm. The k-Nearest Neighbour method has been used to select Galactic WR candidates for observational follow-up. A few candidates have been spectroscopically observed. Preliminary observations suggest that a detection rate of 50% can easily be achieved.
Iddamalgoda, Lahiru; Das, Partha S; Aponso, Achala; Sundararajan, Vijayaraghava S; Suravajhala, Prashanth; Valadi, Jayaraman K
2016-01-01
Data mining and pattern recognition methods reveal interesting findings in genetic studies, especially on how the genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical approaches, there remains a challenge in accurately prioritizing the single nucleotide polymorphisms (SNP) associated with the disease. In this commentary, we review the state-of-art data mining and pattern recognition models for identifying inherited diseases and deliberate the need of binary classification- and scoring-based prioritization methods in determining causal variants. While we discuss the pros and cons associated with these methods known, we argue that the gene prioritization methods and the protein interaction (PPI) methods in conjunction with the K nearest neighbors' could be used in accurately categorizing the genetic factors in disease causation.
Turbulent chimeras in large semiconductor laser arrays
Shena, J.; Hizanidis, J.; Kovanis, V.; Tsironis, G. P.
2017-01-01
Semiconductor laser arrays have been investigated experimentally and theoretically from the viewpoint of temporal and spatial coherence for the past forty years. In this work, we are focusing on a rather novel complex collective behavior, namely chimera states, where synchronized clusters of emitters coexist with unsynchronized ones. For the first time, we find such states exist in large diode arrays based on quantum well gain media with nearest-neighbor interactions. The crucial parameters are the evanescent coupling strength and the relative optical frequency detuning between the emitters of the array. By employing a recently proposed figure of merit for classifying chimera states, we provide quantitative and qualitative evidence for the observed dynamics. The corresponding chimeras are identified as turbulent according to the irregular temporal behavior of the classification measure. PMID:28165053
Turbulent chimeras in large semiconductor laser arrays
NASA Astrophysics Data System (ADS)
Shena, J.; Hizanidis, J.; Kovanis, V.; Tsironis, G. P.
2017-02-01
Semiconductor laser arrays have been investigated experimentally and theoretically from the viewpoint of temporal and spatial coherence for the past forty years. In this work, we are focusing on a rather novel complex collective behavior, namely chimera states, where synchronized clusters of emitters coexist with unsynchronized ones. For the first time, we find such states exist in large diode arrays based on quantum well gain media with nearest-neighbor interactions. The crucial parameters are the evanescent coupling strength and the relative optical frequency detuning between the emitters of the array. By employing a recently proposed figure of merit for classifying chimera states, we provide quantitative and qualitative evidence for the observed dynamics. The corresponding chimeras are identified as turbulent according to the irregular temporal behavior of the classification measure.
Syed, Zeeshan; Saeed, Mohammed; Rubinfeld, Ilan
2010-01-01
For many clinical conditions, only a small number of patients experience adverse outcomes. Developing risk stratification algorithms for these conditions typically requires collecting large volumes of data to capture enough positive and negative for training. This process is slow, expensive, and may not be appropriate for new phenomena. In this paper, we explore different anomaly detection approaches to identify high-risk patients as cases that lie in sparse regions of the feature space. We study three broad categories of anomaly detection methods: classification-based, nearest neighbor-based, and clustering-based techniques. When evaluated on data from the National Surgical Quality Improvement Program (NSQIP), these methods were able to successfully identify patients at an elevated risk of mortality and rare morbidities following inpatient surgical procedures. PMID:21347083
Experimental study of digital image processing techniques for LANDSAT data
NASA Technical Reports Server (NTRS)
Rifman, S. S. (Principal Investigator); Allendoerfer, W. B.; Caron, R. H.; Pemberton, L. J.; Mckinnon, D. M.; Polanski, G.; Simon, K. W.
1976-01-01
The author has identified the following significant results. Results are reported for: (1) subscene registration, (2) full scene rectification and registration, (3) resampling techniques, (4) and ground control point (GCP) extraction. Subscenes (354 pixels x 234 lines) were registered to approximately 1/4 pixel accuracy and evaluated by change detection imagery for three cases: (1) bulk data registration, (2) precision correction of a reference subscene using GCP data, and (3) independently precision processed subscenes. Full scene rectification and registration results were evaluated by using a correlation technique to measure registration errors of 0.3 pixel rms thoughout the full scene. Resampling evaluations of nearest neighbor and TRW cubic convolution processed data included change detection imagery and feature classification. Resampled data were also evaluated for an MSS scene containing specular solar reflections.
Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf
2018-05-01
Dysmorphic syndromes have different facial malformations. These malformations are significant to an early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define the certain features of each syndrome by considering facial malformations and classify Fragile X, Hurler, Prader Willi, Down, Wolf Hirschhorn syndromes and healthy groups automatically. The reference points are marked on the face images and ratios between the points' distances are taken into consideration as features. We suggest a neural network based hierarchical decision tree structure in order to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare classification accuracy with our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with k-NN, ANN and hierarchical decision tree methods, respectively. Then, the same images are shown to a clinical expert who achieve a recognition rate of 46.7%. We develop an efficient system to recognize different syndrome types automatically in a simple, non-invasive imaging data, which is independent from the patient's age, sex and race at high accuracy. The promising results indicate that our method can be used for pre-diagnosis of the dysmorphic syndromes by clinical experts.
Treelets Binary Feature Retrieval for Fast Keypoint Recognition.
Zhu, Jianke; Wu, Chenxia; Chen, Chun; Cai, Deng
2015-10-01
Fast keypoint recognition is essential to many vision tasks. In contrast to the classification-based approaches, we directly formulate the keypoint recognition as an image patch retrieval problem, which enjoys the merit of finding the matched keypoint and its pose simultaneously. To effectively extract the binary features from each patch surrounding the keypoint, we make use of treelets transform that can group the highly correlated data together and reduce the noise through the local analysis. Treelets is a multiresolution analysis tool, which provides an orthogonal basis to reflect the geometry of the noise-free data. To facilitate the real-world applications, we have proposed two novel approaches. One is the convolutional treelets that capture the image patch information locally and globally while reducing the computational cost. The other is the higher-order treelets that reflect the relationship between the rows and columns within image patch. An efficient sub-signature-based locality sensitive hashing scheme is employed for fast approximate nearest neighbor search in patch retrieval. Experimental evaluations on both synthetic data and the real-world Oxford dataset have shown that our proposed treelets binary feature retrieval methods outperform the state-of-the-art feature descriptors and classification-based approaches.
NASA Astrophysics Data System (ADS)
Khazendar, Shan; Farren, Jessica; Al-Assam, Hisham; Sayasneh, Ahmed; Du, Hongbo; Bourne, Tom; Jassim, Sabah A.
2014-05-01
Ultrasound is an effective multipurpose imaging modality that has been widely used for monitoring and diagnosing early pregnancy events. Technology developments coupled with wide public acceptance has made ultrasound an ideal tool for better understanding and diagnosing of early pregnancy. The first measurable signs of an early pregnancy are the geometric characteristics of the Gestational Sac (GS). Currently, the size of the GS is manually estimated from ultrasound images. The manual measurement involves multiple subjective decisions, in which dimensions are taken in three planes to establish what is known as Mean Sac Diameter (MSD). The manual measurement results in inter- and intra-observer variations, which may lead to difficulties in diagnosis. This paper proposes a fully automated diagnosis solution to accurately identify miscarriage cases in the first trimester of pregnancy based on automatic quantification of the MSD. Our study shows a strong positive correlation between the manual and the automatic MSD estimations. Our experimental results based on a dataset of 68 ultrasound images illustrate the effectiveness of the proposed scheme in identifying early miscarriage cases with classification accuracies comparable with those of domain experts using K nearest neighbor classifier on automatically estimated MSDs.